PySpark and Jupyter Notebook

There's a lot of crap advice out there about getting Jupyter notebooks to play nicely with PySpark. I guess things have changed a lot over the last couple of years, but here's how I have things set up.

I use conda for my Python environments, but I doubt that matters here.

SPARK_HOME=/workspace/pyspark-games/spark-1.5.0-bin-hadoop2.4 \
IPYTHON_OPTS="notebook" \
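If you don't want to retype that each time, the same idea fits in a small shell function you can drop into your `.bashrc`. This is a sketch rather than exactly what I run: the name `pyspark_notebook` is made up, and it assumes the command those variables are passed to is `$SPARK_HOME/bin/pyspark`, the Spark 1.x launcher that reads `IPYTHON_OPTS`.

```shell
# Hypothetical wrapper: start PySpark with a notebook driver only when asked.
pyspark_notebook() {
    # Fall back to the path used in this post; set SPARK_HOME to override it.
    local spark_home="${SPARK_HOME:-/workspace/pyspark-games/spark-1.5.0-bin-hadoop2.4}"

    # Bail out early if this doesn't look like a Spark install.
    if [ ! -x "$spark_home/bin/pyspark" ]; then
        echo "no executable bin/pyspark under $spark_home" >&2
        return 1
    fi

    # Assignments prefixed to a command apply to that command only,
    # so nothing leaks into the rest of the shell session.
    # (Assumes Spark 1.x, where bin/pyspark reads IPYTHON_OPTS.)
    SPARK_HOME="$spark_home" IPYTHON_OPTS="notebook" "$spark_home/bin/pyspark" "$@"
}
```

Because the variables only live on that one command line, a plain `python` or `jupyter notebook` started from the same shell won't pick them up.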

With this approach I don't need (or want) to mess about with Jupyter profiles, which may now be unsupported in Jupyter anyway. It also means PySpark is only running when I want it, not every time I start a notebook.

You'll need to set your own SPARK_HOME and check your versions ;-)
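Checking versions matters more than it looks: `IPYTHON_OPTS` is a Spark 1.x thing, and Spark 2.0 removed it in favour of `PYSPARK_DRIVER_PYTHON` and `PYSPARK_DRIVER_PYTHON_OPTS`. Here's a rough sketch that guesses the major version from the directory name, assuming the stock `spark-X.Y.Z-bin-hadoopA.B` naming of the prebuilt downloads, and prints the variables to use. `spark_notebook_env` is a hypothetical name, and using `jupyter` as the 2.x driver is an assumption about your setup.

```shell
# Hypothetical helper: print the env vars for a notebook driver,
# guessing the Spark major version from the SPARK_HOME directory name.
spark_notebook_env() {
    local home="${1:-$SPARK_HOME}"
    local name major
    name="$(basename "$home")"   # e.g. spark-1.5.0-bin-hadoop2.4
    major="${name#spark-}"       # strip prefix -> 1.5.0-bin-hadoop2.4
    major="${major%%.*}"         # keep up to the first dot -> 1

    case "$major" in
        1)
            # Spark 1.x: bin/pyspark still honours IPYTHON_OPTS
            echo 'IPYTHON_OPTS=notebook'
            ;;
        [2-9]|[1-9][0-9])
            # Spark 2.0+: IPYTHON_OPTS is gone, set the driver explicitly
            echo 'PYSPARK_DRIVER_PYTHON=jupyter'
            echo 'PYSPARK_DRIVER_PYTHON_OPTS=notebook'
            ;;
        *)
            echo "can't tell the Spark version from '$name'" >&2
            return 1
            ;;
    esac
}
```

Since none of the values contain spaces, the output can be handed straight to `env`, e.g. `env $(spark_notebook_env) "$SPARK_HOME/bin/pyspark"`.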

I'd like to shout out to for the how-to.

Published & last updated 02 Jul 2016. Filed under Python. Tagged conda how-to ipython jupyter notebook pyspark
