How do I set the driver's Python version in Spark?

I'm using Spark 1.4.0-rc2 so that I can use Python 3 with Spark. If I add export PYSPARK_PYTHON=python3 to my .bashrc file, I can run Spark interactively with Python 3. However, when I run a standalone program in local mode, I get an error:

Exception: Python in worker has different version 3.4 than that in driver 2.7, PySpark cannot run with different minor versions

How can I specify the Python version for the driver? Setting export PYSPARK_DRIVER_PYTHON=python3 didn't work.


Solution 1:

Setting both PYSPARK_PYTHON=python3 and PYSPARK_DRIVER_PYTHON=python3 works for me.

I did this using export in my .bashrc. In the end, these are the variables I set:

export SPARK_HOME="$HOME/Downloads/spark-1.4.0-bin-hadoop2.4"
export IPYTHON=1
export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=ipython3
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

I also followed this tutorial to make it work from within an IPython 3 notebook: http://ramhiser.com/2015/02/01/configuring-ipython-notebook-support-for-pyspark/
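
If you want to confirm that the driver and workers now agree, a quick check along these lines can help. This is only a minimal sketch: it assumes a local master and that both variables point at the same Python 3 interpreter, and the app name "version-check" is just a placeholder.

import sys
from pyspark import SparkContext

sc = SparkContext("local", "version-check")

# Python version the driver is running under
print("driver:", sys.version_info[:2])

# Python version a worker task runs under (collected from one partition)
print("worker:", sc.parallelize([0], 1).map(lambda _: sys.version_info[:2]).collect()[0])

sc.stop()

If the versions still differ, the job fails with the same "different version" exception before the collect returns, so the mismatch shows up immediately.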

Solution 2:

You need to make sure the standalone program you're launching is started with Python 3. If you submit it through spark-submit it should work fine, but if you launch it with python directly, make sure you use python3 to start your app.

Also, make sure you have set your environment variables in ./conf/spark-env.sh (if it doesn't exist, you can use spark-env.sh.template as a base).
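
For illustration, a minimal standalone app of the kind this solution is talking about might look like the sketch below (the file name my_app.py and the app name are placeholders, not from the question). You would then start it with python3 my_app.py or spark-submit my_app.py; starting it with a Python 2 interpreter is what produces the version-mismatch error from the question.

# my_app.py -- hypothetical file name, for illustration only
from pyspark import SparkContext

sc = SparkContext("local", "python3-standalone-app")

# A trivial job just to confirm the workers start with the expected Python
total = sc.parallelize(list(range(10))).sum()
print("sum:", total)

sc.stop()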

Solution 3:

This helped in my case:

import os

# Set these before creating the SparkContext so PySpark picks them up
os.environ["SPARK_HOME"] = "/usr/local/Cellar/apache-spark/1.5.1/"
os.environ["PYSPARK_PYTHON"] = "/usr/local/bin/python3"