Python worker failed to connect back
I got the same error. I solved it by installing the previous version of Spark (2.3 instead of 2.4). Now it works perfectly, so it may be an issue with the latest version of PySpark.
The heart of the problem is the connection between PySpark and Python, which can be solved by redefining the environment variables.

I changed the environment variable PYSPARK_DRIVER_PYTHON from ipython to jupyter, and PYSPARK_PYTHON from python3 to python.

Now I'm using Jupyter Notebook, Python 3.7, Java JDK 11.0.6, and Spark 2.4.2.
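If you prefer not to touch OS-level settings, the same redefinition can be done from Python itself, before any SparkSession is created. This is a minimal sketch of that approach; the two variable names come from the answer above, and the values are the ones it recommends:

```python
import os

# Redefine the two variables in-process, before creating a SparkSession.
# These must be set before PySpark spawns its worker processes.
os.environ["PYSPARK_DRIVER_PYTHON"] = "jupyter"  # was "ipython"
os.environ["PYSPARK_PYTHON"] = "python"          # was "python3"

print(os.environ["PYSPARK_DRIVER_PYTHON"], os.environ["PYSPARK_PYTHON"])
```

Note that this only affects the current process and its children, so it must run in the same notebook or script that starts Spark.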
Set the environment variable PYSPARK_PYTHON=python to fix it.
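For a POSIX shell that would look like the sketch below; on Windows the equivalent (not shown) is `set PYSPARK_PYTHON=python` for the current session, or `setx` to persist it:

```shell
# Point PySpark workers at the interpreter named "python" on PATH,
# then confirm the variable is visible to child processes.
export PYSPARK_PYTHON=python
echo "$PYSPARK_PYTHON"
```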
I had the same issue. I had set all the environment variables correctly but still wasn't able to resolve it. In my case,

import findspark
findspark.init()

adding this before even creating the SparkSession helped. I was using Visual Studio Code on Windows 10, the Spark version was 3.2.0, and the Python version was 3.9.

Note: first check that the paths for HADOOP_HOME, SPARK_HOME, and PYSPARK_PYTHON have been set correctly.
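That preliminary check can be automated. The sketch below is my own helper (the function name `check_spark_env` and its validation logic are assumptions, not part of the answer); it verifies that the three variables the note mentions are set, and that the two `*_HOME` ones point at existing directories, before you call findspark.init():

```python
import os

def check_spark_env():
    """Return a list of problems with the Spark-related environment variables."""
    problems = []
    for var in ("HADOOP_HOME", "SPARK_HOME", "PYSPARK_PYTHON"):
        value = os.environ.get(var)
        if not value:
            problems.append(f"{var} is not set")
        elif var.endswith("_HOME") and not os.path.isdir(value):
            # PYSPARK_PYTHON may be a bare interpreter name resolved via PATH,
            # so only the *_HOME variables are checked as directories.
            problems.append(f"{var} points to a missing directory: {value}")
    return problems

# Run this before findspark.init() / SparkSession creation:
for problem in check_spark_env():
    print("WARNING:", problem)
```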
Downgrading Spark from 2.4.0 back to 2.3.2 was not enough for me. I don't know why, but in my case I had to create the SparkContext from the SparkSession, like

sc = spark.sparkContext

Then the very same error disappeared.