Why do spark-submit and spark-shell fail with "Failed to find Spark assembly JAR. You need to build Spark before running this program."?
I was trying to run spark-submit and got "Failed to find Spark assembly JAR. You need to build Spark before running this program." When I try to run spark-shell I get the same error. What do I have to do in this situation?
Solution 1:
On Windows, I found that if Spark is installed in a directory with a space in its path (C:\Program Files\Spark), launching it fails with this error. Move it to the root or to another directory whose path contains no spaces.
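For example, on Windows cmd (a minimal sketch; C:\spark is just an illustrative target, use any space-free path):

move "C:\Program Files\Spark" C:\spark
rem set SPARK_HOME for the current session only; use setx to persist it
set SPARK_HOME=C:\spark
C:\spark\bin\spark-shell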
Solution 2:
Your Spark package doesn't include compiled Spark code. That's why you got the error message from the spark-submit and spark-shell scripts.
You have to download one of the pre-built versions from the "Choose a package type" section of the Spark download page.
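For example, to fetch and run a pre-built package from the command line (a sketch; the version and Hadoop profile below are only an illustration, pick whatever package type the download page at https://spark.apache.org/downloads.html offers):

# download and unpack a pre-built Spark release, then launch the shell from it
wget https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-bin-hadoop3.tgz
tar -xzf spark-3.4.1-bin-hadoop3.tgz
cd spark-3.4.1-bin-hadoop3
./bin/spark-shell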
Solution 3:
Try running mvn -DskipTests clean package first to build Spark.
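This only applies if you downloaded the source package (or cloned the repository) rather than a pre-built binary. A sketch, assuming you are in the Spark source root with Maven and a JDK installed:

# the Spark build can need extra heap for Maven
export MAVEN_OPTS="-Xmx2g"
# compile Spark, skipping the (long) test suite
mvn -DskipTests clean package
# once the build finishes, the launch scripts can find the assembly
./bin/spark-shell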
Solution 4:
If your Spark binaries are in a folder whose name contains spaces (for example, "Program Files (x86)"), it won't work. I renamed the folder to "Program_Files", and then the spark-shell command worked in cmd.
Solution 5:
In my case, I installed Spark with pip3 install pyspark on macOS, and the error was caused by an incorrect SPARK_HOME variable. It works when I run a command like the one below:
PYSPARK_PYTHON=python3 SPARK_HOME=/usr/local/lib/python3.7/site-packages/pyspark python3 wordcount.py a.txt
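If your pip install lives somewhere else, you can derive SPARK_HOME from the installed pyspark package instead of hard-coding the path (a sketch; it assumes python3 can import the pip-installed pyspark):

# point SPARK_HOME at the directory of the installed pyspark package
export SPARK_HOME="$(python3 -c 'import pyspark, os; print(os.path.dirname(pyspark.__file__))')"
export PYSPARK_PYTHON=python3
python3 wordcount.py a.txt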