How to set up Spark on Windows?
Steps to install Spark in local mode:
- Install Java 7 or later. To test that the Java installation is complete, open a command prompt, type java, and hit Enter. If you receive the message 'java' is not recognized as an internal or external command, you need to configure the JAVA_HOME and PATH environment variables to point to the path of the JDK.
- Download and install Scala. Set SCALA_HOME in Control Panel\System and Security\System, go to "Advanced system settings", and add %SCALA_HOME%\bin to the PATH variable in environment variables.
- Install Python 2.6 or later from the Python download link.
- Download SBT. Install it and set SBT_HOME as an environment variable with the value <<SBT PATH>>.
- Download winutils.exe from the HortonWorks repo or git repo. Since we don't have a local Hadoop installation on Windows, we have to download winutils.exe and place it in a bin directory under a created Hadoop home directory. Set HADOOP_HOME = <<Hadoop home directory>> in environment variables.
- We will be using a pre-built Spark package, so choose a Spark pre-built package for Hadoop from the Spark download page. Download and extract it. Set SPARK_HOME and add %SPARK_HOME%\bin to the PATH variable in environment variables.
- Run the command: spark-shell
- Open http://localhost:4040/ in a browser to see the SparkContext web UI.
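The environment-variable steps above can be condensed into a one-time setup from a command prompt. This is only a sketch: every path below is an example of where the tools might be installed, so substitute your actual install locations.

```shell
:: Example one-time environment setup (Windows cmd).
:: All paths below are placeholders -- replace with your real install dirs.
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0"
setx SCALA_HOME "C:\Program Files (x86)\scala"
setx SBT_HOME "C:\Program Files (x86)\sbt"
:: HADOOP_HOME must contain bin\winutils.exe
setx HADOOP_HOME "C:\hadoop"
setx SPARK_HOME "C:\spark"
:: Note: setx PATH overwrites the user PATH, so include the existing value.
setx PATH "%PATH%;%SCALA_HOME%\bin;%SPARK_HOME%\bin"
```

Open a fresh command prompt afterwards, since setx only affects new sessions, and then run spark-shell to verify.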
I found the easiest solution on Windows is to build from source.
You can pretty much follow this guide: http://spark.apache.org/docs/latest/building-spark.html
Download and install Maven, and set MAVEN_OPTS
to the value specified in the guide.
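As a rough sketch, the build described in that guide comes down to commands like the following; the MAVEN_OPTS value and any build profiles here are examples from memory, so take the exact values from the guide for your Spark version.

```shell
:: Example only (Windows cmd) -- check the building-spark guide for the
:: exact MAVEN_OPTS value and profiles matching your Spark version.
set MAVEN_OPTS=-Xmx2g -XX:ReservedCodeCacheSize=512m
mvn -DskipTests clean package
```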
But if you're just playing around with Spark, and don't actually need it to run on Windows for any reason other than that your own machine runs Windows, I'd strongly suggest you install Spark on a Linux virtual machine. The simplest way to get started is probably to download the ready-made images from Cloudera or Hortonworks, and either use the bundled version of Spark, or install your own from source or from the compiled binaries you can get from the Spark website.