How to set up Spark on Windows?

Steps to install Spark in local mode:

  1. Install Java 7 or later. To test java installation is complete, open command prompt type java and hit enter. If you receive a message 'Java' is not recognized as an internal or external command. You need to configure your environment variables, JAVA_HOME and PATH to point to the path of jdk.

  2. Download and install Scala.

    Set SCALA_HOME in Control Panel\System and Security\System goto "Adv System settings" and add %SCALA_HOME%\bin in PATH variable in environment variables.

  3. Install Python 2.6 or later from Python Download link.

  4. Download SBT. Install it and set SBT_HOME as an environment variable with value as <<SBT PATH>>.

  5. Download winutils.exe from HortonWorks repo or git repo. Since we don't have a local Hadoop installation on Windows we have to download winutils.exe and place it in a bin directory under a created Hadoop home directory. Set HADOOP_HOME = <<Hadoop home directory>> in environment variable.

  6. We will be using a pre-built Spark package, so choose a Spark pre-built package for Hadoop Spark download. Download and extract it.

    Set SPARK_HOME and add %SPARK_HOME%\bin in PATH variable in environment variables.

  7. Run command: spark-shell

  8. Open http://localhost:4040/ in a browser to see the SparkContext web UI.


I found the easiest solution on Windows is to build from source.

You can pretty much follow this guide: http://spark.apache.org/docs/latest/building-spark.html

Download and install Maven, and set MAVEN_OPTS to the value specified in the guide.

But if you're just playing around with Spark, and don't actually need it to run on Windows for any other reason that your own machine is running Windows, I'd strongly suggest you install Spark on a linux virtual machine. The simplest way to get started probably is to download the ready-made images made by Cloudera or Hortonworks, and either use the bundled version of Spark, or install your own from source or the compiled binaries you can get from the spark website.