Using spark-submit to write to S3 in "local" mode with the S3A Directory Committer

  1. you need the spark-hadoop-cloud module for the release of Spark you are using (see the configuration sketch after this list)
  2. the committer is happy using the local filesystem (that's how the public integration test suites at https://github.com/hortonworks-spark/cloud-integration work). All that's needed is a "real" filesystem shared across all workers and the Spark driver, so that the driver receives the manifests of each pending commit.
  3. print the _SUCCESS file after a job to see what the committer did: a 0-byte file means the old committer was used; JSON with diagnostics means the new one (see the inspection sketch after this list)
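
A minimal sketch of what points 1 and 2 could look like in a PySpark script run in local mode. The package coordinates/version, bucket, and output path are placeholders to adjust for your Spark release; the same settings can equally be passed to spark-submit as --packages/--conf options.

```python
from pyspark.sql import SparkSession

# Sketch only: package version, Scala suffix, and bucket are assumptions.
spark = (
    SparkSession.builder
    .appName("s3a-directory-committer-demo")
    .master("local[4]")
    # Pull in spark-hadoop-cloud (and its hadoop-aws dependencies) at startup;
    # match the Scala suffix and version to the Spark release you are running.
    .config("spark.jars.packages",
            "org.apache.spark:spark-hadoop-cloud_2.12:3.3.2")
    # Select the S3A directory committer.
    .config("spark.hadoop.fs.s3a.committer.name", "directory")
    # Route DataFrame/SQL commits through the cloud committer bindings
    # shipped in the spark-hadoop-cloud module.
    .config("spark.sql.sources.commitProtocolClass",
            "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
    .config("spark.sql.parquet.output.committer.class",
            "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
    .getOrCreate()
)

# In local mode the "real" filesystem shared by the driver and workers is
# simply the local disk, which is where the pending-commit manifests live.
df = spark.range(1000)
df.write.mode("overwrite").parquet("s3a://my-bucket/demo/output/")
```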
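
And a quick way to check point 3 from the same session (a sketch, using the hypothetical output path from above):

```python
# Inspect the _SUCCESS marker: zero bytes -> classic file output committer,
# a JSON document with committer name and statistics -> the new S3A committer.
success = spark.read.text("s3a://my-bucket/demo/output/_SUCCESS")
for row in success.collect():
    print(row.value)
```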