How to reduce the verbosity of Spark's runtime output?

How can I reduce the amount of trace info the Spark runtime produces?

The default is too verbose.

How can I turn it off, and turn it back on when I need it?

Thanks

Verbose mode

scala> val la = sc.parallelize(List(12,4,5,3,4,4,6,781))
scala> la.collect
15/01/28 09:57:24 INFO SparkContext: Starting job: collect at <console>:15
15/01/28 09:57:24 INFO DAGScheduler: Got job 3 (collect at <console>:15) with 1 output 
...
15/01/28 09:57:24 INFO Executor: Running task 0.0 in stage 3.0 (TID 3)
15/01/28 09:57:24 INFO Executor: Finished task 0.0 in stage 3.0 (TID 3). 626 bytes result sent to driver
15/01/28 09:57:24 INFO DAGScheduler: Stage 3 (collect at <console>:15) finished in 0.002 s
15/01/28 09:57:24 INFO DAGScheduler: Job 3 finished: collect at <console>:15, took 0.020061 s
res5: Array[Int] = Array(12, 4, 5, 3, 4, 4, 6, 781)

Silent mode (expected)

scala> val la = sc.parallelize(List(12,4,5,3,4,4,6,781))
scala> la.collect
res5: Array[Int] = Array(12, 4, 5, 3, 4, 4, 6, 781)

Solution 1:

Spark 1.4.1

sc.setLogLevel("WARN")

From the comments in the source code:

Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN

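Spark 2.x - 2.3.1

sparkSession.sparkContext().setLogLevel("WARN")

Spark 2.3.2

sparkSession.sparkContext.setLogLevel("WARN")

Note that the form with parentheses is the Java call syntax; in Scala, sparkContext is accessed without parentheses.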

Solution 2:

Quoting from the book 'Learning Spark':

You may find the logging statements that get printed in the shell distracting. You can control the verbosity of the logging. To do this, you can create a file in the conf directory called log4j.properties. The Spark developers already include a template for this file called log4j.properties.template. To make the logging less verbose, make a copy of conf/log4j.properties.template called conf/log4j.properties and find the following line:

log4j.rootCategory=INFO, console

Then lower the log level so that we only show WARN message and above by changing it to the following:

log4j.rootCategory=WARN, console

When you re-open the shell, you should see less output.
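The copy-and-edit step from the quote can be scripted. What follows is only a minimal sketch: the function name quiet_spark_logs is made up for illustration, the conf directory is passed in as an argument, and it assumes the log4j 1.x-style template quoted above, with a line of the form log4j.rootCategory=INFO, console.

```shell
# Hypothetical helper (not part of Spark): copy the template and set the root log level.
# $1 = path to Spark's conf directory, $2 = desired level (defaults to WARN).
quiet_spark_logs() {
    conf_dir="$1"
    level="${2:-WARN}"
    # Start from a fresh copy of the shipped template every time.
    cp "$conf_dir/log4j.properties.template" "$conf_dir/log4j.properties" || return 1
    # Rewrite only the rootCategory line; every other setting stays as shipped.
    sed "s/^log4j\.rootCategory=[A-Z]*, console/log4j.rootCategory=$level, console/" \
        "$conf_dir/log4j.properties" > "$conf_dir/log4j.properties.tmp" &&
        mv "$conf_dir/log4j.properties.tmp" "$conf_dir/log4j.properties"
}
```

Calling quiet_spark_logs "$SPARK_HOME/conf" lowers the level to WARN; calling it again with INFO as the second argument restores the verbose output for the next shell session.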

Solution 3:

Logging configuration at the Spark app level

With this approach, the Spark application itself needs no code change; the logging configuration is passed in at submit time.

  • Create a new file log4j.properties from log4j.properties.template.
  • Then change the verbosity with the log4j.rootCategory property.
  • For example, to see only the ERROR messages of a given jar, set log4j.rootCategory=ERROR, console

The spark-submit command would then be:

spark-submit \
    ... # other Spark properties go here
    --files prop/file/location \
    --conf 'spark.executor.extraJavaOptions=-Dlog4j.configuration=prop/file/location' \
    --conf 'spark.driver.extraJavaOptions=-Dlog4j.configuration=prop/file/location' \
    jar/location \
    [application arguments] 

Now you will see only the log messages at the ERROR level.


Plain Log4j way, without Spark configuration (but needs a code change)

Set the log level to ERROR for the org and akka packages:

import org.apache.log4j.{Level, Logger}

Logger.getLogger("org").setLevel(Level.ERROR)
Logger.getLogger("akka").setLevel(Level.ERROR)

Solution 4:

If you are invoking a command from a shell, there is a lot you can do without changing any configurations. That is by design.

Below are a couple of Unix examples using pipes, but you could do similar filters in other environments.

To completely silence the log (at your own risk)

Pipe stderr to /dev/null, i.e.:

run-example org.apache.spark.examples.streaming.NetworkWordCount localhost 9999 2> /dev/null

To ignore INFO messages

Spark writes its log to stderr by default, so redirect stderr into the pipe first:

run-example org.apache.spark.examples.streaming.NetworkWordCount localhost 9999 2>&1 | awk '{if ($3 != "INFO") print $0}'
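The awk filter can also be wrapped in a small function so that the level to drop becomes a parameter. This is only a sketch: drop_level is a name made up here, and it assumes the log layout shown in the question, where the level is the third whitespace-separated field of each log line.

```shell
# Hypothetical filter (not part of Spark): drop lines whose third field equals the given level.
# Log lines look like "15/01/28 09:57:24 INFO SparkContext: ...", so field 3 is the level.
# Lines that are not log lines (for example REPL results) pass through unchanged.
drop_level() {
    awk -v lvl="${1:-INFO}" '$3 != lvl'
}
```

It would be used as: run-example ... 2>&1 | drop_level INFO, where 2>&1 merges stderr into the pipe, since that is where the default console logger writes.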