How to reduce the verbosity of Spark's runtime output?
How to reduce the amount of trace info the Spark runtime produces?
The default is too verbose,
How to turn off it, and turn on it when I need.
Thanks
Verbose mode
scala> val la = sc.parallelize(List(12,4,5,3,4,4,6,781))
scala> la.collect
15/01/28 09:57:24 INFO SparkContext: Starting job: collect at <console>:15
15/01/28 09:57:24 INFO DAGScheduler: Got job 3 (collect at <console>:15) with 1 output
...
15/01/28 09:57:24 INFO Executor: Running task 0.0 in stage 3.0 (TID 3)
15/01/28 09:57:24 INFO Executor: Finished task 0.0 in stage 3.0 (TID 3). 626 bytes result sent to driver
15/01/28 09:57:24 INFO DAGScheduler: Stage 3 (collect at <console>:15) finished in 0.002 s
15/01/28 09:57:24 INFO DAGScheduler: Job 3 finished: collect at <console>:15, took 0.020061 s
res5: Array[Int] = Array(12, 4, 5, 3, 4, 4, 6, 781)
Silent mode(expected)
scala> val la = sc.parallelize(List(12,4,5,3,4,4,6,781))
scala> la.collect
res5: Array[Int] = Array(12, 4, 5, 3, 4, 4, 6, 781)
Solution 1:
Spark 1.4.1
sc.setLogLevel("WARN")
From comments in source code:
Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
Spark 2.x - 2.3.1
sparkSession.sparkContext().setLogLevel("WARN")
Spark 2.3.2
sparkSession.sparkContext.setLogLevel("WARN")
Solution 2:
quoting from 'Learning Spark' book.
You may find the logging statements that get printed in the shell distracting. You can control the verbosity of the logging. To do this, you can create a file in the conf directory called log4j.properties. The Spark developers already include a template for this file called log4j.properties.template. To make the logging less verbose, make a copy of conf/log4j.properties.template called conf/log4j.properties and find the following line:
log4j.rootCategory=INFO, console
Then lower the log level so that we only show WARN message and above by changing it to the following:
log4j.rootCategory=WARN, console
When you re-open the shell, you should see less output.
Solution 3:
Logging configuration at the Spark app level
With this approach no need of code change in cluster for a spark application.
- Let's create a new file log4j.properties from log4j.properties.template.
- Then change verbosity with
log4j.rootCategory
property. - Say, we need to check ERRORs of given jar then,
log4j.rootCategory=ERROR, console
Spark submit command would be
spark-submit \
... #Other spark props goes here
--files prop/file/location \
--conf 'spark.executor.extraJavaOptions=-Dlog4j.configuration=prop/file/location' \
--conf 'spark.driver.extraJavaOptions=-Dlog4j.configuration=prop/file/location' \
jar/location \
[application arguments]
Now you would see only the logs which are ERROR categorised.
Plain Log4j way wo Spark(but needs code change)
Set Logging OFF for packages org
and akka
import org.apache.log4j.{Level, Logger}
Logger.getLogger("org").setLevel(Level.ERROR)
Logger.getLogger("akka").setLevel(Level.ERROR)
Solution 4:
If you are invoking a command from a shell, there is a lot you can do without changing any configurations. That is by design.
Below are a couple of Unix examples using pipes, but you could do similar filters in other environments.
To completely silence the log (at your own risk)
Pipe stderr to /dev/null
, i.e.:
run-example org.apache.spark.examples.streaming.NetworkWordCount localhost 9999 2> /dev/null
To ignore INFO
messages
run-example org.apache.spark.examples.streaming.NetworkWordCount localhost 9999 | awk '{if ($3 != "INFO") print $0}'