Apache Spark logging within Scala
Solution 1:
You can use Akhil's solution proposed in
https://www.mail-archive.com/[email protected]/msg29010.html.
I have used it myself and it works.
Akhil Das Mon, 25 May 2015 08:20:40 -0700
Try this way:

object Holder extends Serializable {
  @transient lazy val log = Logger.getLogger(getClass.getName)
}

val someRdd = spark.parallelize(List(1, 2, 3)).foreach { element =>
  Holder.log.info(element)
}
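To see why the @transient lazy val trick works, you can try it without a cluster. Below is a minimal sketch that uses plain Java serialization instead of Spark. The Worker class, the roundTrip helper and the use of java.util.logging are my own assumptions for illustration and are not part of Akhil's mail. The logger field is skipped when the object is serialized and rebuilt lazily on first use after deserialization, which is roughly what happens when the holder reaches an executor.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.logging.Logger

// Hypothetical class standing in for state that gets shipped to a worker.
class Worker(val name: String) extends Serializable {
  // @transient: not written during serialization.
  // lazy: re-created on first access, including after deserialization.
  @transient lazy val log: Logger = Logger.getLogger(name)

  def process(x: Int): Int = {
    log.info(s"processing $x")
    x + 1
  }
}

object TransientLazyDemo {
  // Serializes a value to bytes and reads it back, as a stand-in for
  // sending it from the driver to an executor.
  def roundTrip[T <: AnyRef](value: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val original = new Worker("demo")
    original.process(1)            // forces the logger on the "driver" side
    val copy = roundTrip(original) // succeeds although Logger is not Serializable
    println(copy.process(2))       // prints 3
  }
}
```

If you remove @transient, writeObject should fail with a NotSerializableException once the logger has been initialized, which is the same kind of failure Spark reports as "Task not serializable".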
Solution 2:
Use Log4j 2.x. The core logger has been made serializable. Problem solved.
Jira discussion: https://issues.apache.org/jira/browse/LOG4J2-801
sbt dependencies (replace 2.x.x with the version you use):

libraryDependencies ++= Seq(
  "org.apache.logging.log4j" % "log4j-api" % "2.x.x",
  "org.apache.logging.log4j" % "log4j-core" % "2.x.x",
  "org.apache.logging.log4j" %% "log4j-api-scala" % "2.x.x"
)
Solution 3:
If you need some code to be executed before and after a map, filter or other RDD function, try mapPartitions, where the underlying iterator is passed explicitly.
Example:
val log = ??? // this gets captured by the closure and causes a serialization error
rdd.map { x =>
  log.info(x)
  x + 1
}
Becomes:
rdd.mapPartitions { it =>
  val log = ??? // this is freshly initialized on the worker nodes
  it.map { x =>
    log.info(x)
    x + 1
  }
}
Basic RDD functions such as map and filter are themselves implemented this way, on top of the partition iterator.
Make sure to handle the partitioner explicitly and not to lose it: see the Scaladoc for the preservesPartitioning parameter. This is critical for performance.
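Since the function you pass to mapPartitions just takes a plain Scala Iterator, you can try the "before and after" part locally. Here is a rough sketch under that assumption: processPartition is a hypothetical name, java.util.logging stands in for whatever logger you use, and a local iterator stands in for one partition. Note that iterators are lazy, so the "after" code has to be attached to the end of the iterator rather than written on the line after it.map.

```scala
import java.util.logging.Logger

object PartitionLogging {
  // Hypothetical per-partition function. In Spark it could be passed as
  // rdd.mapPartitions(processPartition, preservesPartitioning = true)
  // (only keep preservesPartitioning = true if the keys are not changed).
  def processPartition(it: Iterator[Int]): Iterator[Int] = {
    // Created inside the function, so it is built on the executor
    // and never shipped from the driver.
    val log = Logger.getLogger("partition")
    log.info("before partition")

    val mapped = it.map { x =>
      log.info(s"element $x")
      x + 1
    }

    // ++ takes its argument by name, so this block runs only once
    // `mapped` has been fully consumed.
    mapped ++ {
      log.info("after partition")
      Iterator.empty
    }
  }

  def main(args: Array[String]): Unit = {
    // A local iterator stands in for one partition.
    println(processPartition(Iterator(1, 2, 3)).toList) // prints List(2, 3, 4)
  }
}
```

If you put the "after" log statement directly after the it.map call instead, it would run before any element is processed, because the map is only evaluated when the iterator is consumed.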
Solution 4:
This is an old post, but I want to share the solution that finally worked for me after a lot of struggling, since it may still be useful to others:
I wanted to print the RDD contents inside rdd.map but kept getting a Task not serializable error. My solution uses a Scala singleton object that extends java.io.Serializable:
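import org.apache.log4j.{Level, LogManager}

object MyClass extends Serializable {
  // Initialized once per JVM, so each executor gets its own logger
  val log = LogManager.getLogger("name of my spark log")
  log.setLevel(Level.INFO)

  def main(args: Array[String]): Unit = {
    // rdd is assumed to be defined elsewhere; map is lazy, so an action is still needed
    rdd.map { t =>
      // Using the object's logger here
      val log = MyClass.log
      // Log the element itself: calling rdd.count inside a transformation is not allowed
      log.info("element: " + t)
      t
    }
  }
}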
Solution 5:
val log = Logger.getLogger(getClass.getName)
You can use "log" to write logs. Also, if you need to change the logger properties, you need to have a log4j.properties file in the /conf folder. By default there is a template in that location.