How to create a Spark DataFrame from a Map[String, Any] in Scala?

val schema = StructType(List(
  StructField("col1", IntegerType, nullable = true),
  StructField("col2", DoubleType, nullable = true)
))
val empty_df = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
val tempFillMap: Map[String, Any] = Map("col1" -> 3, "col2" -> 4.0)

How can I create or update the DataFrame from tempFillMap?


Solution 1:

If the map values have simple types (subtypes of AnyVal, such as Int or Double), the schema can be built from the values' types, and the DataFrame created with that schema:

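import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import scala.collection.JavaConverters._

// Map a Scala runtime value to the corresponding Spark SQL type
def getSparkType(value: Any): DataType = value match {
  case _: Int => IntegerType
  case _: Double => DoubleType
  // TODO include other types here
  case _ => throw new IllegalArgumentException(s"Value $value type cannot be converted to Spark type!")
}

// One StructField per map entry; keys and values of the same Map instance iterate in the same order
val schema = tempFillMap.map(kv => StructField(kv._1, getSparkType(kv._2)))
val df = spark.createDataFrame(List(Row(tempFillMap.values.toSeq: _*)).asJava, StructType(schema.toArray))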

Result:

+----+----+
|col1|col2|
+----+----+
|3   |4.0 |
+----+----+

If the schema is predefined and the map contains a value for every field, the values can be put in the schema's field order and added as a Row:

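import scala.collection.JavaConverters._
// Look up each schema field in the map; throws NoSuchElementException if a key is missing
val valuesInCorrectOrder = schema.fieldNames.map(name => tempFillMap(name))
val df = spark.createDataFrame(List(Row(valuesInCorrectOrder: _*)).asJava, schema)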
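The question also asks about updating `empty_df`. Spark DataFrames are immutable, so one common approach is to build a one-row DataFrame from the map and `union` it with the existing one. The sketch below assumes a local SparkSession (in spark-shell, `getOrCreate()` simply returns the existing `spark`), and `rowFromMap` and `partialMap` are hypothetical names introduced for illustration. The helper also falls back to `null` for fields missing from the map, which only works because both fields are declared `nullable = true`.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

// Assumption: a local session; in spark-shell this returns the existing `spark`
val spark = SparkSession.builder().master("local[*]").appName("map-to-df").getOrCreate()

// Same predefined schema and empty DataFrame as in the question
val schema = StructType(List(
  StructField("col1", IntegerType, nullable = true),
  StructField("col2", DoubleType, nullable = true)
))
val empty_df = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

// Hypothetical input: "col2" is deliberately missing to show the null fallback
val partialMap: Map[String, Any] = Map("col1" -> 5)

// Hypothetical helper: order the map values by the schema's field names,
// using null for any field the map does not contain (field must be nullable)
def rowFromMap(m: Map[String, Any], schema: StructType): Row =
  Row.fromSeq(schema.fieldNames.toSeq.map(name => m.getOrElse(name, null)))

// Build a one-row DataFrame with the same schema
val newRows = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(rowFromMap(partialMap, schema))), schema)

// "Updating" means creating a new DataFrame; union matches columns by position
val updatedDf = empty_df.union(newRows)
updatedDf.show()
```

`union` aligns columns by position, which is safe here because both sides use the same `schema`. If the column order could differ, `unionByName` (available since Spark 2.3) matches columns by name instead.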