Column name with a dot in Spark
Solution 1:
If your problem is the dot (.) in the column name, you can use backticks (`) to enclose the column name:
df.select("`col0.1`")
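As an illustration only (the sample DataFrame below is hypothetical, not taken from the question), the same backtick escaping also works with the col function and in Spark SQL:
import org.apache.spark.sql.functions.col
import spark.implicits._

val sample = Seq((1.0, 2.0), (3.0, 4.0)).toDF("col0.1", "col1.2")  // hypothetical data with dots in the names
sample.select("`col0.1`")            // string form, dot escaped with backticks
sample.select(col("`col0.1`"))       // same thing via the col function
sample.createOrReplaceTempView("t")
spark.sql("SELECT `col0.1` FROM t")  // backticks also escape the dot in plain SQL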
Solution 2:
The problem here is the VectorAssembler implementation, not the columns per se. You can, for example, skip the header: setting the comment character to a double quote makes Spark drop the quoted header line, so the columns fall back to the default names (_c0, _c1, ...), which contain no dots:
import org.apache.spark.ml.feature.VectorAssembler

val df = spark.read.format("csv")
.options(Map("inferSchema" -> "true", "comment" -> "\""))
.load(path)
new VectorAssembler()
.setInputCols(df.columns)
.setOutputCol("vs")
.transform(df)
or rename the columns before passing them to VectorAssembler:
val renamed = df.toDF(df.columns.map(_.replace(".", "_")): _*)
new VectorAssembler()
.setInputCols(renamed.columns)
.setOutputCol("vs")
.transform(renamed)
Finally, the best approach is to provide the schema explicitly:
import org.apache.spark.sql.types._
val schema = StructType((0 until 4).map(i => StructField(s"_$i", DoubleType)))
val dfExplicit = spark.read.format("csv")
.options(Map("header" -> "true"))
.schema(schema)
.load(path)
new VectorAssembler()
.setInputCols(dfExplicit.columns)
.setOutputCol("vs")
.transform(dfExplicit)
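The same call can be assigned to a value to inspect the result (the name assembled below is just for illustration, not from the original answer):
val assembled = new VectorAssembler()
.setInputCols(dfExplicit.columns)
.setOutputCol("vs")
.transform(dfExplicit)

assembled.printSchema()                        // vs appears as an extra vector column
assembled.select("vs").show(5, truncate = false)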