How to use Column.isin with list?
val items = List("a", "b", "c")
sqlContext.sql("select c1 from table")
.filter($"c1".isin(items))
.collect
.foreach(println)
The code above throws the following exception.
Exception in thread "main" java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.$colon$colon List(a, b, c)
at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:49)
at org.apache.spark.sql.functions$.lit(functions.scala:89)
at org.apache.spark.sql.Column$$anonfun$isin$1.apply(Column.scala:642)
at org.apache.spark.sql.Column$$anonfun$isin$1.apply(Column.scala:642)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.sql.Column.isin(Column.scala:642)
Below is my attempt to fix it. It compiles and runs but doesn't return any match. Not sure why.
val items = List("a", "b", "c").mkString("\"","\",\"","\"")
sqlContext.sql("select c1 from table")
.filter($"c1".isin(items))
.collect
.foreach(println)
Solution 1:
According to documentation, isin
takes a vararg, not a list. List is actually a confusing name here. You can try converting your List to vararg like this:
val items = List("a", "b", "c")
sqlContext.sql("select c1 from table")
.filter($"c1".isin(items:_*))
.collect
.foreach(println)
Your variant with mkString compiles, because one single String is also a vararg (with number of arguments equal to 1), but it is proably not what you want to achieve.
Solution 2:
It worked like this in Java Api (Java 8)
.isin(sampleListName.stream().toArray(String[]::new))));
sampleListName is a List