Why is join not possible after show operator?
The following code works fine until I add show after agg. Why is show not possible?
val tempTableB = tableB.groupBy("idB")
.agg(first("numB").as("numB")) //when I add a .show here, it doesn't work
tableA.join(tempTableB, $"idA" === $"idB", "inner")
.drop("idA", "numA").show
The error says:
error: overloaded method value join with alternatives:
(right: org.apache.spark.sql.Dataset[_],joinExprs: org.apache.spark.sql.Column,joinType: String)org.apache.spark.sql.DataFrame <and>
(right: org.apache.spark.sql.Dataset[_],usingColumns: Seq[String],joinType: String)org.apache.spark.sql.DataFrame
cannot be applied to (Unit, org.apache.spark.sql.Column, String)
tableA.join(tempTableB, $"idA" === $"idB", "inner")
^
Why is this behaving this way?
Solution 1:
.show() is a method with what we call in Scala a side effect. It prints to stdout and returns Unit (whose only value is written ()), just like println.
Example:
val a = Array(1,2,3).foreach(println)
a: Unit = ()
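In Scala, every expression evaluates to a value. In your case, show returns Unit, so () is what gets stored in tempTableB, and join then receives a Unit instead of a Dataset. That is exactly what the error message reports: join cannot be applied to (Unit, Column, String).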
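To see the same effect without Spark, here is a minimal sketch in plain Scala. Table is a hypothetical stand-in for a DataFrame (it is not a Spark class); the only thing it shares with the real one is that show() prints and returns Unit.

```scala
// Hypothetical stand-in for a DataFrame: like Dataset.show, show() prints and returns Unit.
final case class Table(rows: Seq[(Int, Int)]) {
  def show(): Unit = rows.foreach(println)
}

object UnitDemo {
  def main(args: Array[String]): Unit = {
    val table = Table(Seq((1, 10), (2, 20)))

    // The expression ends in show(), so the bound value is (), not a Table.
    val shown: Unit = table.show()

    // Bind first, then call show() purely for its side effect.
    val kept: Table = table
    kept.show()

    // kept is still a Table and can be passed on; shown cannot.
    println(kept.rows.size)
  }
}
```

Writing the type annotation on the val (val shown: Unit, or val tempTableB: DataFrame in the question) makes the compiler report the mismatch at the assignment rather than later at the join.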
Solution 2:
@philantrovert has already explained the cause in detail, so I won't repeat it.
If you want to see what's in tempTableB, call show on it after it has been assigned, as below.
val tempTableB = tableB.groupBy("idB")
.agg(first("numB").as("numB"))
tempTableB.show
tableA.join(tempTableB, $"idA" === $"idB", "inner")
.drop("idA", "numA").show
It should work then, because tempTableB now holds the DataFrame and show is called only for its output.
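If you would rather keep the show inside the chain, one option (an addition here, not something from the answers above) is tap from scala.util.chaining, available in Scala 2.13 and later. It runs a side effect and then hands back the original value. The sketch below uses a hypothetical FakeFrame stand-in so that it runs without Spark.

```scala
import scala.util.chaining._ // provides tap (Scala 2.13+)

// Hypothetical stand-in for a DataFrame: show() prints and returns Unit.
final case class FakeFrame(rows: Seq[(Int, Int)]) {
  def show(): Unit = rows.foreach(println)
}

object TapDemo {
  def main(args: Array[String]): Unit = {
    // tap runs the side effect, then returns the value it was called on,
    // so the binding is a FakeFrame rather than Unit.
    val tempFrame: FakeFrame = FakeFrame(Seq((1, 10), (2, 20))).tap(_.show())

    println(tempFrame.rows.size)
  }
}
```

Assuming your Spark build uses Scala 2.13 or later, the same idea applied to the question would read tableB.groupBy("idB").agg(first("numB").as("numB")).tap(_.show()), which leaves a DataFrame in tempTableB.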