Why is join not possible after show operator?

The following code works fine until I add `.show` after `agg`. Why can't I call `show` there?

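    import org.apache.spark.sql.functions.first
    import spark.implicits._ // enables $"col" syntax; spark-shell imports this automatically

    val tempTableB = tableB.groupBy("idB")
      .agg(first("numB").as("numB")) // when I add a .show here, it doesn't work

    tableA.join(tempTableB, $"idA" === $"idB", "inner")
      .drop("idA", "numA").show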

The error says:

    error: overloaded method value join with alternatives:
      (right: org.apache.spark.sql.Dataset[_],joinExprs: org.apache.spark.sql.Column,joinType: String)org.apache.spark.sql.DataFrame <and>
      (right: org.apache.spark.sql.Dataset[_],usingColumns: Seq[String],joinType: String)org.apache.spark.sql.DataFrame
     cannot be applied to (Unit, org.apache.spark.sql.Column, String)
                  tableA.join(tempTableB, $"idA" === $"idB", "inner")
                         ^

Why does it behave this way?


Solution 1:

`.show()` is a method with what Scala programmers call a side effect: it prints the rows to stdout and returns `Unit` (whose only value is `()`), just like `println`.

Example:

    scala> val a = Array(1, 2, 3).foreach(println)
    1
    2
    3
    a: Unit = ()

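In Scala, every expression evaluates to a value, so a method call always returns something. In your case `show` returns `()`, of type `Unit`, and that is what gets stored in `tempTableB`. Since `tempTableB` is then a `Unit` rather than a `DataFrame`, none of the `join` overloads accept it, which is exactly what the error says: `cannot be applied to (Unit, org.apache.spark.sql.Column, String)`.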
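To see the same effect without Spark, here is a minimal sketch using a hypothetical `Table` class (not a Spark API) whose `firstPerKey`, `show` and `join` methods loosely mimic `agg`, `show` and `join`. The only point it illustrates is that a method chain ending in a `Unit`-returning call has type `Unit`, so its result cannot be passed where a table is expected.

```scala
// Hypothetical toy model, not Spark: shows why a trailing show() turns the value into Unit.
final case class Table(rows: Seq[(String, Int)]) {
  // Like Dataset.agg: returns a new Table, so further calls can be chained.
  def firstPerKey: Table =
    Table(rows.groupBy(_._1).map { case (k, vs) => (k, vs.head._2) }.toSeq.sortBy(_._1))

  // Like Dataset.show: prints as a side effect and returns Unit.
  def show(): Unit = rows.foreach { case (k, v) => println(s"$k\t$v") }

  // Like Dataset.join: the argument must be another Table, never Unit.
  def join(other: Table): Seq[(String, Int, Int)] =
    for ((k, a) <- rows; (k2, b) <- other.rows if k == k2) yield (k, a, b)
}

object ShowReturnsUnit {
  val tableB: Table = Table(Seq(("x", 1), ("x", 2), ("y", 3)))

  // The chain ends with show(), so the whole expression has type Unit.
  val printed: Unit = tableB.firstPerKey.show()

  // The chain ends with firstPerKey, so the value is still a Table.
  val tempTableB: Table = tableB.firstPerKey

  // tableB.join(printed) would not compile: join expects a Table, found Unit.
  val joined: Seq[(String, Int, Int)] = tableB.join(tempTableB)
}
```

The commented-out call is the toy equivalent of the question's code: the compiler rejects it because the argument's static type is `Unit`.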

Solution 2:

@philantrovert has already given a detailed explanation, so I won't repeat it here.

If you want to see what is in `tempTableB`, call `show` on it as a separate statement after it has been assigned:

    val tempTableB = tableB.groupBy("idB")
      .agg(first("numB").as("numB"))

    tempTableB.show

    tableA.join(tempTableB, $"idA" === $"idB", "inner")
      .drop("idA", "numA").show

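This works because `tempTableB` is still a `DataFrame`: `show` only prints it, and its `Unit` result is simply discarded.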
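If you would rather inspect the intermediate result without splitting the chain, one option, assuming Scala 2.13 or later (for example a Spark 3.2+ build for Scala 2.13), is `tap` from `scala.util.chaining`, which runs a side effect and then returns the original value. The sketch below assumes a local `SparkSession`; the sample rows are hypothetical and only reuse the question's column names. The explicit `DataFrame` annotation on `tempTableB` is an extra safeguard: with it, a stray trailing `.show` is reported as a type mismatch on that line instead of as the confusing overload error at the `join`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.first
import scala.util.chaining._ // Scala 2.13+: provides .tap and .pipe

object TapExample {
  // Assumption: a local session; in spark-shell, `spark` already exists.
  val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("show-vs-join")
    .getOrCreate()
  import spark.implicits._

  // Hypothetical sample data using the column names from the question.
  val tableA: DataFrame = Seq((1, 10), (2, 20)).toDF("idA", "numA")
  val tableB: DataFrame = Seq((1, 100), (1, 101), (3, 300)).toDF("idB", "numB")

  // The DataFrame annotation makes a trailing .show a compile error here (found: Unit).
  val tempTableB: DataFrame = tableB.groupBy("idB")
    .agg(first("numB").as("numB"))
    .tap(_.show()) // prints the aggregated rows, then returns the DataFrame itself

  // Only idB = 1 exists in both tables; dropping idA/numA leaves idB and numB.
  val joined: DataFrame = tableA.join(tempTableB, $"idA" === $"idB", "inner")
    .drop("idA", "numA")
}
```

Two caveats: `first` after `groupBy` is not deterministic, so the `numB` kept for `idB = 1` may be either 100 or 101; and `show` is an action that runs a Spark job, so on large inputs it may be worth caching `tempTableB` before showing and then joining it.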