How do I check for equality using Spark Dataframe without SQL Query?
I had the same issue, and the following syntax worked for me:
df.filter(df("state")==="TX").show()
I'm using Spark 1.6.
There is another simple sql like option. With Spark 1.6 below also should work.
df.filter("state = 'TX'")
This is a new way of specifying sql like filters. For a full list of supported operators, check out this class.
You should be using where
, select
is a projection that returns the output of the statement, thus why you get boolean values. where
is a filter that keeps the structure of the dataframe, but only keeps data where the filter works.
Along the same line though, per the documentation, you can write this in 3 different ways
// The following are equivalent:
peopleDf.filter($"age" > 15)
peopleDf.where($"age" > 15)
peopleDf($"age" > 15)