Filtering a spark dataframe based on date

The following solutions are applicable since spark 1.5 :

For lower than :

// filter data where the date is lesser than 2015-03-14

For greater than :

// filter data where the date is greater than 2015-03-14

For equality, you can use either equalTo or === :

data.filter(data("date") === lit("2015-03-14"))

If your DataFrame date column is of type StringType, you can convert it using the to_date function :

// filter data where the date is greater than 2015-03-14

You can also filter according to a year using the year function :

// filter data where year is greater or equal to 2016

Don't use this as suggested in other answers

.filter(f.col("dateColumn") < f.lit('2017-11-01'))

But use this instead

.filter(f.col("dateColumn") < f.unix_timestamp(f.lit('2017-11-01 00:00:00')).cast('timestamp'))

This will use the TimestampType instead of the StringType, which will be more performant in some cases. For example Parquet predicate pushdown will only work with the latter.