Convert date from String to Date format in Dataframes

I am trying to convert a column which is in String format to Date format using the to_date function but its returning Null values.

df.createOrReplaceTempView("incidents")
spark.sql("select Date from incidents").show()

+----------+
|      Date|
+----------+
|08/26/2016|
|08/26/2016|
|08/26/2016|
|06/14/2016|

spark.sql("select to_date(Date) from incidents").show()

+---------------------------+
|to_date(CAST(Date AS DATE))|
 +---------------------------+
|                       null|
|                       null|
|                       null|
|                       null|

The Date column is in String format:

 |-- Date: string (nullable = true)

Use to_date with Java SimpleDateFormat.

TO_DATE(CAST(UNIX_TIMESTAMP(date, 'MM/dd/yyyy') AS TIMESTAMP))

Example:

spark.sql("""
  SELECT TO_DATE(CAST(UNIX_TIMESTAMP('08/26/2016', 'MM/dd/yyyy') AS TIMESTAMP)) AS newdate"""
).show()

+----------+
|        dt|
+----------+
|2016-08-26|
+----------+

I solved the same problem without the temp table/view and with dataframe functions.

Of course I found that only one format works with this solution and that's yyyy-MM-DD.

For example:

val df = sc.parallelize(Seq("2016-08-26")).toDF("Id")
val df2 = df.withColumn("Timestamp", (col("Id").cast("timestamp")))
val df3 = df2.withColumn("Date", (col("Id").cast("date")))

df3.printSchema

root
 |-- Id: string (nullable = true)
 |-- Timestamp: timestamp (nullable = true)
 |-- Date: date (nullable = true)

df3.show

+----------+--------------------+----------+
|        Id|           Timestamp|      Date|
+----------+--------------------+----------+
|2016-08-26|2016-08-26 00:00:...|2016-08-26|
+----------+--------------------+----------+

The timestamp of course has 00:00:00.0 as a time value.


Since your main aim was to convert the type of a column in a DataFrame from String to Timestamp, I think this approach would be better.

import org.apache.spark.sql.functions.{to_date, to_timestamp}
val modifiedDF = DF.withColumn("Date", to_date($"Date", "MM/dd/yyyy"))

You could also use to_timestamp (I think this is available from Spark 2.x) if you require fine grained timestamp.


you can also do this query...!

sqlContext.sql("""
select from_unixtime(unix_timestamp('08/26/2016', 'MM/dd/yyyy'), 'yyyy:MM:dd') as new_format
""").show()

enter image description here