How to create a sequence of timestamps in Scala
Solution 1:
For Spark 2.4+, you can use the sequence function, which generates an array of timestamps when you set the step parameter to interval 1 hour:
val df = spark.sql("SELECT sequence(to_timestamp('2019-11-20 00:00:00'), to_timestamp('2019-11-25 23:00:00'), interval 1 hour) as Date")
df.printSchema()
//root
// |-- Date: array (nullable = true)
// | |-- element: timestamp (containsNull = false)
Now, just explode the array of timestamps to get the desired output:
df.withColumn("Date", explode($"Date")).show(5)
+-------------------+
| Date|
+-------------------+
|2019-11-20 00:00:00|
|2019-11-20 01:00:00|
|2019-11-20 02:00:00|
|2019-11-20 03:00:00|
|2019-11-20 04:00:00|
+-------------------+
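The same inclusive hourly range can be sanity-checked in plain Scala (no Spark required). This is a hypothetical sketch, not Spark's implementation; it just mirrors the inclusive-endpoint semantics of sequence for the date range used above:

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
val start = LocalDateTime.parse("2019-11-20 00:00:00", fmt)
val end   = LocalDateTime.parse("2019-11-25 23:00:00", fmt)

// Inclusive hourly range: keep every step that is not after the end timestamp
val hours: Seq[LocalDateTime] =
  Iterator.iterate(start)(_.plusHours(1)).takeWhile(!_.isAfter(end)).toSeq
```

This yields 144 timestamps (5 full days plus the 24 hours of Nov 25 up to 23:00), which is the number of rows the exploded DataFrame should contain.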
Solution 2:
You can try this function, which takes two date strings and returns an Iterator[LocalDateTime]:
import java.time.{Instant, LocalDateTime, ZoneId}

def dayIterator(start_ts: String, end_ts: String): Iterator[LocalDateTime] = {
  // "ss" is seconds; the original pattern used "SS", which means milliseconds
  val format = new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
  val date_start = format.parse(start_ts)
  val date_end = format.parse(end_ts)
  val start = LocalDateTime.ofInstant(Instant.ofEpochMilli(date_start.getTime), ZoneId.systemDefault)
  val last = LocalDateTime.ofInstant(Instant.ofEpochMilli(date_end.getTime), ZoneId.systemDefault)
  // !isAfter keeps every hour up to and including last; the original
  // takeWhile(_ isEqual last) would stop immediately unless start == last
  Iterator.iterate(start)(_ plusHours 1).takeWhile(!_.isAfter(last))
}
From this Iterator you can then create the DataFrame.
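One way to do that, sketched under the assumption that a SparkSession named spark is in scope: Spark has no built-in encoder for LocalDateTime in 2.x, so map each value to java.sql.Timestamp first, then call toDF. The conversion itself is plain Scala:

```scala
import java.sql.Timestamp
import java.time.{LocalDateTime, ZoneId}

// Stand-in for the Iterator[LocalDateTime] returned by dayIterator above
// (hypothetical fixed range, 144 hourly values)
val hours = Iterator.iterate(LocalDateTime.of(2019, 11, 20, 0, 0))(_.plusHours(1)).take(144)

// Convert to java.sql.Timestamp so Spark can encode the column
val ts: Seq[Timestamp] =
  hours.map(ldt => Timestamp.from(ldt.atZone(ZoneId.systemDefault).toInstant)).toSeq

// In a Spark application you would then write:
//   import spark.implicits._
//   val df = ts.toDF("Date")
// giving a single timestamp column, the same shape as Solution 1 after explode.
```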