How to create a sequence of timestamps in Scala

Solution 1:

For Spark 2.4+, you can use the sequence function to generate an array of timestamps, setting its step parameter to interval 1 hour:

val df = spark.sql("SELECT sequence(to_timestamp('2019-11-20 00:00:00'), to_timestamp('2019-11-25 23:00:00'), interval 1 hour) as Date")

df.printSchema()

//root
// |-- Date: array (nullable = true)
// |    |-- element: timestamp (containsNull = false)

Now, just explode the array of timestamps to get the desired output:

df.withColumn("Date", explode($"Date")).show(5)

+-------------------+
|               Date|
+-------------------+
|2019-11-20 00:00:00|
|2019-11-20 01:00:00|
|2019-11-20 02:00:00|
|2019-11-20 03:00:00|
|2019-11-20 04:00:00|
+-------------------+
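For a quick sanity check outside of Spark, the same inclusive hourly range can be reproduced with plain java.time; a minimal sketch using the bounds from the query above:

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Mirror of sequence(start, stop, interval 1 hour): inclusive hourly steps.
val fmt   = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
val start = LocalDateTime.parse("2019-11-20 00:00:00", fmt)
val stop  = LocalDateTime.parse("2019-11-25 23:00:00", fmt)

val hours: Seq[LocalDateTime] =
  Iterator.iterate(start)(_.plusHours(1)).takeWhile(!_.isAfter(stop)).toSeq
```

Six full days of hourly steps gives 144 elements, matching what the exploded Date column contains.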

Solution 2:

You can try this: given two String timestamps, it returns an Iterator of LocalDateTime values:

import java.time.{Instant, LocalDateTime, ZoneId}

def dayIterator(start_ts: String, end_ts: String) = {
  // "ss" is seconds; the original "SS" pattern would parse fractional seconds
  val format = new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
  val date_start = format.parse(start_ts)
  val date_end = format.parse(end_ts)
  val start = LocalDateTime.ofInstant(Instant.ofEpochMilli(date_start.getTime), ZoneId.systemDefault)
  val last = LocalDateTime.ofInstant(Instant.ofEpochMilli(date_end.getTime), ZoneId.systemDefault)

  // takeWhile(_ isEqual last) would stop immediately; instead keep every
  // value up to and including the end timestamp
  Iterator.iterate(start)(_ plusHours 1) takeWhile (!_.isAfter(last))
}

From this Iterator you can then create the DataFrame.
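One way to do that is to materialize the Iterator into java.sql.Timestamp values, which Spark can encode directly; a sketch assuming a SparkSession named spark is in scope (the Spark lines are left as comments since they need a running session):

```scala
import java.sql.Timestamp
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Build the hourly values and convert each LocalDateTime to a Timestamp.
val fmt   = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
val start = LocalDateTime.parse("2019-11-20 00:00:00", fmt)
val last  = LocalDateTime.parse("2019-11-25 23:00:00", fmt)

val stamps: Seq[Timestamp] =
  Iterator.iterate(start)(_.plusHours(1))
    .takeWhile(!_.isAfter(last))
    .map(Timestamp.valueOf)
    .toSeq

// With a SparkSession in scope, the Seq becomes a single-column DataFrame:
// import spark.implicits._
// val df = stamps.toDF("Date")
```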