Custom month range with current date in window function
Solution 1:
You don't need a Window in this case. Simply group by Id
and use a conditional sum aggregation:
from pyspark.sql import functions as F

df = spark.createDataFrame([
    (1, "2021-10-01", 5), (1, "2021-11-01", 6),
    (1, "2021-09-01", 10), (2, "2021-12-01", 9)
], ["Id", "date", "price"])

nb_last_months = 2

df1 = df.groupBy("id").agg(
    F.sum(
        F.when(
            # keep rows on or after the first day of the month
            # nb_last_months before the current month
            F.col("date") >= F.add_months(F.date_trunc("month", F.current_date()), -nb_last_months),
            F.col("price")
        )
    ).alias(f"sum_last_{nb_last_months}_months")
)

df1.show()
#+---+-----------------+
#| id|sum_last_2_months|
#+---+-----------------+
#| 1| 6|
#| 2| 9|
#+---+-----------------+
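Note that the result depends on when the code runs, since the cutoff is derived from F.current_date(). To make the logic concrete without a Spark session, here is a plain-Python sketch of the same conditional-sum idea, with a fixed "current date" (a made-up example date, chosen so the cutoff matches the output above) for reproducibility:

```python
from datetime import date


def month_start(d):
    # equivalent of F.date_trunc("month", ...)
    return d.replace(day=1)


def add_months(d, n):
    # simple month arithmetic, like F.add_months
    m = d.month - 1 + n
    return date(d.year + m // 12, m % 12 + 1, 1)


rows = [
    (1, date(2021, 10, 1), 5), (1, date(2021, 11, 1), 6),
    (1, date(2021, 9, 1), 10), (2, date(2021, 12, 1), 9),
]

nb_last_months = 2
today = date(2022, 1, 15)  # stand-in for current_date()
cutoff = add_months(month_start(today), -nb_last_months)  # 2021-11-01

# conditional sum per id: only rows on or after the cutoff count
sums = {}
for id_, d, price in rows:
    if d >= cutoff:
        sums[id_] = sums.get(id_, 0) + price

print(sums)  # {1: 6, 2: 9}
```

This mirrors the F.sum(F.when(...)) pattern: rows failing the condition contribute nothing to the per-id sum.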