How to calculate the remaining amount as of the current date in a PySpark DataFrame?
I need to calculate how much is left in my account as of today. In other words, for the current day, how much of my original Total_salary is still left.
Below is my sample data set.
start_date  end_date    duration(months)  Total_salary  left_amount
2021-05-03  2022-05-03  12                1200          400
2019-01-01  2023-01-01  48                4800          2300
2018-01-01  2020-01-01  24                2400          0
2020-01-01  2023-01-01  36                3600          1200
2024-01-01  2027-01-01  36                3600          3600
I need to get how much is left as of the current date, including the case where end_date < current date (as in the third row, where nothing is left). Take the first row as an example: I agreed with a client to work for 12 months for a total salary of 1200, so each month I receive 100 as my salary. I need to know how much of my original Total_salary is left today (100*8 = 800 received so far, 1200-800 = 400 left). I don't know how to get the sum up to the current date.
I need to implement this in PySpark. Can anyone please help me sort this out?
Thank you
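import datetime
from pyspark.sql import functions as F

current_date = datetime.date.today()

result = (
    df
    # Months remaining until end_date; 0 if the contract has already ended
    .withColumn('left_months', F.greatest(F.lit(0), F.months_between('end_date', F.lit(current_date))))
    # Monthly salary times the months remaining
    .withColumn('left_amount', F.col('Total_salary') / F.col('duration(months)') * F.col('left_months'))
    # Cap at Total_salary for contracts that have not started yet
    .withColumn('left_amount', F.least('Total_salary', 'left_amount'))
)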
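To sanity-check the logic without a Spark session, here is a rough plain-Python sketch of the same three steps. The helper names `months_between` and `left_amount` are made up for illustration, and the reference date 2022-01-03 is an assumption chosen so that the first sample row gives 400. The month difference only approximates Spark's `months_between` (31-day convention for partial months, ignoring its special case for two end-of-month dates), so treat it as a check of the arithmetic rather than an exact reimplementation.

```python
import datetime


def months_between(end, start):
    """Approximate month difference between two dates (hypothetical helper).

    Whole months plus the day difference divided by 31. Spark's special
    handling of two end-of-month dates is not reproduced here.
    """
    whole = (end.year - start.year) * 12 + (end.month - start.month)
    return whole + (end.day - start.day) / 31


def left_amount(end_date, duration_months, total_salary, today):
    """Amount still left as of `today`, following the three steps in the answer."""
    # Step 1: months remaining, never negative
    months_left = max(0.0, months_between(end_date, today))
    # Step 2: monthly salary times months remaining
    amount = total_salary / duration_months * months_left
    # Step 3: never more than the original total
    return min(total_salary, amount)


# Assumed reference date, picked so the first sample row yields 400
today = datetime.date(2022, 1, 3)
print(left_amount(datetime.date(2022, 5, 3), 12, 1200, today))  # 400.0
```

With this reference date, a contract that has already ended (third sample row) comes out as 0, and one that has not started yet (last sample row) is capped at its full Total_salary.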