pandas DataFrame: Get previous month value where there are missing transaction and cannot shift()
Is there any way to implement this without generating dummy row ?
Here is my data source.
Group Store Month Revenue
Group1 A 201611 10
Group1 A 201612 20
Group1 A 201701 30
Group1 B 201611 40
Group1 B 201701 60
Group2 C 201611 70
Group2 C 201612 80
Group2 C 201702 100
And here is desired output
Group Store Month Revenue Month_LM Revenue_LM
Group1 A 201611 10 201610
Group1 A 201612 20 201611 10
Group1 A 201701 30 201612 20
Group1 B 201611 50 201610
Group1 B 201701 70 201612
Group1 B 201702 80 201701 70
Group2 C 201611 90 201610
Group2 C 201612 100 201611 90
Group2 C 201702 120 201701
The problem is at B, C (please notice missing 201612 of B and 201701 of C) If I do shift(), I will get value of previous month (in transaction, but not in business logic)
I managed to get Month_LM by using
def get_lm(month):
d = datetime.strptime(month+"01","%Y%m%d")
d = d - relativedelta(months=1)
return d.strftime("%Y%m")
df['LM'] = df['MONTH'].apply(lambda x:get_lm(str(x)))
But I have no idea how to get Revenue of "Month" based on "Month_LM" value ? df.lookup maybe ?
Thank you.
Solution 1:
I change the Month to datetime format, you would like it back, you can do,df.Month.dt.year*100+df.Month.dt.month
, and in my solution I did not using column Month_LM
df.Month=pd.to_datetime(df.Month,format='%Y%m')
df['Rev']=df.groupby('Group').apply(lambda x :x.Revenue.shift()* (x.Month.dt.year*12+x.Month.dt.month).diff().eq(1)).replace(0,np.nan).values
df
Out[1080]:
Group Store Month Revenue Month_LM Rev
0 Group1 A 2016-11-01 10 201610 NaN
1 Group1 A 2016-12-01 20 201611 10.0
2 Group1 A 2017-01-01 30 201612 20.0
3 Group1 B 2016-11-01 50 201610 NaN
4 Group1 B 2017-01-01 70 201612 NaN
5 Group1 B 2017-02-01 80 201701 70.0
6 Group2 C 2016-11-01 90 201610 NaN
7 Group2 C 2016-12-01 100 201611 90.0
8 Group2 C 2017-02-01 120 201701 NaN