Find the number of days since a max value
You can use a double groupby. The important part is to compute a new grouping key that puts together the rows that are lower than the last running max; once you have that, it is a simple cumsum per group:
g = df.groupby('player')
# day difference between consecutive matches of each player
# (match_date must be a datetime column)
diff = g['match_date'].diff().dt.days
# grouping key: a new group starts on the row after each new running max
group = df['stat'].ge(g['stat'].cummax()).shift().cumsum()
# days since last max: cumulative day count within each (player, group)
df['dsms'] = diff.groupby([df['player'], group]).cumsum()
Output:
   id  player match_date  stat  dsms
0   1       1 2022-01-01  1500   NaN
1   2       1 2022-01-03  1600   2.0
2   3       1 2022-01-10  2100   7.0
3   4       1 2022-01-11  1800   1.0
4   5       1 2022-01-18  1700   8.0
5   6       2 2022-01-01  1600   NaN
6   7       2 2022-01-03  1800   2.0
7   8       2 2022-01-10  1600   7.0
8   9       2 2022-01-11  1900   8.0
9  10       2 2022-01-18  1500   7.0
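As a self-contained sketch, here is the whole approach end to end; the sample data is an assumption reconstructed from the output table above:

```python
import pandas as pd

# sample data assumed from the output table above
df = pd.DataFrame({
    'id': range(1, 11),
    'player': [1] * 5 + [2] * 5,
    'match_date': pd.to_datetime(['2022-01-01', '2022-01-03', '2022-01-10',
                                  '2022-01-11', '2022-01-18'] * 2),
    'stat': [1500, 1600, 2100, 1800, 1700, 1600, 1800, 1600, 1900, 1500],
})

g = df.groupby('player')
# day difference between consecutive matches of each player
diff = g['match_date'].diff().dt.days
# a new group starts on the row after each new running max
group = df['stat'].ge(g['stat'].cummax()).shift().cumsum()
# cumulative day count within each (player, group)
df['dsms'] = diff.groupby([df['player'], group]).cumsum()

print(df)
```

The first row of each player has no previous match (and the first group key is NaN), so its `dsms` is NaN.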
First imagine you have only one player; then you can use expanding to find the cumulative max/idxmax and subtract:
def day_since_max(data):
    # index label of the running max (expanding().apply returns floats,
    # so cast back to integer labels)
    maxIdx = data['stat'].expanding().apply(pd.Series.idxmax).astype(int)
    # date of the running max, shifted to look at the previous row
    date_at_max = data.loc[maxIdx, 'match_date'].shift()
    # .values avoids index alignment (maxIdx can contain duplicates)
    return data['match_date'] - date_at_max.values
Now we can use groupby().apply to apply that function for each player:
df['days_since_max'] = df.groupby('player').apply(day_since_max).reset_index(level=0, drop=True)
Output:
   id  player match_date  stat days_since_max
0   1       1 2022-01-01  1500            NaT
1   2       1 2022-01-03  1600         2 days
2   3       1 2022-01-10  2100         7 days
3   4       1 2022-01-11  1800         1 days
4   5       1 2022-01-18  1700         8 days
5   6       2 2022-01-01  1600            NaT
6   7       2 2022-01-03  1800         2 days
7   8       2 2022-01-10  1600         7 days
8   9       2 2022-01-11  1900         8 days
9  10       2 2022-01-18  1500         7 days
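The same approach as a runnable sketch; the sample data is again an assumption taken from the output table:

```python
import pandas as pd

# sample data assumed from the output table above
df = pd.DataFrame({
    'id': range(1, 11),
    'player': [1] * 5 + [2] * 5,
    'match_date': pd.to_datetime(['2022-01-01', '2022-01-03', '2022-01-10',
                                  '2022-01-11', '2022-01-18'] * 2),
    'stat': [1500, 1600, 2100, 1800, 1700, 1600, 1800, 1600, 1900, 1500],
})

def day_since_max(data):
    # index label of the running max so far, cast back to integer labels
    maxIdx = data['stat'].expanding().apply(pd.Series.idxmax).astype(int)
    # date of the running max up to the previous row
    date_at_max = data.loc[maxIdx, 'match_date'].shift()
    # .values avoids index alignment (maxIdx can contain duplicates)
    return data['match_date'] - date_at_max.values

df['days_since_max'] = (
    df.groupby('player').apply(day_since_max).reset_index(level=0, drop=True)
)
print(df)
```

Note the result is a timedelta column; use `df['days_since_max'].dt.days` if you want plain integers instead.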