How to resample including the last value of the previous resampled set?
I need to compute different measures of performance from prices at different time resolutions e.g., yearly or monthly. An ideal approach is to use Pandas' resample but I also need a way to pass the last value of the previous resampling set to the next i.e. due to intro-resampling set return calculations. This is apparently not supported by doing resample('M')
:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(1, 61), index=pd.date_range('2021-12-31', '2022-02-28'), columns=['price'])
print(df)
df.resample('M').apply(lambda x: (x[0], x[-1]))
I get the following result:
price
2021-12-31 (1, 1)
2022-01-31 (2, 32)
2022-02-28 (33, 60)
I would like to instead get the following where the first element of the resampling set includes the last element from the previous resampling set:
price
2021-12-31 (1, 1)
2022-01-31 (1, 32)
2022-02-28 (32, 60)
You cannot easily access the other groups from within a group.
But an easy way is to shift
and concat
the output:
d = df.resample('M').last()
pd.concat([d.shift().bfill(), d], axis=1).astype(int).apply(tuple, axis=1)
output:
2021-12-31 (1, 1)
2022-01-31 (1, 32)
2022-02-28 (32, 60)
Freq: M, dtype: object
Or if you have multiple columns:
(pd.concat([d.shift().bfill(), d], axis=1).astype(int)
.groupby(level=0, axis=1, sort=False)
.apply(lambda d: d.apply(tuple, axis=1))
)
output:
price
2021-12-31 (1, 1)
2022-01-31 (1, 32)
2022-02-28 (32, 60)
with another example:
df = pd.DataFrame({'price': np.arange(1, 61), 'other': np.arange(1, 61)*10},
index=pd.date_range('2021-12-31', '2022-02-28'),
)
s = df.resample('M').last()
(pd.concat([s.shift().bfill(), s], axis=1).astype(int)
.groupby(level=0, axis=1, sort=False)
.apply(lambda d: d.apply(tuple, axis=1))
)
# price other
# 2021-12-31 (1, 1) (10, 10)
# 2022-01-31 (1, 32) (10, 320)
# 2022-02-28 (32, 60) (320, 600)