How to upsample a multi-index dataframe ensuring each grouping covers the same time range (provide custom starting and ending datetimes)
Solution 1:
I'm sure there is a better way, but here is one way to achieve this:
import pandas as pd

def my_upsample(df):
    # Get all periods spanned by the second index level (the years)
    years = df.index.get_level_values(1)
    # "YS" = year start; older pandas versions also accept the alias "AS"
    years = pd.date_range(years.min(), years.max(), freq="YS")
    # Reindex and format
    return (
        df.unstack(level=0)
        .reindex(years)
        .unstack()
        .reset_index((0, 1), name="members")
        .drop("level_0", axis=1)
    )
Output:
           country  members
1995-01-01      DE      NaN
1996-01-01      DE    300.0
1997-01-01      DE      NaN
1998-01-01      DE      NaN
1999-01-01      DE    301.0
2000-01-01      DE      NaN
1995-01-01      ES    100.0
1996-01-01      ES      NaN
1997-01-01      ES    101.0
1998-01-01      ES      NaN
1999-01-01      ES      NaN
2000-01-01      ES      NaN
1995-01-01      GB      NaN
1996-01-01      GB      NaN
1997-01-01      GB    200.0
1998-01-01      GB    201.0
1999-01-01      GB      NaN
2000-01-01      GB    202.0
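For anyone who wants to run this end to end, here is a minimal sketch. The input frame is an assumption: it is rebuilt from the non-NaN rows of the output above, with index levels named "country" and "year", and may not match the frame in the question exactly.

```python
import pandas as pd

# Assumed input, inferred from the non-NaN rows of the output above
rows = [
    ("DE", "1996-01-01", 300), ("DE", "1999-01-01", 301),
    ("ES", "1995-01-01", 100), ("ES", "1997-01-01", 101),
    ("GB", "1997-01-01", 200), ("GB", "1998-01-01", 201),
    ("GB", "2000-01-01", 202),
]
df = pd.DataFrame(rows, columns=["country", "year", "members"])
df["year"] = pd.to_datetime(df["year"])
df = df.set_index(["country", "year"])  # MultiIndex: country > year


def my_upsample(df):
    # Full yearly range between the earliest and latest year in the data
    years = df.index.get_level_values(1)
    years = pd.date_range(years.min(), years.max(), freq="YS")
    return (
        df.unstack(level=0)
        .reindex(years)
        .unstack()
        .reset_index((0, 1), name="members")
        .drop("level_0", axis=1)
    )


result = my_upsample(df)
print(result)
```

Every country ends up with the same six yearly rows, and years without data are filled with NaN.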
Here's some explanation of each step:
- unstack(level=0): pivot the first index level ("country") into the columns, leaving "year" as the only index level, which makes the reindexing that follows possible;
- reindex(years): reindex to the target date range. In your specific example this step isn't strictly required, since the sample already contains every year at least once;
- unstack(): pivot once more. Because the index is no longer a MultiIndex, pivoting here returns a Series with a hierarchical index: "members" > "country" > "year". At this stage we're essentially done and only need to format the result into the desired DataFrame;
- reset_index((0, 1), name="members"): move the first two levels into columns, leaving only "year" as the index, and name the values column "members";
- drop("level_0", axis=1): drop the unwanted column, which only holds the original column name "members".