How to upsample a multi-index dataframe ensuring each grouping covers the same time range (provide custom starting and ending datetimes)

Solution 1:

I'm sure there is a better way, but here is one way to achieve this:

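import pandas as pd


def my_upsample(df):
    # Get all periods: level 1 of the index holds the dates
    years = df.index.get_level_values(1)
    # "YS" (year start) is the current alias; "AS"/"as" is deprecated in recent pandas
    years = pd.date_range(years.min(), years.max(), freq="YS")

    # Reindex and format
    return (
        df.unstack(level=0)  # countries become columns, dates stay as the index
        .reindex(years)  # one row per year in the full range
        .unstack()  # back to long form, as a Series
        .reset_index((0, 1), name="members")
        .drop("level_0", axis=1)
    )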

Output:

           country  members
1995-01-01      DE      NaN
1996-01-01      DE    300.0
1997-01-01      DE      NaN
1998-01-01      DE      NaN
1999-01-01      DE    301.0
2000-01-01      DE      NaN
1995-01-01      ES    100.0
1996-01-01      ES      NaN
1997-01-01      ES    101.0
1998-01-01      ES      NaN
1999-01-01      ES      NaN
2000-01-01      ES      NaN
1995-01-01      GB      NaN
1996-01-01      GB      NaN
1997-01-01      GB    200.0
1998-01-01      GB    201.0
1999-01-01      GB      NaN
2000-01-01      GB    202.0

Here's some explanation of each step:

  1. unstack(level=0): Pivot the first index level ("country") into the columns. This leaves "year" as the only index level, which is what allows the reindexing in the next step;
  2. reindex(years): Reindex to the target date range. Note that in your specific example this wouldn't actually be required, since your sample already contains every year at least once;
  3. unstack(): Pivot yet again. As the index is no longer a MultiIndex, pivoting here returns a Series with a hierarchical index: "members" (the original column name) > "country" > "year". At this stage we're essentially done and just need to format this into the desired DataFrame;
  4. reset_index((0, 1), name='members'): move the first two index levels into columns, leaving only "year" as the index, and name the column of values "members";
  5. drop('level_0', axis=1): drop the unwanted column, which only holds the original column name
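
Since the question also asks about custom starting and ending datetimes, here is a minimal sketch of how the same steps could take optional bounds. The name upsample_between and its start/end parameters are hypothetical (not part of the answer above), and the sample frame is an assumption reconstructed from the output shown earlier, so adjust it to your real data:

```python
import pandas as pd

# Assumed sample input, reconstructed from the output shown above:
# a (country, year) MultiIndex with a single "members" column.
countries = ["DE", "DE", "ES", "ES", "GB", "GB", "GB"]
dates = pd.to_datetime(
    ["1996-01-01", "1999-01-01", "1995-01-01", "1997-01-01",
     "1997-01-01", "1998-01-01", "2000-01-01"]
)
index = pd.MultiIndex.from_arrays([countries, dates], names=["country", "year"])
df = pd.DataFrame({"members": [300, 301, 100, 101, 200, 201, 202]}, index=index)


def upsample_between(df, start=None, end=None):
    """Hypothetical variant of my_upsample with optional custom bounds."""
    observed = df.index.get_level_values(1)
    # Fall back to the observed range when no bound is given
    start = observed.min() if start is None else pd.Timestamp(start)
    end = observed.max() if end is None else pd.Timestamp(end)
    years = pd.date_range(start, end, freq="YS")  # year-start frequency

    wide = df.unstack(level=0)   # index: year, columns: ("members", country)
    wide = wide.reindex(years)   # same date range for every country
    long = wide.unstack()        # Series indexed by (column name, country, year)
    return long.reset_index((0, 1), name="members").drop("level_0", axis=1)


result = upsample_between(df)                              # observed range
padded = upsample_between(df, "1994-01-01", "2001-01-01")  # custom range
print(padded)
```

Because every country is reindexed against the same date range, each group ends up with the same number of rows, and years outside the observed data simply show up as NaN.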