Way to use groupby over indices in pandas
Imagine I have a panel in pandas with multiple time_vars for each id_var. For example, imagine the data contain a state variable and a year variable:
                   var1      var2
stcode year
WY     1996-01-01    14  0.000059
       1995-01-01     8  0.000059
       1994-01-01    21  0.000182
       1993-01-01    17  0.000063
       1992-01-01     9  0.000000
AK     1964-01-01    11  0.000213
       1965-01-01     6  0.000100
       1966-01-01    10  0.000189
       1967-01-01     9  0.000267
       1968-01-01     9  0.000084
Is there a way to use pd.groupby and do operations on the year level of the MultiIndex as the object of the operation, or am I constrained to the columns of the grouped DataFrame?
As an example, suppose I wanted to check whether the dates were sorted, but I only cared about whether the year level was sorted, not the stcode level. I currently implement this like so:
# unsetting year from the index, so that I can groupby the remaining level and
# check if year is monotonically increasing
df.reset_index('year').groupby('stcode').year.is_monotonic_increasing
This returns (as you can see):
stcode
AK True
WY False
Name: year, dtype: bool
However, unsetting the index in this way seems awfully inefficient. Is there a general way to do something like:
df.groupby(level=0)[ -- the other level -- ].apply( -- operation -- )
Similarly, and relatedly, is there a way to sort by the second level and ignore the first? In this example, I may need the time variable sorted within each stcode block to do time-dependent operations like shifting (see the sketch after the next table), but I do not care whether the outer level is sorted alphabetically or numerically. In other words, this would end up like:
                   var1      var2
stcode year
WY     1992-01-01     9  0.000000
       1993-01-01    17  0.000063
       1994-01-01    21  0.000182
       1995-01-01     8  0.000059
       1996-01-01    14  0.000059
AK     1964-01-01    11  0.000213
       1965-01-01     6  0.000100
       1966-01-01    10  0.000189
       1967-01-01     9  0.000267
       1968-01-01     9  0.000084
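For context, the kind of shift I have in mind is a within-state lag; a minimal sketch (the var1_lag column name is purely illustrative):
# lag var1 by one observation within each stcode block; this is only
# meaningful once the year level is sorted within each block
df['var1_lag'] = df.groupby(level='stcode')['var1'].shift()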
Here is a dict of the original DataFrame:
{'var1': {('WY', Timestamp('1996-01-01 00:00:00')): 14,
('WY', Timestamp('1995-01-01 00:00:00')): 8,
('WY', Timestamp('1994-01-01 00:00:00')): 21,
('WY', Timestamp('1993-01-01 00:00:00')): 17,
('WY', Timestamp('1992-01-01 00:00:00')): 9,
('AK', Timestamp('1964-01-01 00:00:00')): 11,
('AK', Timestamp('1965-01-01 00:00:00')): 6,
('AK', Timestamp('1966-01-01 00:00:00')): 10,
('AK', Timestamp('1967-01-01 00:00:00')): 9,
('AK', Timestamp('1968-01-01 00:00:00')): 9},
'var2': {('WY', Timestamp('1996-01-01 00:00:00')): 5.855486597283743e-05,
('WY', Timestamp('1995-01-01 00:00:00')): 5.91261159570422e-05,
('WY', Timestamp('1994-01-01 00:00:00')): 0.00018243736121803522,
('WY', Timestamp('1993-01-01 00:00:00')): 6.34477473795414e-05,
('WY', Timestamp('1992-01-01 00:00:00')): 0.0,
('AK', Timestamp('1964-01-01 00:00:00')): 0.0002131750516127795,
('AK', Timestamp('1965-01-01 00:00:00')): 0.00010040206689154729,
('AK', Timestamp('1966-01-01 00:00:00')): 0.0001891511055873707,
('AK', Timestamp('1967-01-01 00:00:00')): 0.00026726152282208204,
('AK', Timestamp('1968-01-01 00:00:00')): 8.391729352297261e-05}}
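To rebuild the DataFrame from this dict, a minimal sketch (assuming the dict above is bound to the name d; pandas turns the tuple keys into a MultiIndex, and the level names are set by hand):
import pandas as pd

# tuple keys in the nested dict become a two-level index
df = pd.DataFrame(d)
df.index.names = ['stcode', 'year']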
Solution 1:
I personally find the approach you used clean and reasonable.
However, you could use:
df.groupby(level=0).apply(lambda g: g.index.is_monotonic_increasing)
as the first level is constant within each group and therefore necessarily monotonic increasing, so the check is effectively driven by the year level alone.
Or, to really drop the first level, if needed (not required here):
df.groupby(level=0).apply(lambda g: g.index.droplevel(0).is_monotonic_increasing)
Output:
AK True
WY False
dtype: bool
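If you prefer to address the year level by name rather than by position, an equivalent spelling (a sketch using get_level_values, which reads the second level directly and never touches the first):
df.groupby(level='stcode').apply(
    lambda g: g.index.get_level_values('year').is_monotonic_increasing
)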
Sorting only the second level
Use groupby with group_keys=False, sort=False:
df.groupby(level=0, group_keys=False, sort=False).apply(lambda g: g.sort_index())
Output:
                   var1      var2
stcode year
WY     1992-01-01     9  0.000000
       1993-01-01    17  0.000063
       1994-01-01    21  0.000182
       1995-01-01     8  0.000059
       1996-01-01    14  0.000059
AK     1964-01-01    11  0.000213
       1965-01-01     6  0.000100
       1966-01-01    10  0.000189
       1967-01-01     9  0.000267
       1968-01-01     9  0.000084
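As a quick sanity check, re-running the monotonicity test from above on the sorted result should now return True for every group:
sorted_df = df.groupby(level=0, group_keys=False, sort=False).apply(
    lambda g: g.sort_index()
)
sorted_df.groupby(level=0).apply(lambda g: g.index.is_monotonic_increasing)
# AK    True
# WY    True
# dtype: bool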