Group by returns empty dataframe and no error
The reason is that every group formed by all 4 columns has at least one NA
value among its keys. Those groups are excluded, so the result is empty. With fewer than 4 columns, your actual data evidently contains some rows where none of the key columns is NA, so some groups survive.
See the docs on missing values:
NA groups in GroupBy are automatically excluded.
Example:
>>> df = pd.DataFrame({'a':[None,1,2], 'b':[1,None,2], 'c': [1,2,None], 'd': [1,1,1]})
>>> df
     a    b    c  d
0  NaN  1.0  1.0  1
1  1.0  NaN  2.0  1
2  2.0  2.0  NaN  1
>>> df.groupby(['a', 'b']).d.sum()
a    b
2.0  2.0    1
Name: d, dtype: int64
>>> df.groupby(['a', 'c']).d.sum()
a    c
1.0  2.0    1
Name: d, dtype: int64
>>> df.groupby(['b', 'c']).d.sum()
b    c
1.0  1.0    1
Name: d, dtype: int64
>>> df.groupby(['a', 'b', 'c']).d.sum()
Series([], Name: d, dtype: int64)
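To check whether this is what happens with your own data, you can count the rows that have an NA in any of the grouping columns. This is only a sketch using the example frame above; the names keys and has_na_key are made up for illustration, so substitute your own column list.

```python
import pandas as pd

df = pd.DataFrame({'a': [None, 1, 2], 'b': [1, None, 2],
                   'c': [1, 2, None], 'd': [1, 1, 1]})

# Hypothetical name: the columns you pass to groupby
keys = ['a', 'b', 'c']

# True for every row with an NA in at least one grouping column;
# these are the rows groupby drops by default
has_na_key = df[keys].isna().any(axis=1)

print(int(has_na_key.sum()), "of", len(df), "rows have an NA key")

# If every row is flagged, the grouped result will be empty
print("groupby result will be empty:", bool(has_na_key.all()))
```

If the first number equals the number of rows, no group survives, which matches the empty Series above.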
Version 1.1.0 adds a dropna parameter to groupby to handle this kind of case. Set it to False to include NA values in the groupby keys (the default is True for backward compatibility); see https://github.com/pandas-dev/pandas/pull/30584.
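As a quick illustration of that parameter, here is the same example frame grouped with and without dropna=False. It assumes pandas 1.1.0 or later is installed; the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [None, 1, 2], 'b': [1, None, 2],
                   'c': [1, 2, None], 'd': [1, 1, 1]})

# Default behaviour (dropna=True): rows with an NA key are excluded
default_result = df.groupby(['a', 'b', 'c']).d.sum()

# dropna=False (pandas >= 1.1.0): NA is kept as a valid group key
with_na = df.groupby(['a', 'b', 'c'], dropna=False).d.sum()

print(len(default_result))  # 0 groups
print(len(with_na))         # 3 groups, one per row
print(with_na)
```

Each of the three rows ends up in its own group, because each has a different combination of keys once NaN is treated as a key value.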