Group by returns empty dataframe and no error

pandas

The reason is that every group formed by all 4 columns has at least one NA value in its key. Groups with an NA key are excluded, so the result is empty. When you group by fewer than 4 columns, your actual data evidently contains rows where all the chosen keys are present, so those groups survive.

See the docs on missing values:

NA groups in GroupBy are automatically excluded.

Example:

>>> df = pd.DataFrame({'a':[None,1,2], 'b':[1,None,2], 'c': [1,2,None], 'd': [1,1,1]})
>>> df
     a    b    c  d
0  NaN  1.0  1.0  1
1  1.0  NaN  2.0  1
2  2.0  2.0  NaN  1
>>> df.groupby(['a', 'b']).d.sum()
a    b  
2.0  2.0    1
Name: d, dtype: int64
>>> df.groupby(['a', 'c']).d.sum()
a    c  
1.0  2.0    1
Name: d, dtype: int64
>>> df.groupby(['b', 'c']).d.sum()
b    c  
1.0  1.0    1
Name: d, dtype: int64
>>> df.groupby(['a', 'b', 'c']).d.sum()
Series([], Name: d, dtype: int64)
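To check whether this is what is happening with your own data, you can count the rows that have an NA in any of the grouping columns. The snippet below is a minimal sketch that reuses the example frame from above; the names `keys`, `has_na_key` and `n_dropped` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [None, 1, 2], 'b': [1, None, 2],
                   'c': [1, 2, None], 'd': [1, 1, 1]})

keys = ['a', 'b', 'c']

# True for every row where at least one grouping key is missing
has_na_key = df[keys].isna().any(axis=1)

# Rows that groupby would silently drop with its default settings
n_dropped = int(has_na_key.sum())
print(f"{n_dropped} of {len(df)} rows have an NA in a grouping key")
```

If the count equals the number of rows, the grouped result will be empty.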

Since version 1.1.0, groupby has a dropna parameter to handle this kind of case. You can set it to False to include NA values in the groupby keys (the default is True for backward compatibility), see https://github.com/pandas-dev/pandas/pull/30584.
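As a rough illustration of the difference, here is the same example frame grouped with and without `dropna=False`. This assumes pandas 1.1.0 or newer; the variable names `dropped` and `kept` are made up for the sketch.

```python
import pandas as pd

df = pd.DataFrame({'a': [None, 1, 2], 'b': [1, None, 2],
                   'c': [1, 2, None], 'd': [1, 1, 1]})

# Default behaviour (dropna=True): every row has an NA in one of the keys,
# so no group is left and the result is empty.
dropped = df.groupby(['a', 'b', 'c']).d.sum()

# dropna=False keeps NA as a valid group key (needs pandas >= 1.1.0).
kept = df.groupby(['a', 'b', 'c'], dropna=False).d.sum()

print(len(dropped), len(kept))
print(kept)
```

With `dropna=False` each row ends up in its own group, with NaN shown in the index where the key was missing.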
