Pandas indexing behavior after grouping: do I see an "extra row"?

It's just an index name...

Demo:

In [46]: df
Out[46]:
   p_id  rating
0     1       5
1     1       3
2     1       2
3     2       2
4     3       5
5     3       1
6     3       3
7     4       4
8     4       5

In [47]: df.index.name = 'AAA'

pay attention at the index name: AAA

In [48]: df
Out[48]:
     p_id  rating
AAA
0       1       5
1       1       3
2       1       2
3       2       2
4       3       5
5       3       1
6       3       3
7       4       4
8       4       5

You can get rid of it using rename_axis() method:

In [42]: df[['p_id', 'rating']].groupby('p_id').count().rename_axis(None)
Out[42]:
   rating
1       3
2       1
3       3
4       2

There is no "extra row", it's simply how pandas visually renders a GroupBy object, i.e. how pandas.core.groupby.generic.DataFrameGroupBy.__str__ method renders a grouped dataframe object: rating is the column, but now p_id has now gone from being a column to being the (row) index.

Another reason they stagger them (i.e. the row with the column names, and the row with the index/multi-index name) is because the index can be a MultiIndex (if you grouped-by multiple columns).

Pandas indexing behavior after grouping: do I see an "extra row"?

Related

Recent Posts