How to assign a name to the size() column?

I am using .size() on a groupby result in order to count how many items are in each group.

I would like to store the result under a new column name without manually editing the column names array. How can this be done?

This is what I have tried:

grpd = df.groupby(['A','B'])
grpd['size'] = grpd.size()
grpd

and the error I got:

TypeError: 'DataFrameGroupBy' object does not support item assignment (on the second line)


Solution 1:

The .size() built-in method of DataFrameGroupBy objects actually returns a Series object with the group sizes, not a DataFrame. If you want a DataFrame whose single column holds the group sizes, indexed by the groups, with a custom name, you can use the .to_frame() method and pass the desired column name as its argument.

grpd = df.groupby(['A','B']).size().to_frame('size')

If you wanted the groups to be columns again you could add a .reset_index() at the end.
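
As a minimal sketch of the three steps described above (the sample data is borrowed from Solution 2 and the variable names sizes, framed and flat are only illustrative), this shows what each call returns:

```python
import pandas as pd

# Sample data (same shape as the example in Solution 2)
df = pd.DataFrame({'A': ['x', 'x', 'x', 'y', 'y'],
                   'B': ['a', 'c', 'c', 'b', 'b']})

sizes = df.groupby(['A', 'B']).size()   # Series indexed by (A, B)
framed = sizes.to_frame('size')         # one-column DataFrame, still indexed by (A, B)
flat = framed.reset_index()             # A and B become ordinary columns again

print(type(sizes).__name__)
print(list(flat.columns))
```

The groupby object itself never gets a new column; each step produces a new object that you assign to a name.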

Solution 2:

You need transform with size; the length of df stays the same as before:

Note:

You have to select one column after the groupby, otherwise you get an error. Because GroupBy.size counts NaNs too, it does not matter which column you select; every column gives the same result.

import pandas as pd

df = pd.DataFrame({'A': ['x', 'x', 'x', 'y', 'y'],
                   'B': ['a', 'c', 'c', 'b', 'b']})
print(df)
   A  B
0  x  a
1  x  c
2  x  c
3  y  b
4  y  b

df['size'] = df.groupby(['A', 'B'])['A'].transform('size')
print(df)
   A  B  size
0  x  a     1
1  x  c     2
2  x  c     2
3  y  b     2
4  y  b     2
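
The point about NaNs can be checked with a small sketch. The column C and its values are hypothetical, added only to have missing data in the frame; the comparison with 'count' is included for contrast, since 'count' skips NaNs:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['x', 'x', 'x', 'y', 'y'],
                   'B': ['a', 'c', 'c', 'b', 'b'],
                   # hypothetical column containing missing values
                   'C': [1.0, np.nan, 3.0, np.nan, np.nan]})

g = df.groupby(['A', 'B'])

# 'size' counts rows per group, so the selected column does not matter
size_via_a = g['A'].transform('size')
size_via_c = g['C'].transform('size')

# 'count' only counts non-NaN values, so it does depend on the column
count_via_c = g['C'].transform('count')

print(size_via_a.tolist())
print(size_via_c.tolist())
print(count_via_c.tolist())
```

Both size results should be identical even though C contains NaNs, while the count result is smaller wherever C is missing.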

If you need to set the column name on the aggregated df, the length of df is obviously NOT the same as before:

import pandas as pd

df = pd.DataFrame({'A': ['x', 'x', 'x', 'y', 'y'],
                   'B': ['a', 'c', 'c', 'b', 'b']})
print(df)
   A  B
0  x  a
1  x  c
2  x  c
3  y  b
4  y  b

df = df.groupby(['A', 'B']).size().reset_index(name='Size')
print(df)
   A  B  Size
0  x  a     1
1  x  c     2
2  y  b     2