How to print a groupby object
I want to print the result of grouping with Pandas.
I have a dataframe:
import pandas as pd
df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'], 'B': range(6)})
print(df)
A B
0 one 0
1 one 1
2 two 2
3 three 3
4 three 4
5 one 5
When printing after grouping by 'A' I have the following:
print(df.groupby('A'))
<pandas.core.groupby.DataFrameGroupBy object at 0x05416E90>
How can I print the grouped dataframe?
If I do:
print(df.groupby('A').head())
I obtain the dataframe as if it was not grouped:
A B
A
one 0 one 0
1 one 1
two 2 two 2
three 3 three 3
4 three 4
one 5 one 5
I was expecting something like:
A B
A
one 0 one 0
1 one 1
5 one 5
two 2 two 2
three 3 three 3
4 three 4
Simply do:
grouped_df = df.groupby('A')
for key, item in grouped_df:
    print(grouped_df.get_group(key), "\n\n")
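Since iterating over a groupby object already yields (key, sub-dataframe) pairs, the loop can also print the sub-dataframe directly instead of looking it up again with get_group. A minimal sketch using the dataframe from the question (the names printed_keys and group are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'],
                   'B': range(6)})

# Iterating over a GroupBy yields (key, sub-DataFrame) pairs,
# so each group can be printed without calling get_group().
printed_keys = []
for key, group in df.groupby('A'):
    printed_keys.append(key)
    print(f"--- {key} ---")
    print(group, "\n")
```

By default the groups come out sorted by key, not in order of first appearance.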
Deprecation notice: ix was deprecated in pandas 0.20.0 and removed in 1.0; use loc for label-based selection instead.
This also works:
grouped_df = df.groupby('A')
gb = grouped_df.groups
for key, values in gb.items():
    print(df.loc[values], "\n\n")
To print only selected groups, put the keys you want into key_list_from_gb; gb.keys() lists the keys that are available. For example:
gb = grouped_df.groups
gb.keys()
key_list_from_gb = [key1, key2, key3]  # placeholders: use keys returned by gb.keys()
for key, values in gb.items():
    if key in key_list_from_gb:
        print(df.loc[values], "\n")
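Applied to the dataframe from the question, the selective printing could look like the sketch below. The list wanted_keys and the dict selected are hypothetical names chosen for this example; label-based selection uses loc.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'],
                   'B': range(6)})

# .groups maps each group key to the index labels of its rows.
groups = df.groupby('A').groups

wanted_keys = ['one', 'two']  # hypothetical selection of keys to print
selected = {}
for key, labels in groups.items():
    if key in wanted_keys:
        selected[key] = df.loc[labels]  # label-based row selection
        print(selected[key], "\n")
```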
If you're simply looking for a way to display it, you could use describe():
grp = df.groupby('colName')
grp.describe()
This gives you a neat table of summary statistics (count, mean, std, min, quartiles, max) for each group.
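With the dataframe from the question, grouping on 'A' in place of the placeholder 'colName', this might look as follows. Note that it summarises each group and does not list the individual rows.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'],
                   'B': range(6)})

# One row per group; the columns are a MultiIndex such as ('B', 'count').
summary = df.groupby('A').describe()
print(summary)

# Individual statistics can be read with a (column, statistic) tuple.
print(summary.loc['one', ('B', 'mean')])
```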
In addition to previous answers:
Taking your example,
df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'], 'B': range(6)})
Then a simple one-liner does it:
df.groupby('A').apply(print)
In Jupyter Notebook, the following displays a nicely grouped version of the object. The apply method builds a dataframe with a MultiIndex.
by = 'A' # groupby 'by' argument
df.groupby(by).apply(lambda a: a[:])
Output:
A B
A
one 0 one 0
1 one 1
5 one 5
three 3 three 3
4 three 4
two 2 two 2
If you want the by column(s) not to appear in the output, just drop the column(s), like so:
df.groupby(by).apply(lambda a: a.drop(by, axis=1)[:])
Output:
B
A
one 0 0
1 1
5 5
three 3 3
4 4
two 2 2
Here, I am not sure why .iloc[:] does not work in place of [:] at the end. If updates cause problems in the future (or at present), .iloc[:len(a)] also works.
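The output expected in the question lists the groups in order of first appearance (one, two, three), whereas the apply output above is sorted by key. One way to get that order, as a sketch that swaps apply for pd.concat and assumes sort=False is acceptable for your data (the names keys, frames and nested are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'],
                   'B': range(6)})

# sort=False keeps the groups in order of first appearance.
gb = df.groupby('A', sort=False)
keys = [key for key, _ in gb]
frames = [group for _, group in gb]

# keys= adds the group key as an outer index level above the original index.
nested = pd.concat(frames, keys=keys)
print(nested)
```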