Pandas groupby for zero values
Solution 1:
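Group by both keys and count, then unstack with fill_value=0 and stack back, so that missing Symbol/Year combinations show up with a count of 0:
df = df.groupby(['Symbol','Year']).count().unstack(fill_value=0).stack()
print(df)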
Output:
             Action
Symbol Year
AAPL   2001       2
       2002       0
BAC    2001       0
       2002       2
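A minimal, self-contained sketch of the same idea. The sample frame below is an assumption (the question's data is not shown in this thread); it was chosen so that AAPL has rows only in 2001 and BAC only in 2002. The names counts and filled are illustrative as well.

```python
import pandas as pd

# Hypothetical sample data, not taken from the original question.
df = pd.DataFrame({
    "Symbol": ["AAPL", "AAPL", "BAC", "BAC"],
    "Year": [2001, 2001, 2002, 2002],
    "Action": ["Buy", "Sell", "Buy", "Sell"],
})

# Count rows per (Symbol, Year); only combinations present in the data appear.
counts = df.groupby(["Symbol", "Year"]).count()

# Move Year into the columns, filling the missing combinations with 0,
# then move it back into the index.
filled = counts.unstack(fill_value=0).stack()
print(filled)
```

The unstack step is what creates the missing cells: once Year is spread across columns, every Symbol has a cell for every Year, and fill_value=0 decides what goes into the cells that had no rows.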
Solution 2:
You can use pivot_table with unstack:
print(df.pivot_table(index='Symbol',
                     columns='Year',
                     values='Action',
                     fill_value=0,
                     aggfunc='count').unstack())
Year  Symbol
2001  AAPL      2
      BAC       0
2002  AAPL      0
      BAC       2
dtype: int64
If you need the output as a DataFrame, use to_frame:
print(df.pivot_table(index='Symbol',
                     columns='Year',
                     values='Action',
                     fill_value=0,
                     aggfunc='count')
        .unstack()
        .to_frame()
        .rename(columns={0: 'Action'}))
             Action
Year Symbol
2001 AAPL         2
     BAC          0
2002 AAPL         0
     BAC          2
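For reference, here is the pivot_table route as a runnable sketch. The sample frame is again an assumption, and passing name="Action" to to_frame is used here as a shortcut for the rename step; table and result are illustrative names.

```python
import pandas as pd

# Hypothetical sample data, not taken from the original question.
df = pd.DataFrame({
    "Symbol": ["AAPL", "AAPL", "BAC", "BAC"],
    "Year": [2001, 2001, 2002, 2002],
    "Action": ["Buy", "Sell", "Buy", "Sell"],
})

# One row per Symbol, one column per Year; empty cells become 0.
table = df.pivot_table(index="Symbol",
                       columns="Year",
                       values="Action",
                       aggfunc="count",
                       fill_value=0)

# unstack() on a frame with single-level columns returns a Series
# indexed by (Year, Symbol); to_frame(name=...) names the resulting column.
result = table.unstack().to_frame(name="Action")
print(result)
```

Note that the index order here is (Year, Symbol), the reverse of Solution 1. If the order matters, swaplevel followed by sort_index should restore (Symbol, Year).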
Solution 3:
Datatype category
This feature may not have existed when this thread was opened, but the "category" dtype can help here:
# create a dataframe with one combination of a,b missing
df = pd.DataFrame({"a":[0,1,1], "b": [0,1,0]})
df = df.astype({"a":"category", "b":"category"})
print(df)
The DataFrame looks like this:
   a  b
0  0  0
1  1  1
2  1  0
And now, grouping by a and b
print(df.groupby(["a","b"]).size())
yields:
a  b
0  0    1
   1    0
1  0    1
   1    1
Note the 0 in the rightmost column for the combination a=0, b=1, which never occurs in the data. This behavior is also documented in the pandas user guide (search the page for "groupby").
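One caveat, as far as I can tell: whether unobserved category combinations are kept is controlled by groupby's observed parameter, and newer pandas releases have been moving its default toward True. A hedged sketch that passes observed=False explicitly, so the zero row should not depend on the installed version's default:

```python
import pandas as pd

# Same frame as above: the combination a=0, b=1 is missing.
df = pd.DataFrame({"a": [0, 1, 1], "b": [0, 1, 0]})
df = df.astype({"a": "category", "b": "category"})

# observed=False asks groupby to keep every combination of categories,
# including those that never occur in the data.
sizes = df.groupby(["a", "b"], observed=False).size()
print(sizes)
```

The same approach should work for the Symbol/Year frame from the question by converting both columns to category before grouping.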