How do I create a sum row and sum column in pandas?
Or in two steps, using the .sum() function as you suggested (which might be a bit more readable as well):
import pandas as pd
df = pd.DataFrame({
    "Undergraduate": {"Straight A's": 240, "Not": 3_760},
    "Graduate": {"Straight A's": 60, "Not": 440},
})
# Total sum per column:
df.loc['Total', :] = df.sum(axis=0)
# Total sum per row:
df.loc[:, 'Total'] = df.sum(axis=1)
Output:
              Graduate  Undergraduate  Total
Not                440           3760   4200
Straight A's        60            240    300
Total              500           4000   4500
append and assign
The point of this answer is to provide an inline (chainable) solution rather than an in-place one.
append
I use append to stack a Series or DataFrame vertically. It also creates a copy so that I can continue to chain.
assign
I use assign to add a column. However, the DataFrame I'm working on at that point is an intermediate result in the middle of the chain, so it has no name I can refer to. So I pass a lambda as the assign argument, which tells pandas to apply it to the calling DataFrame.
df.append(df.sum().rename('Total')).assign(Total=lambda d: d.sum(1))
              Graduate  Undergraduate  Total
Not                440           3760   4200
Straight A's        60            240    300
Total              500           4000   4500
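Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the one-liner above raises an AttributeError on current versions. A minimal sketch of the same chain using pd.concat follows; the names total_row and result are only for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Undergraduate": {"Straight A's": 240, "Not": 3760},
    "Graduate": {"Straight A's": 60, "Not": 440},
})

# df.sum() is a Series of column totals. Naming it 'Total' and turning it
# into a one-row DataFrame lets pd.concat stack it under df.
total_row = df.sum().rename("Total").to_frame().T

# pd.concat returns a new object, so df itself is left unchanged and the
# chain can continue with assign, exactly as in the append version.
result = pd.concat([df, total_row]).assign(Total=lambda d: d.sum(axis=1))
print(result)
```

As with append, nothing is modified in place: df still has only its two original rows after this runs.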
Fun alternative
Uses drop with errors='ignore' to get rid of potentially pre-existing Total rows and columns.
Also, still inline.
def tc(d):
    return d.assign(Total=d.drop('Total', errors='ignore', axis=1).sum(1))
df.pipe(tc).T.pipe(tc).T
              Graduate  Undergraduate  Total
Not                440           3760   4200
Straight A's        60            240    300
Total              500           4000   4500
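Because tc drops any existing Total before summing, running the pipeline on a table that already has totals should not double count. The sketch below checks that; add_totals, once and twice are hypothetical names introduced here for the demonstration.

```python
import pandas as pd

def tc(d):
    # Drop an existing 'Total' column if there is one, then add row sums.
    return d.assign(Total=d.drop('Total', errors='ignore', axis=1).sum(axis=1))

def add_totals(d):
    # Add the Total column, transpose, add it again (which becomes the
    # Total row), then transpose back.
    return d.pipe(tc).T.pipe(tc).T

df = pd.DataFrame({
    "Undergraduate": {"Straight A's": 240, "Not": 3760},
    "Graduate": {"Straight A's": 60, "Not": 440},
})

once = add_totals(df)
twice = add_totals(once)  # re-applied to a table that already has totals
print(twice)
```

Here once and twice hold the same values, which is the property the errors='ignore' drop is there to provide.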
Starting from the original data, you can use crosstab. Based on your input alone, you just need melt before crosstab:
s=df.reset_index().melt('index')
pd.crosstab(index=s['index'],columns=s.variable,values=s.value,aggfunc='sum',margins=True)
Out[33]:
variable      Graduate  Undergraduate   All
index
Not                440           3760  4200
Straight A's        60            240   300
All                500           4000  4500
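crosstab labels its margins All by default. If the label should read Total, as in the question, the margins_name parameter of pd.crosstab can be set. This is a small sketch on the same data; the variable name out is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Undergraduate": {"Straight A's": 240, "Not": 3760},
    "Graduate": {"Straight A's": 60, "Not": 440},
})

# Long format: one row per (index, variable) pair with its value.
s = df.reset_index().melt('index')

out = pd.crosstab(
    index=s['index'],
    columns=s['variable'],
    values=s['value'],
    aggfunc='sum',
    margins=True,
    margins_name='Total',  # label the margin row/column 'Total', not 'All'
)
print(out)
```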
Toy data
df=pd.DataFrame({'c1':[1,2,2,3,4],'c2':[2,2,3,3,3],'c3':[1,2,3,4,5]})
# I think your input is the result of `groupby`, before `agg`
df
Out[37]:
   c1  c2  c3
0   1   2   1
1   2   2   2
2   2   3   3
3   3   3   4
4   4   3   5
pd.crosstab(df.c1,df.c2,df.c3,aggfunc='sum',margins=True)
Out[38]:
c2     2     3  All
c1
1    1.0   NaN    1
2    2.0   3.0    5
3    NaN   4.0    4
4    NaN   5.0    5
All  3.0  12.0   15
The original data is:
>>> df = pd.DataFrame(dict(Undergraduate=[240, 3760], Graduate=[60, 440]), index=["Straight A's", "Not"])
>>> df
Out:
              Graduate  Undergraduate
Straight A's        60            240
Not                440           3760
You only need df.T to recreate this table:
>>> df_new = df.T
>>> df_new
Out:
               Straight A's   Not
Graduate                 60   440
Undergraduate           240  3760
After computing the Total by row and by column:
>>> df_new.loc['Total',:]= df_new.sum(axis=0)
>>> df_new.loc[:,'Total'] = df_new.sum(axis=1)
>>> df_new
Out:
               Straight A's     Not   Total
Graduate               60.0   440.0   500.0
Undergraduate         240.0  3760.0  4000.0
Total                 300.0  4200.0  4500.0
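The totals above print as floats. If integer output is preferred, one option is to cast the finished table back with astype(int). This is a sketch rather than part of the original answer, and it is only safe here because no cell is NaN; df_int is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame(
    dict(Undergraduate=[240, 3760], Graduate=[60, 440]),
    index=["Straight A's", "Not"],
)

df_new = df.T
df_new.loc['Total', :] = df_new.sum(axis=0)   # total row
df_new.loc[:, 'Total'] = df_new.sum(axis=1)   # total column

# Cast every column back to an integer dtype; this would raise if any
# value were NaN, so it assumes a fully populated table.
df_int = df_new.astype(int)
print(df_int)
```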