Pandas: sum up multiple columns into one column without last column
If I have a dataframe similar to this one
Apples Bananas Grapes Kiwis
2 3 nan 1
1 3 7 nan
nan nan 2 3
I would like to add a column like this
Apples Bananas Grapes Kiwis Fruit Total
2 3 nan 1 6
1 3 7 nan 11
nan nan 2 3 5
I guess you could use df['Apples'] + df['Bananas']
and so on, but my actual dataframe is much larger than this. I was hoping a formula like df['Fruit Total']=df[-4:-1].sum
could do the trick in one line of code. That didn't work however. Is there any way to do it without explicitly summing up all columns?
Solution 1:
You can first select by iloc
and then sum
:
df['Fruit Total']= df.iloc[:, -4:-1].sum(axis=1)
print (df)
Apples Bananas Grapes Kiwis Fruit Total
0 2.0 3.0 NaN 1.0 5.0
1 1.0 3.0 7.0 NaN 11.0
2 NaN NaN 2.0 3.0 2.0
For sum all columns use:
df['Fruit Total']= df.sum(axis=1)
Solution 2:
It is possible to do it without knowing the number of columns and even without iloc:
print(df)
Apples Bananas Grapes Kiwis
0 2.0 3.0 NaN 1.0
1 1.0 3.0 7.0 NaN
2 NaN NaN 2.0 3.0
cols_to_sum = df.columns[ : df.shape[1]-1]
df['Fruit Total'] = df[cols_to_sum].sum(axis=1)
print(df)
Apples Bananas Grapes Kiwis Fruit Total
0 2.0 3.0 NaN 1.0 5.0
1 1.0 3.0 7.0 NaN 11.0
2 NaN NaN 2.0 3.0 2.0
Solution 3:
This may be helpful for beginners, so for the sake of completeness, if you know the column names (e.g. they are in a list), you can use:
column_names = ['Apples', 'Bananas', 'Grapes', 'Kiwis']
df['Fruit Total']= df[column_names].sum(axis=1)
This gives you flexibility about which columns you use as you simply have to manipulate the list column_names
and you can do things like pick only columns with the letter 'a' in their name. Another benefit of this is that it's easier for humans to understand what they are doing through column names. Combine this with list(df.columns)
to get the column names in a list format. Thus, if you want to drop the last column, all you have to do is:
column_names = list(df.columns)
df['Fruit Total']= df[column_names[:-1]].sum(axis=1)
Solution 4:
Using df['Fruit Total']= df.iloc[:, -4:-1].sum(axis=1)
over your original df won't add the last column ('Kiwis'), you should use df.iloc[:, -4:]
instead to select all columns:
print(df)
Apples Bananas Grapes Kiwis
0 2.0 3.0 NaN 1.0
1 1.0 3.0 7.0 NaN
2 NaN NaN 2.0 3.0
df['Fruit Total']=df.iloc[:,-4:].sum(axis=1)
print(df)
Apples Bananas Grapes Kiwis Fruit Total
0 2.0 3.0 NaN 1.0 6.0
1 1.0 3.0 7.0 NaN 11.0
2 NaN NaN 2.0 3.0 5.0
Solution 5:
I want to build on Ramon's answer if you want to come up with the total without knowing the shape/size of the dataframe. I will use his answer below but fix one item that didn't include the last column for the total. I have removed the -1 from the shape:
cols_to_sum = df.columns[ : df.shape[1]-1]
To this:
cols_to_sum = df.columns[ : df.shape[1]]
print(df)
Apples Bananas Grapes Kiwis
0 2.0 3.0 NaN 1.0
1 1.0 3.0 7.0 NaN
2 NaN NaN 2.0 3.0
cols_to_sum = df.columns[ : df.shape[1]]
df['Fruit Total'] = df[cols_to_sum].sum(axis=1)
print(df)
Apples Bananas Grapes Kiwis Fruit Total
0 2.0 3.0 NaN 1.0 6.0
1 1.0 3.0 7.0 NaN 11.0
2 NaN NaN 2.0 3.0 5.0
Which then gives you the correct total without skipping the last column.