Subtract two columns in dataframe
My df looks as follows:
Index Country Val1 Val2 ... Val10
1 Australia 1 3 ... 5
2 Bambua 12 33 ... 56
3 Tambua 14 34 ... 58
I'd like to subtract Val10 from Val1 for each country, so the output looks like:
Country Val10-Val1
Australia 4
Bambua 44
Tambua 44
So far I've got:
def myDelta(row):
    data = row[['Val10', 'Val1']]
    return pd.Series({'Delta': np.subtract(data)})

def runDeltas():
    myDF = getDF() \
        .apply(myDelta, axis=1) \
        .sort_values(by=['Delta'], ascending=False)
    return myDF
Calling runDeltas() results in this error:
ValueError: ('invalid number of arguments', u'occurred at index 9')
What's the proper way to fix this?
Solution 1:
Given the following dataframe:
df = pd.DataFrame([["Australia", 1, 3, 5],
                   ["Bambua", 12, 33, 56],
                   ["Tambua", 14, 34, 58]
                  ], columns=["Country", "Val1", "Val2", "Val10"]
                 )
It comes down to a simple element-wise subtraction of the two columns (swap the operands for Val10 - Val1):
>>> df["Val1"] - df["Val10"]
0    -4
1   -44
2   -44
dtype: int64
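To get the layout asked for in the question (one row per country, Val10 minus Val1, largest difference first), something along these lines should work. This is only a sketch: the column name Delta is borrowed from the question's code, and the sample frame above stands in for the real data.

```python
import pandas as pd

# Sample data from this answer; the real frame has more Val columns.
df = pd.DataFrame(
    [["Australia", 1, 3, 5],
     ["Bambua", 12, 33, 56],
     ["Tambua", 14, 34, 58]],
    columns=["Country", "Val1", "Val2", "Val10"],
)

# Element-wise column subtraction; no apply() is needed.
result = (
    df.assign(Delta=df["Val10"] - df["Val1"])[["Country", "Delta"]]
      .sort_values(by="Delta", ascending=False)
)
print(result)
```

assign returns a new frame, so the original df is left unchanged.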
Solution 2:
Using this as the df:
df = pd.DataFrame([["Australia", 1, 3, 5],
                   ["Bambua", 12, 33, 56],
                   ["Tambua", 14, 34, 58]
                  ], columns=["Country", "Val1", "Val2", "Val10"]
                 )
You can also do the subtraction and put it into a new column as follows.
>>> df['Val_Diff'] = df['Val10'] - df['Val1']
>>> df
     Country  Val1  Val2  Val10  Val_Diff
0  Australia     1     3      5         4
1     Bambua    12    33     56        44
2     Tambua    14    34     58        44
Solution 3:
You can also do this with a lambda function applied row by row, assigning the result to a new column:
df['Val10-Val1'] = df.apply(lambda x: x['Val10'] - x['Val1'], axis=1)
print(df)
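As for the error in the question: np.subtract takes two operands, so calling it with a single argument raises "invalid number of arguments". Below is a minimal sketch of the original row-wise function with that fixed. It assumes the sample frame from the answers above in place of getDF(), which isn't shown in the question, and the snake_case function name is just a rename of myDelta.

```python
import numpy as np
import pandas as pd

# Stand-in for getDF(), which is not shown in the question.
df = pd.DataFrame(
    [["Australia", 1, 3, 5],
     ["Bambua", 12, 33, 56],
     ["Tambua", 14, 34, 58]],
    columns=["Country", "Val1", "Val2", "Val10"],
)

def my_delta(row):
    # Pass both operands to np.subtract separately.
    return pd.Series({"Delta": np.subtract(row["Val10"], row["Val1"])})

deltas = df.apply(my_delta, axis=1).sort_values(by=["Delta"], ascending=False)
print(deltas)
```

This keeps the shape of the original code, but the plain column subtraction in the other solutions avoids the per-row function call and is usually the faster option.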