Conditional Logic on Pandas DataFrame
How do I apply conditional logic to a Pandas DataFrame?
See the DataFrame shown below:
   data  desired_output
0     1           False
1     2           False
2     3            True
3     4            True
My original data is shown in the 'data' column and the desired_output is shown next to it. If the number in 'data' is below 2.5, the desired_output is False; otherwise it is True.
I could apply a loop and reconstruct the DataFrame... but that would be 'un-pythonic'.
In [1]: df
Out[1]:
   data
0     1
1     2
2     3
3     4
You want to apply a function that conditionally returns a value based on the selected dataframe column.
In [2]: df['data'].apply(lambda x: x >= 2.5)
Out[2]:
0    False
1    False
2     True
3     True
Name: data, dtype: bool
You can then assign that returned column to a new column in your dataframe:
In [3]: df['desired_output'] = df['data'].apply(lambda x: x >= 2.5)
In [4]: df
Out[4]:
   data  desired_output
0     1           False
1     2           False
2     3            True
3     4            True
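When the rule is more involved than a one-line lambda, a named function can be passed to apply in the same way. This is only a minimal sketch on the same four-row data; the function name above_threshold and its threshold parameter are hypothetical, chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4], columns=["data"])


def above_threshold(value, threshold=2.5):
    """Return False for values below the threshold, True otherwise."""
    return value >= threshold


# Series.apply calls the function once per element; extra keyword
# arguments (here, threshold) are forwarded to the function.
df["desired_output"] = df["data"].apply(above_threshold, threshold=2.5)
print(df)
```

apply runs Python code for every element, so for a plain comparison like this one a vectorized expression is usually the faster choice.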
Just compare the column with that value:
In [8]: import pandas
In [9]: df = pandas.DataFrame([1,2,3,4], columns=["data"])
In [10]: df
Out[10]:
   data
0     1
1     2
2     3
3     4
In [11]: df["desired"] = df["data"] > 2.5
In [12]: df
Out[12]:
   data  desired
0     1    False
1     2    False
2     3     True
3     4     True
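The comparison returns a boolean Series, which can be stored as a column or used as a mask to select rows. A short sketch on the same data (the variable names mask and selected are just for illustration):

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4], columns=["data"])

# Comparing a column with a scalar is vectorized: it returns a boolean
# Series with one entry per row, without an explicit loop.
mask = df["data"] > 2.5
df["desired"] = mask

# The same boolean Series can filter the rows where the condition holds.
selected = df[mask]
print(mask.dtype)
print(selected)
```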
Alternatively, use numpy.where:
In [34]: import pandas as pd
In [35]: import numpy as np
In [36]: df = pd.DataFrame([1,2,3,4], columns=["data"])
In [37]: df
Out[37]:
   data
0     1
1     2
2     3
3     4
In [38]: df["desired_output"] = np.where(df["data"] < 2.5, False, True)
In [39]: df
Out[39]:
   data  desired_output
0     1           False
1     2           False
2     3            True
3     4            True
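np.where takes a condition, then the value to use where the condition is true, then the value to use where it is false, so the same call can also produce labels other than booleans. A small sketch; the column name label and the values "low" and "high" are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4], columns=["data"])

# np.where(condition, value_if_true, value_if_false) works elementwise
# over the whole column.
df["label"] = np.where(df["data"] < 2.5, "low", "high")
print(df)
```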
In this specific example, where the DataFrame has only one column, you can write this elegantly as:
df['desired_output'] = df.ge(2.5)
ge tests whether elements are greater than or equal to 2.5; similarly, gt tests for greater than, and lt and le test for less than and less than or equal to.
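The four comparison methods are named equivalents of the operators <, <=, > and >=. A quick sketch on the same data, selecting the column first so that each result is a Series (the variable names are illustrative only):

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4], columns=["data"])

below = df["data"].lt(2.5)     # same as df["data"] < 2.5
at_most = df["data"].le(2.5)   # same as df["data"] <= 2.5
above = df["data"].gt(2.5)     # same as df["data"] > 2.5
at_least = df["data"].ge(2.5)  # same as df["data"] >= 2.5

# "False when below 2.5" corresponds to the >= comparison.
df["desired_output"] = at_least
print(df)
```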