Pandas: How do I assign values based on multiple conditions for existing columns?
I would like to create a new column with a numerical value based on the following conditions:
a. if gender is male & pet1=pet2, points = 5
b. if gender is female & (pet1 is 'cat' or pet1='dog'), points = 5
c. all other combinations, points = 0
gender pet1 pet2
0 male dog dog
1 male cat cat
2 male dog cat
3 female cat squirrel
4 female dog dog
5 female squirrel cat
6 squirrel dog cat
I would like the end result to be as follows:
gender pet1 pet2 points
0 male dog dog 5
1 male cat cat 5
2 male dog cat 0
3 female cat squirrel 5
4 female dog dog 5
5 female squirrel cat 0
6 squirrel dog cat 0
How do I accomplish this?
Solution 1:
You can do this using np.where
, the conditions use bitwise &
and |
for and
and or
with parentheses around the multiple conditions due to operator precedence. So where the condition is true 5
is returned and 0
otherwise:
In [29]:
df['points'] = np.where( ( (df['gender'] == 'male') & (df['pet1'] == df['pet2'] ) ) | ( (df['gender'] == 'female') & (df['pet1'].isin(['cat','dog'] ) ) ), 5, 0)
df
Out[29]:
gender pet1 pet2 points
0 male dog dog 5
1 male cat cat 5
2 male dog cat 0
3 female cat squirrel 5
4 female dog dog 5
5 female squirrel cat 0
6 squirrel dog cat 0
Solution 2:
numpy.select
This is a perfect case for np.select
where we can create a column based on multiple conditions and it's a readable method when there are more conditions:
conditions = [
df['gender'].eq('male') & df['pet1'].eq(df['pet2']),
df['gender'].eq('female') & df['pet1'].isin(['cat', 'dog'])
]
choices = [5,5]
df['points'] = np.select(conditions, choices, default=0)
print(df)
gender pet1 pet2 points
0 male dog dog 5
1 male cat cat 5
2 male dog cat 0
3 female cat squirrel 5
4 female dog dog 5
5 female squirrel cat 0
6 squirrel dog cat 0
Solution 3:
using apply.
def f(x):
if x['gender'] == 'male' and x['pet1'] == x['pet2']: return 5
elif x['gender'] == 'female' and (x['pet1'] == 'cat' or x['pet1'] == 'dog'): return 5
else: return 0
data['points'] = data.apply(f, axis=1)
Solution 4:
The apply method described by @RuggeroTurra takes a lot longer for 500k rows. I ended up using something like
df['result'] = ((df.a == 0) & (df.b != 1)).astype(int) * 2 + \
((df.a != 0) & (df.b != 1)).astype(int) * 3 + \
((df.a == 0) & (df.b == 1)).astype(int) * 4 + \
((df.a != 0) & (df.b == 1)).astype(int) * 5
where the apply method took 25 seconds and this method above took about 18ms.
Solution 5:
You can also use the apply
function. For example:
def myfunc(gender, pet1, pet2):
if gender=='male' and pet1==pet2:
myvalue=5
elif gender=='female' and (pet1=='cat' or pet1=='dog'):
myvalue=5
else:
myvalue=0
return myvalue
And then using the apply function by setting axis=1
df['points'] = df.apply(lambda x: myfunc(x['gender'], x['pet1'], x['pet2']), axis=1)
We get:
gender pet1 pet2 points
0 male dog dog 5
1 male cat cat 5
2 male dog cat 0
3 female cat squirrel 5
4 female dog dog 5
5 female squirrel cat 0
6 squirrel dog cat 0