Compare two columns using pandas
Using this as a starting point:
a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
Out[8]:
one two three
0 10 1.2 4.2
1 15 70 0.03
2 8 5 0
I want to use something like an if
statement within pandas.
if df['one'] >= df['two'] and df['one'] <= df['three']:
df['que'] = df['one']
Basically, check each row via the if
statement, create new column.
The docs say to use .all
but there is no example...
Solution 1:
You could use np.where. If cond
is a boolean array, and A
and B
are arrays, then
C = np.where(cond, A, B)
defines C to be equal to A
where cond
is True, and B
where cond
is False.
import numpy as np
import pandas as pd
a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
df['que'] = np.where((df['one'] >= df['two']) & (df['one'] <= df['three'])
, df['one'], np.nan)
yields
one two three que
0 10 1.2 4.2 10
1 15 70 0.03 NaN
2 8 5 0 NaN
If you have more than one condition, then you could use np.select instead.
For example, if you wish df['que']
to equal df['two']
when df['one'] < df['two']
, then
conditions = [
(df['one'] >= df['two']) & (df['one'] <= df['three']),
df['one'] < df['two']]
choices = [df['one'], df['two']]
df['que'] = np.select(conditions, choices, default=np.nan)
yields
one two three que
0 10 1.2 4.2 10
1 15 70 0.03 70
2 8 5 0 NaN
If we can assume that df['one'] >= df['two']
when df['one'] < df['two']
is
False, then the conditions and choices could be simplified to
conditions = [
df['one'] < df['two'],
df['one'] <= df['three']]
choices = [df['two'], df['one']]
(The assumption may not be true if df['one']
or df['two']
contain NaNs.)
Note that
a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
defines a DataFrame with string values. Since they look numeric, you might be better off converting those strings to floats:
df2 = df.astype(float)
This changes the results, however, since strings compare character-by-character, while floats are compared numerically.
In [61]: '10' <= '4.2'
Out[61]: True
In [62]: 10 <= 4.2
Out[62]: False
Solution 2:
You can use .equals
for columns or entire dataframes.
df['col1'].equals(df['col2'])
If they're equal, that statement will return True
, else False
.