How to implement a Boolean search with multiple columns in pandas

I have a pandas df and would like to accomplish something along these lines (in SQL terms):

SELECT * FROM df WHERE column1 = 'a' OR column2 = 'b' OR column3 = 'c' etc.

Now this works, for one column/value pair:

foo = df.loc[df['column']==value]

However, I'm not sure how to expand that to multiple column/value pairs.

  • To be clear, each column matches a different value.

Solution 1:

You need to enclose each condition in parentheses because of operator precedence, and use the bitwise operators & (and) and | (or):

foo = df[(df['column1'] == 'a') | (df['column2'] == 'b') | (df['column3'] == 'c')]

If you use the Python keywords "and" or "or" instead, pandas raises a ValueError complaining that the truth value of a Series is ambiguous. Those keywords need a single True or False for the whole Series, and it is unclear what that should be when only one value, or all but one, match the condition. That is why you should use the bitwise operators, or NumPy's np.all or np.any, to state the matching criteria explicitly.
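As a minimal sketch of the points above, assuming a small made-up frame whose column names follow the question (the data is purely illustrative), the following builds the mask with |, builds the same mask with np.any over a list of conditions, and shows that the keyword form raises:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "column1": ["a", "x", "x", "x"],
    "column2": ["x", "b", "x", "x"],
    "column3": ["x", "x", "c", "x"],
})

# Element-wise OR of three boolean Series. Each comparison is parenthesized
# because | binds more tightly than ==.
mask = (df["column1"] == "a") | (df["column2"] == "b") | (df["column3"] == "c")
foo = df[mask]

# The same mask via NumPy, handy when the conditions are collected in a list.
conditions = [df["column1"] == "a", df["column2"] == "b", df["column3"] == "c"]
mask_np = np.any(conditions, axis=0)

# The keyword form asks for one truth value for a whole Series and raises.
try:
    (df["column1"] == "a") or (df["column2"] == "b")
    raised = False
except ValueError:
    raised = True
```

Here foo keeps the first three rows, and mask_np can be used in place of mask as df[mask_np].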

There is also the query method: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.query.html

It has some limitations, though, mainly around cases where column names and index values could be ambiguous.
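A rough sketch of what the query version of the same filter could look like, again on made-up data with the question's column names. Inside the query string the words "or" and "and" are allowed, and a local variable (here the hypothetical name value) can be referenced with an @ prefix:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "column1": ["a", "x", "x", "x"],
    "column2": ["x", "b", "x", "x"],
    "column3": ["x", "x", "c", "x"],
})

# The whole condition is written as one string expression.
foo = df.query("column1 == 'a' or column2 == 'b' or column3 == 'c'")

# A Python variable is referenced inside the expression with @.
value = "a"
bar = df.query("column1 == @value")
```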

Solution 2:

A more concise, but not necessarily faster, method is to use DataFrame.isin() and DataFrame.any(). Note that passing a list to isin() tests every column against the same set of values:

In [26]: from numpy.random import randint; from pandas import DataFrame

In [27]: n = 10

In [28]: df = DataFrame(randint(4, size=(n, 2)), columns=list('ab'))

In [29]: df
Out[29]:
   a  b
0  0  0
1  1  1
2  1  1
3  2  3
4  2  3
5  0  2
6  1  2
7  3  0
8  1  1
9  2  2

[10 rows x 2 columns]

In [30]: df.isin([1, 2])
Out[30]:
       a      b
0  False  False
1   True   True
2   True   True
3   True  False
4   True  False
5  False   True
6   True   True
7  False  False
8   True   True
9   True   True

[10 rows x 2 columns]

In [31]: df.isin([1, 2]).any(axis=1)
Out[31]:
0    False
1     True
2     True
3     True
4     True
5     True
6     True
7    False
8     True
9     True
dtype: bool

In [32]: df.loc[df.isin([1, 2]).any(axis=1)]
Out[32]:
   a  b
1  1  1
2  1  1
3  2  3
4  2  3
5  0  2
6  1  2
8  1  1
9  2  2

[8 rows x 2 columns]
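The session above checks every column against the same list. For the case in the question, where each column has its own value, one possible adaptation is to pass a dict to isin(), which maps column names to the values accepted in that column. This is a sketch on made-up data, not part of the original answer:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "column1": ["a", "x", "x", "x"],
    "column2": ["x", "b", "x", "x"],
    "column3": ["x", "x", "c", "x"],
})

# Keys are column names, values are the matches accepted in that column.
wanted = {"column1": ["a"], "column2": ["b"], "column3": ["c"]}

# isin() gives a boolean frame; any(axis=1) keeps rows with at least one hit.
mask = df.isin(wanted).any(axis=1)
foo = df.loc[mask]
```

Replacing any(axis=1) with all(axis=1) would turn the OR into an AND across the listed columns.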