Filter pandas DataFrame by substring criteria

I have a pandas DataFrame with a column of string values. I need to select rows based on partial string matches.

Something like this idiom:

re.search(pattern, cell_in_question) 

returning a boolean. I am familiar with the syntax of df[df['A'] == "hello world"] but can't seem to find a way to do the same with a partial string match, say 'hello'.
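For comparison, the per-cell idiom from the question can be applied row by row with Series.apply. This is a minimal sketch that assumes a hypothetical column 'A' holding only plain strings, with no missing values. It works, but it runs a Python function on every cell, so it is usually slower than the vectorized string methods in the answers below.

```python
import re

import pandas as pd

# Hypothetical sample data: column 'A' holds plain strings only.
df = pd.DataFrame({"A": ["hello world", "goodbye", "say hello", "HELLO"]})

pattern = "hello"
# Run re.search on each cell and turn the match object (or None) into a bool.
mask = df["A"].apply(lambda cell: re.search(pattern, cell) is not None)
matches = df[mask]
print(matches)  # rows 0 and 2; "HELLO" is skipped because re.search is case-sensitive
```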


Solution 1:

Based on GitHub issue #620, it looks like you'll soon be able to do the following:

df[df['A'].str.contains("hello")]

Update: vectorized string methods (i.e., Series.str) are available in pandas 0.8.1 and up.
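A small self-contained sketch of Series.str.contains on hypothetical data. By default the pattern is treated as a regular expression. The case and regex keyword arguments adjust that behaviour, which matters when the substring contains regex metacharacters such as '.'.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"A": ["hello world", "goodbye", "say hello", "HELLO", "a.b"]})

# Vectorized substring test (pattern is interpreted as a regex by default).
hello = df[df["A"].str.contains("hello")]

# Case-insensitive match.
hello_any_case = df[df["A"].str.contains("hello", case=False)]

# Literal match: with regex=False the '.' is not a wildcard.
literal_dot = df[df["A"].str.contains(".", regex=False)]

print(hello)
print(hello_any_case)
print(literal_dot)
```

Without regex=False, the pattern "." would match every non-empty string in the column.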

Solution 2:

I am using pandas 0.14.1 on macOS in an IPython notebook. I tried the line proposed above:

df[df["A"].str.contains("Hello|Britain")]

and got an error:

cannot index with vector containing NA / NaN values

but it worked perfectly when an "==True" condition was added, like this:

df[df['A'].str.contains("Hello|Britain")==True]
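The error usually means the column contains missing values. For an object-dtype column, str.contains returns NaN for those cells rather than True or False, so the result cannot be used directly as a boolean mask. Comparing with True turns the NaN entries into False. The na argument of str.contains fills missing cells directly and is arguably clearer. A minimal sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in column 'A'.
df = pd.DataFrame({"A": ["Hello there", np.nan, "Great Britain", "goodbye"]})

raw = df["A"].str.contains("Hello|Britain")
# For an object-dtype column, raw holds NaN for the missing cell,
# which is why indexing with it directly can fail.

# Workaround from the answer above: NaN == True evaluates to False.
via_eq = df[raw == True]

# Alternative: have str.contains treat missing cells as False.
via_na = df[df["A"].str.contains("Hello|Britain", na=False)]

print(via_eq)
print(via_na)
```

Both filters keep only the "Hello there" and "Great Britain" rows.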