Search for "does-not-contain" on a DataFrame in pandas

I've done some searching and can't figure out how to filter a dataframe by df["col"].str.contains(word), however I'm wondering if there is a way to do the reverse: filter a dataframe by that set's compliment. eg: to the effect of !(df["col"].str.contains(word)).

Can this be done through a DataFrame method?

You can use the invert (~) operator (which acts like a not for boolean data):

new_df = df[~df["col"].str.contains(word)]

, where new_df is the copy returned by RHS.

contains also accepts a regular expression...

If the above throws a ValueError, the reason is likely because you have mixed datatypes, so use na=False:

new_df = df[~df["col"].str.contains(word, na=False)]

Or,

new_df = df[df["col"].str.contains(word) == False]

I was having trouble with the not (~) symbol as well, so here's another way from another StackOverflow thread:

df[df["col"].str.contains('this|that')==False]

You can use Apply and Lambda :

df[df["col"].apply(lambda x: word not in x)]

Or if you want to define more complex rule, you can use AND:

df[df["col"].apply(lambda x: word_1 not in x and word_2 not in x)]

I hope the answers are already posted

I am adding the framework to find multiple words and negate those from dataFrame.

Here 'word1','word2','word3','word4' = list of patterns to search

df = DataFrame

column_a = A column name from from DataFrame df

values_to_remove = ['word1','word2','word3','word4'] 

pattern = '|'.join(values_to_remove)

result = df.loc[~df['column_a'].str.contains(pattern, case=False)]

Search for "does-not-contain" on a DataFrame in pandas

Related

Recent Posts