How to drop rows from pandas data frame that contains a particular string in a particular column? [duplicate]

pandas has vectorized string operations, so you can just filter out the rows that contain the string you don't want:

In [91]: df = pd.DataFrame(dict(A=[5,3,5,6], C=["foo","bar","fooXYZbar", "bat"]))

In [92]: df
Out[92]:
   A          C
0  5        foo
1  3        bar
2  5  fooXYZbar
3  6        bat

In [93]: df[~df.C.str.contains("XYZ")]
Out[93]:
   A    C
0  5  foo
1  3  bar
3  6  bat

If your string constraint is not just one string you can drop those corresponding rows with:

df = df[~df['your column'].isin(['list of strings'])]

The above will drop all rows containing elements of your list

This will only work if you want to compare exact strings. It will not work in case you want to check if the column string contains any of the strings in the list.

The right way to compare with a list would be :

searchfor = ['john', 'doe']
df = df[~df.col.str.contains('|'.join(searchfor))]

Slight modification to the code. Having na=False will skip empty values. Otherwise you can get an error TypeError: bad operand type for unary ~: float

df[~df.C.str.contains("XYZ", na=False)]

Source: TypeError: bad operand type for unary ~: float

new_df = df[df.C != 'XYZ']

Reference: https://chrisalbon.com/python/data_wrangling/pandas_dropping_column_and_rows/

How to drop rows from pandas data frame that contains a particular string in a particular column? [duplicate]

Related

Recent Posts