Iterating over pandas df returning all values not matching a regex

I am trying to iterate over a column to identify non-valid entries. This works:

weirdos = df.loc[df[column] == '7282'][['col1', 'col2']]

but trying the same with a regex like

regex = "^[a-zA-Z]{2}[*]{1}[a-zA-Z0-9]{3}[*]{1}[a-zA-Z0-9*]{0,30}$"
weirdos = df.loc[re.search(regex, df[column]) is not None][['col1', 'col2']]

keeps raising the error TypeError: expected string or bytes-like object. Any hints?


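The error occurs because re.search() expects a single string, but df[column] is an entire pandas Series. Assuming column (which is not enclosed in quotes) is a string variable holding the name of the column to check, use the vectorized .str accessor, which applies the regex to each value in the column: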

weirdos = df.loc[~df[column].str.contains(regex)][['col1', 'col2']]
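To make the difference concrete, here is a minimal sketch that reproduces the TypeError and then shows that running re.search() on each value one at a time gives the same mask as str.contains(). The DataFrame, the column name "code" and its values are hypothetical; only col1/col2 and the regex come from the question.

```python
import re

import pandas as pd

# Hypothetical sample data: only col1/col2 and the regex mirror the question.
df = pd.DataFrame({
    "code": ["AB*X12*rest", "7282", "CD*999*"],
    "col1": [1, 2, 3],
    "col2": ["a", "b", "c"],
})
column = "code"
regex = r"^[a-zA-Z]{2}[*]{1}[a-zA-Z0-9]{3}[*]{1}[a-zA-Z0-9*]{0,30}$"

# re.search() expects ONE string; passing the whole Series reproduces the error.
try:
    re.search(regex, df[column])
except TypeError as exc:
    print("TypeError:", exc)

# Element-wise equivalent of the original idea: re.search() on each value.
mask_apply = df[column].apply(lambda value: re.search(regex, value) is not None)

# Vectorized version through the .str accessor.
mask_str = df[column].str.contains(regex)

print(mask_apply.tolist() == mask_str.tolist())  # True

# Rows NOT matching the pattern (here only the "7282" row).
weirdos = df.loc[~mask_str, ["col1", "col2"]]
print(weirdos)
```

The apply() version works too, but str.contains() is shorter and avoids writing the per-value loop by hand.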

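Note that str.contains(), not str.match(), is the counterpart of the re.search() call in your original code. Under the hood, str.contains() uses re.search(), which looks for a match anywhere in the string, while str.match() uses re.match(), which only matches at the beginning of the string. Since your pattern is already anchored with ^ and $, str.match() would give the same result here; the distinction matters for unanchored patterns.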
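A small sketch of how the methods differ on an unanchored pattern, and agree once the pattern is anchored. The sample Series is made up for illustration, and str.fullmatch() assumes pandas 1.1 or later.

```python
import pandas as pd

s = pd.Series(["xaa", "aax", "aa"])

# Unanchored pattern: the three methods differ.
print(s.str.contains("aa").tolist())   # [True, True, True]   re.search: anywhere
print(s.str.match("aa").tolist())      # [False, True, True]  re.match: at the start
print(s.str.fullmatch("aa").tolist())  # [False, False, True] whole string only

# Anchored with ^ and $: contains() and match() agree.
print(s.str.contains("^aa$").tolist())  # [False, False, True]
print(s.str.match("^aa$").tolist())     # [False, False, True]
```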

The ~ negates the filtering condition, because your question title asks for the values NOT matching the regex. Remove it if you want the matching rows instead.
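If the column may contain missing values, str.contains() can return NaN for them by default, which ~ and boolean indexing do not handle reliably. A minimal sketch on hypothetical data, passing na=False so missing values count as "no match" (and therefore land among the non-matching rows):

```python
import pandas as pd

# Hypothetical data with a missing value in the checked column.
df = pd.DataFrame({
    "code": ["AB*X12*", None, "7282"],
    "col1": [1, 2, 3],
    "col2": [10, 20, 30],
})
regex = r"^[a-zA-Z]{2}[*]{1}[a-zA-Z0-9]{3}[*]{1}[a-zA-Z0-9*]{0,30}$"

# na=False turns missing values into False, giving a plain boolean mask.
matches = df["code"].str.contains(regex, na=False)

weirdos = df.loc[~matches, ["col1", "col2"]]  # NOT matching, incl. the None row
valid = df.loc[matches, ["col1", "col2"]]     # matching: same mask without ~
print(weirdos)
print(valid)
```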

Also, define the regex as a raw string, i.e. regex = r'....', so that backslashes in the pattern (as in \d or \b) reach the regex engine unchanged instead of having to be doubled.
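A quick illustration of why the r prefix matters, using a made-up sample string. Your current pattern has no backslashes, so it happens to work either way, but a pattern like \b silently breaks without it:

```python
import re

text = "a word here"

# In a normal string literal "\b" is a backspace character (\x08),
# so the regex engine never sees a word boundary and finds nothing.
plain = re.search("\bword\b", text)
print(plain)  # None

# The raw literal keeps the backslash, so \b is a word boundary.
raw = re.search(r"\bword\b", text)
print(raw.group())  # word

# r"\d{4}" is just a shorter way to write "\\d{4}".
print(r"\d{4}" == "\\d{4}")  # True
```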

Test Run

import pandas as pd

data = {'col_0': ['baa', 'bbc', 'ccd'], 'col1': [10, 20, 30], 'col2': [100, 200, 300]}
df = pd.DataFrame(data)
print(df)
Output:
    col_0   col1    col2
0   baa       10     100
1   bbc       20     200
2   ccd       30     300

regex = r'aa'           # containing 'aa' anywhere in string
column = 'col_0'

weirdos = df.loc[~df[column].str.contains(regex)][['col1', 'col2']]     # filtering those NOT containing 'aa' anywhere in string
print(weirdos)

Output:
    col1    col2
1   20       200
2   30       300