How can I "inverse match" with regex?

I'm processing a file, line-by-line, and I'd like to do an inverse match. For instance, I want to match lines where there is a string of six letters, but only if these six letters are not 'Andrea'. How should I do that?

I'm using RegexBuddy, but still having trouble.

(?!Andrea).{6}

Assuming your regexp engine supports negative lookaheads...

...or maybe you'd prefer to use [A-Za-z]{6} in place of .{6}, since .{6} matches any six characters, not just letters.
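As a rough sketch of how that pattern behaves in Python (the sample lines and the variable names here are made up for illustration), note that it matters whether you anchor the match. With an unanchored search the engine is free to start one character later, so "Andreas" still yields a match:

```python
import re

# Six letters, rejected if the text at the match start is "Andrea".
six_not_andrea = re.compile(r"(?!Andrea)[A-Za-z]{6}")

# Hypothetical sample lines, one candidate per line.
lines = ["Andrea", "Robert", "12345"]

# fullmatch() requires the whole line to be the six letters.
matches = [line for line in lines if six_not_andrea.fullmatch(line)]
print(matches)

# An unanchored search can slide past the "A" and match from offset 1.
shifted = six_not_andrea.search("Andreas").group()
print(shifted)
```

If that shifted match is not what you want, anchoring the pattern (for example with \b or ^) is one way to prevent it.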

Note that lookaheads and lookbehinds are generally not the right way to "inverse" a regular expression match. Regexps aren't really set up for doing negative matching; they leave that to whatever language you are using them with.

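For Python/Java,

^((?!some text).)*$

Note that the lookahead has to come before the dot. The often-quoted form ^(.(?!(some text)))*$ only tests the lookahead after a character has been consumed, so it still matches a line that begins with "some text".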

http://www.lisnichenko.com/articles/javapython-inverse-regex.html

In PCRE and similar variants, you can actually create a regex that matches any line not containing a value:

^(?:(?!Andrea).)*$

This is called a tempered greedy token. The downside is that it doesn't perform well, because the lookahead is re-evaluated at every character position in the line.
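A minimal sketch of that pattern in Python, whose re module also supports negative lookahead (the sample lines are invented, and each line is assumed to have had its trailing newline stripped):

```python
import re

# Each "." is consumed only if "Andrea" does not start at that position.
no_andrea = re.compile(r"^(?:(?!Andrea).)*$")

# Hypothetical input lines.
lines = ["Hello Andrea", "Hello Robert", "", "Andrea"]

# Keep only the lines that do not contain "Andrea" anywhere.
kept = [line for line in lines if no_andrea.match(line)]
print(kept)
```

The empty line is kept too, since it trivially does not contain "Andrea".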

The capabilities and syntax of the regex implementation matter.

You could use look-ahead. Using Python as an example,

import re

not_andrea = re.compile(r'(?!Andrea)\w{6}', re.IGNORECASE)

To break that down:

(?!Andrea) means 'match if the next 6 characters are not "Andrea"'; if so then

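\w means a "word character": a letter, a digit, or an underscore. For ASCII text (or with the re.ASCII flag in Python 3, where \w otherwise also matches Unicode letters and digits) this is equivalent to the class [a-zA-Z0-9_]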

\w{6} means exactly six word characters.

re.IGNORECASE means that you will exclude "Andrea", "andrea", "ANDREA" ...
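To see the pieces working together, here is a small check against a few made-up words. It uses fullmatch() so that the whole word has to be the six characters; that choice is an assumption about what you want, not part of the pattern itself:

```python
import re

not_andrea = re.compile(r"(?!Andrea)\w{6}", re.IGNORECASE)

# Hypothetical test words.
samples = ["Andrea", "ANDREA", "Robert", "abc"]

# True where the word is exactly six word characters and not "Andrea".
results = {word: bool(not_andrea.fullmatch(word)) for word in samples}
print(results)
```

Only "Robert" passes: both spellings of "Andrea" are rejected by the lookahead, and "abc" is too short.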

Another way is to use your program logic - use all lines not matching Andrea and put them through a second regex to check for six characters. Or first check for at least six word characters, and then check that it does not match Andrea.
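One possible shape for the program-logic approach, as a sketch only: the helper name wanted() and the sample lines are hypothetical, and a plain substring test stands in for the "does not match Andrea" step.

```python
import re

# Step 1: the line must contain six consecutive word characters.
six_word_chars = re.compile(r"\w{6}")

def wanted(line):
    """Return True if the line has a six-character run and no 'Andrea'."""
    # Step 2: the inverse match is done in code, not in the regex.
    return bool(six_word_chars.search(line)) and "Andrea" not in line

# Hypothetical input lines.
lines = ["Andrea went home", "Robert went home", "a b c"]

selected = [line for line in lines if wanted(line)]
print(selected)
```

Keeping the negation in code tends to be easier to read and adjust than folding it into a single pattern.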
