Regular expression that doesn't contain a certain string
Solution 1:
By the power of Google I found a blog post from 2007 which gives the following regex, which matches strings that don't contain a certain substring:
^((?!my string).)*$
It works as follows: it looks for zero or more (*) characters (.), each of which does not begin your string (the (?! ... ) construct is a negative lookahead), and it requires the entire string to be made up of such characters (by using the ^ and $ anchors). Or, to put it another way:
The entire string must be made up of characters which do not begin a given string, which means that the string doesn't contain the given substring.
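A minimal sketch of this pattern in use, assuming Python's standard re module and single-line input (the dot does not match a newline unless re.DOTALL is passed). The helper name lacks_substring is hypothetical; "my string" is the placeholder substring from the answer.

```python
import re

# ^ and $ force the check over the whole input; (?!my string) is tested
# before the dot consumes each character, so no position may start "my string".
PATTERN = re.compile(r"^((?!my string).)*$")


def lacks_substring(text: str) -> bool:
    """Hypothetical helper: True if `text` does not contain "my string"."""
    return PATTERN.match(text) is not None


for s in ["hello world", "this is my string", "my strin", ""]:
    print(repr(s), lacks_substring(s))
```

For a plain substring test in Python, `"my string" not in text` is simpler; the regex form is useful when the check has to live inside a larger pattern or in a tool that only accepts a regex.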
Solution 2:
In general it's a pain to write a regular expression that matches strings not containing a particular string. We had to do this when studying models of computation: you take an NFA, which is easy enough to define, and then convert it into a regular expression. The expression for strings not containing "cat" was about 80 characters long.
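To make the automaton half of this concrete, here is a small Python sketch (not the answerer's construction) of a deterministic automaton, which is a special case of an NFA, that accepts exactly the strings not containing a given non-empty substring such as "cat". State k means the last k characters read equal the first k characters of the substring; reaching state len(substring) means the substring has occurred. The names next_state and avoids are hypothetical. Converting such an automaton into a regular expression (for example by state elimination) is the step that produces the long expression, and it is not shown here.

```python
def next_state(pattern: str, state: int, ch: str) -> int:
    """Longest k such that pattern[:k] is a suffix of pattern[:state] + ch."""
    seen = pattern[:state] + ch
    # Try the longest candidate prefix first; k == 0 (empty prefix) always fits.
    for k in range(min(len(pattern), len(seen)), -1, -1):
        if seen.endswith(pattern[:k]):
            return k
    return 0  # not reached: the empty prefix always matches


def avoids(pattern: str, text: str) -> bool:
    """Run the automaton; accept iff `pattern` never occurs in `text`."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    state = 0
    for ch in text:
        state = next_state(pattern, state, ch)
        if state == len(pattern):  # rejecting (dead) state: pattern was seen
            return False
    return True


for word in ["cart", "concatenate", "ccat", "act"]:
    print(word, avoids("cat", word))
```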
Edit: I just finished working it out, and yes, for the "aa" case it's:
aa([^a]|a[^a])*aa
Here is a very brief tutorial. I found some great ones before, but I can't find them anymore.
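A quick check of the alternation idea in Python's re module. This sketch assumes the intended pattern has no spaces (in a regex they would match literal spaces) and has a * after the group so the body can repeat; the name ALTERNATION is hypothetical. The body is a run of units that are either a non-'a' character or an 'a' followed by a non-'a' character, so it can never contain "aa".

```python
import re

# Body units: [^a] (any non-'a') or a[^a] ('a' then a non-'a'),
# so the text between the delimiters never contains "aa".
ALTERNATION = re.compile(r"aa(?:[^a]|a[^a])*aa")

for text in ["aabbabcaabda", "aaaaaabda", "aababaaaabdaa"]:
    print(text, "=>", ALTERNATION.findall(text))
```

On these inputs the greedy * stops as soon as the next two characters are "aa", so each match ends at the nearest closing delimiter.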
Solution 3:
All you need is a reluctant quantifier:
regex: /aa.*?aa/
aabbabcaabda => aabbabcaa
aaaaaabda => aaaa
aababaaaabdaa => aababaa, aabdaa
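The examples above can be reproduced with Python's re.findall, which returns all non-overlapping matches from left to right. A minimal sketch; the name LAZY is just an illustrative label.

```python
import re

# Reluctant (lazy) quantifier: .*? grows one character at a time, so each
# match ends at the first closing "aa" that lets the pattern succeed.
LAZY = re.compile(r"aa.*?aa")

for text in ["aabbabcaabda", "aaaaaabda", "aababaaaabdaa"]:
    print(text, "=>", LAZY.findall(text))
```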
You could use negative lookahead, too, but in this case it's just a more verbose way to accomplish the same thing. Also, it's a little trickier than gpojd made it out to be. The lookahead has to be applied at each position before the dot is allowed to consume the next character.
/aa(?:(?!aa).)*aa/
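A sketch comparing the lookahead version with the lazy one in Python's re module. For findall on the sample inputs both give the same matches; the fullmatch case at the end is an illustration of my own (not from the answer) showing that the two are not interchangeable in general: when something after the pattern forces the lazy body to grow, .*? will swallow an inner "aa", while the per-character lookahead never allows it.

```python
import re

LAZY = re.compile(r"aa.*?aa")
# (?!aa) is checked before every character the dot consumes,
# so the body between the delimiters can never contain "aa".
TEMPERED = re.compile(r"aa(?:(?!aa).)*aa")

for text in ["aabbabcaabda", "aaaaaabda", "aababaaaabdaa"]:
    # Same results for plain left-to-right searching on these inputs.
    print(text, TEMPERED.findall(text), LAZY.findall(text))

# Forcing a match of the whole string: the lazy body may expand across
# the inner "aa", the lookahead version refuses to.
print(LAZY.fullmatch("aaxaayaa"))      # a match covering the whole string
print(TEMPERED.fullmatch("aaxaayaa"))  # None
```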
As for the approach suggested by Claudiu and finnw, it'll work okay when the sentinel string is only two characters long, but (as Claudiu acknowledged) it's too unwieldy for longer strings.
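The lookahead form, by contrast, does not get longer as the sentinel does. A minimal Python sketch, assuming a hypothetical helper tempered_between that builds the pattern for any literal sentinel (re.escape makes the sentinel match literally):

```python
import re


def tempered_between(sentinel: str) -> re.Pattern:
    """Hypothetical helper: sentinel, then text never containing the
    sentinel, then the sentinel again. Works for any sentinel length."""
    s = re.escape(sentinel)  # treat the sentinel as literal text
    return re.compile(s + r"(?:(?!" + s + r").)*" + s)


print(tempered_between("aa").pattern)                     # aa(?:(?!aa).)*aa
print(tempered_between("cat").findall("cat1ca2cat3cat"))  # ['cat1ca2cat']
```

A hand-built alternation for "cat" would instead have to enumerate every way a partial "c" or "ca" can be followed by something other than the rest of the sentinel.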