How to exclude a specific string constant? [duplicate]
Can a regular expression be used to match any string except a specific string constant (e.g. "ABC")?
Is it possible to exclude just one specific string constant?
You have to use a negative lookahead assertion.
(?!^ABC$)
You could for example use the following.
(?!^ABC$)(^.*$)
If this does not work in your editor, try this. It is tested to work in Ruby and JavaScript, but note that it rejects any string that contains ABC anywhere, not just the exact string ABC:
^((?!ABC).)*$
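To see how the two lookahead patterns above behave, here is a minimal sketch using Python's re module (an assumption; the answer itself only mentions editors, Ruby and JavaScript). The helper name accepted is made up for illustration.

```python
import re

# First suggestion: the lookahead only fires on the exact string "ABC".
EXACT = re.compile(r"(?!^ABC$)(^.*$)")
# Fallback: the lookahead is re-checked before every character,
# so any occurrence of "ABC" inside the string blocks the match.
CONTAINS = re.compile(r"^((?!ABC).)*$")

def accepted(pattern, text):
    """True when the pattern finds a match in text (hypothetical helper)."""
    return pattern.search(text) is not None

for sample in ["ABC", "ABCD", "xABC", "AB", ""]:
    print(repr(sample), accepted(EXACT, sample), accepted(CONTAINS, sample))
```

If the goal is to exclude only the exact constant, the first pattern is the closer fit; the second is the one to reach for when the constant must not appear anywhere.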
This isn't easy, unless your regexp engine has special support for it. The easiest way would be to use a negative-match option, for example:
$var !~ /^foo$/
or die "too much foo";
If not, you have to do something evil:
$var =~ /^(($)|([^f].*)|(f([^o].*)?)|(fo([^o].*)?)|(foo.+))$/
or die "too much foo";
That one basically says "if it starts with non-f, the rest can be anything; if it starts with f, non-o, the rest can be anything; otherwise, if it starts fo, the next character had better not be another o".
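A hand-written alternation like this is easy to get subtly wrong, so it helps to check it by brute force. The sketch below is in Python rather than Perl (an assumption made for the sake of a runnable example), and the names NOT_FOO, is_not_foo and disagreements are hypothetical. The pattern is written with optional tails so that f and fo on their own are accepted as well.

```python
import re
from itertools import product

# Lookahead-free pattern: empty string, or a prefix that already differs
# from "foo", or "foo" followed by at least one more character.
NOT_FOO = re.compile(r"(|[^f].*|f([^o].*)?|fo([^o].*)?|foo.+)", re.DOTALL)

def is_not_foo(text):
    """True when the whole string matches, i.e. when it is anything but "foo"."""
    return NOT_FOO.fullmatch(text) is not None

def disagreements(alphabet="fox", max_len=5):
    """Strings where the regex and a plain != comparison give different answers."""
    found = []
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            text = "".join(chars)
            if is_not_foo(text) != (text != "foo"):
                found.append(text)
    return found

print(disagreements())
```

An empty result means the pattern agrees with a plain string comparison on every string over the chosen alphabet up to the chosen length; it is a sanity check, not a proof.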
In .NET you can use grouping to your advantage like this:
http://regexhero.net/tester/?id=65b32601-2326-4ece-912b-6dcefd883f31
You'll notice that:
(ABC)|(.)
will grab everything except ABC in the 2nd group. Parentheses surround each group, so (ABC) is group 1 and (.) is group 2.
So you just grab the 2nd group like this in a replace:
$2
Or in .NET look at the Groups collection inside the Regex class for a little more control.
You should be able to do something similar in most other regex implementations as well.
UPDATE: I found a much faster way to do this here: http://regexhero.net/tester/?id=997ce4a2-878c-41f2-9d28-34e0c5080e03
It still uses grouping (I can't find a way that doesn't use grouping). But this method is over 10X faster than the first.
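The grouping idea carries over to other engines. As a rough sketch, here is the same replace in Python's re module (an assumption; the answer above targets .NET, and the linked testers are not reproduced here). The function name strip_constant is made up for illustration.

```python
import re

# Group 1 captures the unwanted constant; group 2 captures any other single character.
ABC_OR_CHAR = re.compile(r"(ABC)|(.)", re.DOTALL)

def strip_constant(text):
    """Keep only what landed in group 2, which drops every occurrence of "ABC"."""
    # When group 1 matched, group 2 is None, so substitute an empty string.
    return ABC_OR_CHAR.sub(lambda m: m.group(2) or "", text)

print(strip_constant("xABCyABz"))
```

A callable is used as the replacement so that the unmatched group is handled explicitly instead of relying on how the engine expands an empty group reference.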
You could use negative lookahead, or something like this:
^([^A]|A([^B]|B([^C]|C.|$)|$)|$).*$
Maybe it could be simplified a bit.
Try this regular expression:
^(.{0,2}|([^A]..|A[^B].|AB[^C])|.{4,})$
It describes three cases:
- fewer than three arbitrary characters
- exactly three characters, where either
  - the first is not A, or
  - the first is A but the second is not B, or
  - the first is A, the second is B, but the third is not C
- more than three arbitrary characters
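The three cases above generalize to any constant. The following Python sketch builds such a pattern mechanically; the function names exclude_exact and all_strings are hypothetical, and the use of re.DOTALL and fullmatch (instead of the ^...$ anchors) is an assumption made to keep newlines from complicating the picture.

```python
import re
from itertools import product

def exclude_exact(word):
    """Build a lookahead-free pattern that matches every string except word."""
    n = len(word)
    if n == 0:
        return r".+"  # every non-empty string differs from ""
    # Case 1: strictly shorter than the constant.
    shorter = ".{0,%d}" % (n - 1)
    # Case 2: same length, first mismatch at position i:
    # correct prefix, one wrong character, then a free tail.
    same_length = [
        re.escape(word[:i]) + "[^" + re.escape(word[i]) + "]" + "." * (n - i - 1)
        for i in range(n)
    ]
    # Case 3: strictly longer than the constant.
    longer = ".{%d,}" % (n + 1)
    return "(?:" + "|".join([shorter] + same_length + [longer]) + ")"

def all_strings(alphabet, max_len):
    """Yield every string over alphabet with length 0..max_len."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)

pattern = re.compile(exclude_exact("ABC"), re.DOTALL)
mismatches = [s for s in all_strings("ABCx", 4)
              if (pattern.fullmatch(s) is not None) != (s != "ABC")]
print(exclude_exact("ABC"), mismatches)
```

The generated pattern grows linearly with the length of the constant, which is the price of avoiding lookahead.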