Negative regex for Perl string pattern match
I have this regex:
if($string =~ m/^(Clinton|[^Bush]|Reagan)/i)
{print "$string\n"};
I want to match with Clinton and Reagan, but not Bush.
It's not working.
Your regex does not work because [] defines a character class, but what you want is a lookahead:
(?=) - Positive look ahead assertion foo(?=bar) matches foo when followed by bar
(?!) - Negative look ahead assertion foo(?!bar) matches foo when not followed by bar
(?<=) - Positive look behind assertion (?<=foo)bar matches bar when preceded by foo
(?<!) - Negative look behind assertion (?<!foo)bar matches bar when NOT preceded by foo
(?>) - Once-only subpatterns (?>\d+)bar Performance enhancing when bar not present
(?(x)) - Conditional subpatterns
(?(3)foo|fu)bar - Matches foo if 3rd subpattern has matched, fu if not
(?#) - Comment (?# Pattern does x y or z)
So try: (?!bush)
Sample text:
Clinton said
Bush used crayons
Reagan forgot
Just omitting a Bush match:
$ perl -ne 'print if /^(Clinton|Reagan)/' textfile
Clinton said
Reagan forgot
Or if you really want to specify:
$ perl -ne 'print if /^(?!Bush)(Clinton|Reagan)/' textfile
Clinton said
Reagan forgot
Your regex says the following:
/^ - if the line starts with
( - start a capture group
Clinton| - "Clinton"
| - or
[^Bush] - Any single character except "B", "u", "s" or "h"
| - or
Reagan) - "Reagan". End capture group.
/i - Make matches case-insensitive
So, in other words, your middle part of the regex is screwing you up. As it is a "catch-all" kind of group, it will allow any line that does not begin with any of the upper or lower case letters in "Bush". For example, these lines would match your regex:
Our president, George Bush
In the news today, pigs can fly
012-3123 33
You either make a negative look-ahead, as suggested earlier, or you simply make two regexes:
if( ($string =~ m/^(Clinton|Reagan)/i) and
($string !~ m/^Bush/i) ) {
print "$string\n";
}
As mirod has pointed out in the comments, the second check is quite unnecessary when using the caret (^
) to match only beginning of lines, as lines that begin with "Clinton" or "Reagan" could never begin with "Bush".
However, it would be valid without the carets.
What's wrong with using two regexs (or three)? This makes your intentions more clear and may even improve your performance:
if ($string =~ /^(Clinton|Reagan)/i && $string !~ /Bush/i) { ... }
if (($string =~ /^Clinton/i || $string =~ /^Reagan/i)
&& $string !~ /Bush/i) {
print "$string\n"
}