Regex lookahead for 'not followed by' in grep
I am attempting to grep for all instances of Ui\. not followed by Line, or even just the letter L.
What is the proper way to write a regex for finding all instances of a particular string NOT followed by another string?
Using lookaheads
grep "Ui\.(?!L)" *
bash: !L: event not found
grep "Ui\.(?!(Line))" *
nothing
Negative lookahead, which is what you're after, requires a more powerful tool than the standard grep. You need a PCRE-enabled grep.
If you have GNU grep, the current version supports the option -P (or --perl-regexp), and you can then use the regex you wanted.
If you don't have (a sufficiently recent version of) GNU grep, then consider getting ack.
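As a minimal sketch of the PCRE route, assuming a GNU grep built with PCRE support (the file sample.txt and its contents are made up for illustration):

```shell
# Hypothetical sample data, created in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'Ui.Line' 'Ui.Label' 'Ui.Button' 'Ui.' > "$tmpdir/sample.txt"

# -P enables Perl-compatible regexes (GNU grep), including (?!...).
# Single quotes keep the shell from touching the ! character.
not_l=$(grep -P 'Ui\.(?!L)' "$tmpdir/sample.txt")
not_line=$(grep -P 'Ui\.(?!Line)' "$tmpdir/sample.txt")

printf '%s\n' "$not_l"      # lines where Ui. is not followed by L
printf '%s\n' "$not_line"   # lines where Ui. is not followed by Line

rm -r "$tmpdir"
```

With this sample data the first pattern should keep Ui.Button and the bare Ui. line, while the second should also keep Ui.Label.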
The answer to part of your problem is here, and ack would behave the same way: Ack & negative lookahead giving errors
You are using double quotes for grep, which permits bash to "interpret ! as history expand command."
You need to wrap your pattern in SINGLE-QUOTES:
grep 'Ui\.(?!L)' *
However, see @JonathanLeffler's answer to address the issues with negative lookaheads in standard grep!
You probably can't perform standard negative lookaheads using grep, but you should usually be able to get equivalent behaviour using the inverse-match switch -v. With it you can construct a regex for the complement of what you want to match and then pipe the output through two greps.
For the regex in question you might do something like
grep 'Ui\.' * | grep -v 'Ui\.L'
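One caveat with the two-grep approach, sketched here on made-up data (sample.txt is hypothetical): -v discards the whole line, so a line that contains both Ui.L and a Ui. not followed by L is dropped as well.

```shell
# Hypothetical sample data, created in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'Ui.Button' 'Ui.Line' 'Ui.Line and Ui.Button' > "$tmpdir/sample.txt"

# Keep lines containing Ui., then drop every line that contains Ui.L anywhere.
piped=$(grep 'Ui\.' "$tmpdir/sample.txt" | grep -v 'Ui\.L')
printf '%s\n' "$piped"

rm -r "$tmpdir"
```

Here only the Ui.Button line survives. The third line is filtered out even though it also contains Ui.Button, which is where this differs from a true lookahead.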
If you need to use a regex implementation that doesn't support negative lookaheads and you don't mind matching extra character(s)*, then you can use negated character classes [^L], alternation |, and the end of string anchor $.
In your case grep 'Ui\.\([^L]\|$\)' * does the job.
Ui\. matches the string you're interested in.
\([^L]\|$\) matches any single character other than L, or it matches the end of the line: [^L] or $.
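To see the "extra character" that the negated class consumes, here is a small sketch using grep's -o option on hypothetical sample data (note that \| inside a basic regex is a GNU grep extension):

```shell
# Hypothetical sample data, created in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'Ui.Button' 'Ui.Line' 'Ui.' > "$tmpdir/sample.txt"

# -o prints only the matched text, one match per output line.
matched=$(grep -o 'Ui\.\([^L]\|$\)' "$tmpdir/sample.txt")
printf '%s\n' "$matched"

rm -r "$tmpdir"
```

For the first line the match is Ui.B rather than Ui. alone, because [^L] has to consume the B. For the last line the $ branch matches and no extra character is consumed.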
If you want to exclude more than just one character, then you just need to throw more alternation and negation at it. To find 'a' not followed by 'bc':
grep 'a\(\([^b]\|$\)\|\(b\([^c]\|$\)\)\)' *
Which is either (a followed by not b, or followed by the end of the line: a then [^b] or $) or (a followed by b, which is either followed by not c or followed by the end of the line: a then b, then [^c] or $).
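A quick way to check an expression like this is to run it against a few made-up lines (the file sample.txt and its contents are hypothetical, and \| in a basic regex assumes GNU grep):

```shell
# Hypothetical sample data, created in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'abc' 'abd' 'ab' 'a' 'xabcx' 'abca' > "$tmpdir/sample.txt"

# Keep lines containing an 'a' that is not followed by 'bc'.
kept=$(grep 'a\(\([^b]\|$\)\|\(b\([^c]\|$\)\)\)' "$tmpdir/sample.txt")
printf '%s\n' "$kept"

rm -r "$tmpdir"
```

The lines abc and xabcx are rejected because their only a is followed by bc. The line abca is kept because its final a sits at the end of the line.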
This kind of expression gets to be pretty unwieldy and error-prone with even a short string. You could write something to generate the expressions for you, but it'd probably be easier to just use a regex implementation that supports negative lookaheads.
*If your implementation supports non-capturing groups then you can avoid capturing extra characters.
At least for the case of not wanting an 'L' character after the "Ui." you don't really need PCRE.
grep -E 'Ui\.($|[^L])' *
Here I've made sure to match the special case of the "Ui." at the end of the line.