grep - exclude string which is not a substring of a string
I'll explain my problem (on Ubuntu 16.04) with the following example. The file is:
# cat file
aaa
aaaxxx
aaaxxx*aaa
aaa=aaaxxx
bbbaaaccc
aaaddd/aaaxxx
I want to display all lines which contain aaa, except where it occurs only as part of aaaxxx. I want output like this:
# grep SOMETHING-HERE file …
aaa
aaaxxx*aaa (second aaa is the hit)
aaa=aaaxxx (first aaa is the hit)
bbbaaaccc (aaa in any other combination but not aaaxxx)
aaaddd/aaaxxx (similar to above)
I tried things like grep -v aaaxxx file | grep aaa, which results in:
aaa
bbbaaaccc
or
# egrep -P '(?<!aaaxxx )aaa' file
grep: conflicting matchers specified (translated from the German error message)
Is there any (simple) possibility? Of course it doesn't need to be grep.
Thanks
Solution 1:
It's straightforward using a Perl-style negative lookahead operator, available in grep's Perl Compatible Regular Expression (PCRE) mode using the -P switch:
$ grep -P 'aaa(?!xxx)' file
aaa
aaaxxx*aaa
aaa=aaaxxx
bbbaaaccc
aaaddd/aaaxxx
(bold formatting in the output indicates the matched parts highlighted by grep)
Although the zero-length lookahead is convenient, you could achieve the same output using GNU Extended Regular Expression (ERE) syntax, for example by matching aaa followed by up to 2 x characters followed by a non-x character or end-of-line, i.e.
grep -E 'aaax{0,2}([^x]|$)' file
or even using GNU basic regular expression (BRE) syntax
grep 'aaax\{0,2\}\([^x]\|$\)' file
Both match as follows:
aaa
aaaxxx*aaa
aaa=aaaxxx
bbbaaaccc
aaaddd/aaaxxx
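A quick way to check that the three patterns really select the same lines is to run them against the same file and compare the results. This is only a sketch on the sample data from the question, assuming GNU grep with -P available; the variable names are made up for the example.

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf '%s\n' 'aaa' 'aaaxxx' 'aaaxxx*aaa' 'aaa=aaaxxx' 'bbbaaaccc' 'aaaddd/aaaxxx' > "$sample"

# Run the PCRE, ERE and BRE variants and capture the selected lines.
pcre_out=$(grep -P 'aaa(?!xxx)' "$sample")
ere_out=$(grep -E 'aaax{0,2}([^x]|$)' "$sample")
bre_out=$(grep 'aaax\{0,2\}\([^x]\|$\)' "$sample")

# Compare the three results line for line.
if [ "$pcre_out" = "$ere_out" ] && [ "$ere_out" = "$bre_out" ]; then
    verdict='same lines'
else
    verdict='different lines'
fi
echo "$verdict"

rm -f "$sample"
```

The selected lines agree, but the matched text does not: the ERE and BRE versions consume the character after aaa, so with --color or -o they would show for example aaa= where the lookahead version shows just aaa.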