grep - match a string except where it is part of a specific longer string

I will explain my problem (on Ubuntu 16.04) with the following example. The file is:

# cat file
aaa
aaaxxx
aaaxxx*aaa
aaa=aaaxxx
bbbaaaccc
aaaddd/aaaxxx

I want to display all lines that contain aaa, except lines where aaa occurs only as part of aaaxxx. The output should look like this:

# grep SOMETHING-HERE file …
aaa
aaaxxx*aaa (second aaa is the hit)
aaa=aaaxxx (first aaa is the hit)
bbbaaaccc (aaa in any other combination but not aaaxxx)
aaaddd/aaaxxx (similar to above)

I tried things like grep -v aaaxxx file | grep aaa, which results in:

aaa
bbbaaaccc

or

# egrep -P '(?<!aaaxxx )aaa' file
grep: conflicting matchers specified

Is there a (simple) way to do this? Of course, it doesn't need to be grep. Thanks.


Solution 1:

It's straightforward using a Perl-style negative lookahead, available in grep's Perl Compatible Regular Expression (PCRE) mode via the -P switch:

$ grep -P 'aaa(?!xxx)' file
aaa
aaaxxx*aaa
aaa=aaaxxx
bbbaaaccc
aaaddd/aaaxxx

(With color output enabled, grep highlights the matched parts, i.e. each aaa that is not immediately followed by xxx.)
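To try this without touching any real files, here is a minimal sketch that recreates the sample file from the question in a temporary location and applies the negative lookahead. It assumes a GNU grep built with PCRE support (the -P switch); the variable names file, pcre_out and excluded are only illustrative. Note that the attempt in the question failed because egrep is equivalent to grep -E, and -E cannot be combined with -P.

```shell
# Recreate the sample file from the question in a temporary location.
file=$(mktemp)
printf '%s\n' 'aaa' 'aaaxxx' 'aaaxxx*aaa' 'aaa=aaaxxx' 'bbbaaaccc' 'aaaddd/aaaxxx' > "$file"

# Negative lookahead: match aaa only where it is NOT immediately followed by xxx.
# (Assumes grep was built with PCRE support, so that -P is available.)
pcre_out=$(grep -P 'aaa(?!xxx)' "$file")
printf '%s\n' "$pcre_out"

# Count the lines that the pattern filters out (-v inverts, -c counts).
excluded=$(grep -vcP 'aaa(?!xxx)' "$file")
echo "excluded lines: $excluded"

# Clean up the temporary file.
rm -f "$file"
```

Only the bare aaaxxx line should be filtered out, since every other line contains at least one aaa that is not followed by xxx.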


Although the zero-length lookahead is convenient, you can get the same output with GNU Extended Regular Expression (ERE) syntax, for example by matching aaa followed by up to two x characters and then either a non-x character or end-of-line, i.e.

grep -E 'aaax{0,2}([^x]|$)' file

or even using GNU basic regular expression (BRE) syntax

grep 'aaax\{0,2\}\([^x]\|$\)' file

both of which match as follows:

aaa
aaaxxx*aaa
aaa=aaaxxx
bbbaaaccc
aaaddd/aaaxxx
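The "up to two x characters" logic is easiest to check on edge cases. The following sketch (with made-up test lines that are not part of the original file, and illustrative variable names) runs the ERE and BRE patterns on lines where aaa is followed by one to four x characters and compares the results:

```shell
# Hypothetical edge-case input: aaa followed by 1, 2, 3 and 4 x characters.
file=$(mktemp)
printf '%s\n' 'aaax' 'aaaxx' 'aaaxxx' 'aaaxxxx' > "$file"

# ERE form: aaa, then 0-2 x characters, then a non-x character or end-of-line.
ere_out=$(grep -E 'aaax{0,2}([^x]|$)' "$file")

# BRE form of the same pattern (\| is a GNU extension to BRE).
bre_out=$(grep 'aaax\{0,2\}\([^x]\|$\)' "$file")

# Both forms should select exactly the same lines.
if [ "$ere_out" = "$bre_out" ]; then
    echo "ERE and BRE agree:"
    printf '%s\n' "$ere_out"
fi

# Clean up the temporary file.
rm -f "$file"
```

Lines with one or two trailing x characters are kept, while aaaxxx and aaaxxxx are rejected because the aaa is followed by at least three x characters, which is the same behavior as the lookahead aaa(?!xxx).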