Is grep syntax different from regex?

I want to extract the names of removed packages from the output of cat /var/log/dpkg.log | grep 'remove':

 2013-09-09 15:57:34 remove activity-log-manager:i386 0.9.4-0ubuntu6.2 <none>
 2013-09-09 15:57:35 remove activity-log-manager-control-center:i386 0.9.4-0ubuntu6.2 <none>
 2013-09-09 15:57:38 remove alacarte:all 3.6.1-0ubuntu3 <none>
 2013-09-09 15:57:41 remove deepin-software-center:all 2.1.2.1~precise~NoobsLab.com <none>

I want to grab only the package name, i.e. the text between remove and the colon that follows the package name. I am not a regex expert, but I made a regex that seems to do the job in online regex evaluators; when I apply it with grep, however, nothing happens. Here is the pattern that works in the evaluators:

(?<=remove)(.*?)(?=:)

But this does not work:

cat /var/log/dpkg.log | grep 'remove' | grep '(?<=remove)(.*?)(?=:)'

What am I missing here?

There's a common core of regular expression syntax, but there are distinct flavors. Your expression contains some features specific to the Perl flavor, in particular the lookbehind (?<=...) and lookahead (?=...) assertions describing the start and end of the pattern to be matched, and the lazy quantifier .*?. By default, however, grep uses basic regular expression (BRE) syntax, which only supports a simpler set of zero-length matches such as line anchors (^, $) and, in GNU grep, word anchors (\<, \>). In BRE the characters (, ), ?, < and = are ordinary characters, so grep treats most of your pattern as literal text that never occurs in the log.
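A quick way to see this is to run the question's pattern through plain grep on one of the sample lines, and then on a line that contains the pattern's characters literally. This is a minimal sketch assuming GNU grep; the second input line is made up purely for illustration.

```shell
#!/bin/sh
# Sample line copied from the question's dpkg.log excerpt.
line='2013-09-09 15:57:38 remove alacarte:all 3.6.1-0ubuntu3 <none>'

# Default grep (BRE): ( ) ? < = are literal here; only .* is special.
# grep -c exits with status 1 when the count is 0, so || true keeps
# a 'set -e' script from stopping.
printf '%s\n' "$line" | grep -c '(?<=remove)(.*?)(?=:)' || true
# prints 0

# The same pattern matches a (made-up) line containing that literal text.
printf '%s\n' '(?<=remove)(anything?)(?=:)' | grep -c '(?<=remove)(.*?)(?=:)'
# prints 1
```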

You can enable Perl-compatible regular expression (PCRE) support in grep with the -P command-line switch (although note that the man page at the time of writing described it as "experimental"). In your case you probably also want the -o switch, so that only the matched part of the line is printed rather than the whole line, i.e.

cat /var/log/dpkg.log | grep 'remove' | grep -oP '(?<=remove)(.*?)(?=:)'

Be aware that this expression may give wrong results for packages that do not have an architecture suffix such as :i386, since the lazy .*? will then read ahead to the next colon on the line, which may be in the following word, e.g.

echo "2013-09-07 08:31:44 remove cifs-utils 2:5.1-1ubuntu2 <none>" | grep -oP '(?<=remove)(.*?)(?=:)'
 cifs-utils 2
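One way to avoid both the leading space in the output and the read-ahead problem is to put the space inside the lookbehind and match a run of characters that are neither a colon nor a space. This is a sketch assuming a grep with -P support (e.g. GNU grep built with PCRE); extract_removed is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: prints the package name after "remove ".
# The trailing space is part of the lookbehind, and [^: ]+ stops at the
# first colon OR space, so names without an :arch suffix work too.
extract_removed() {
    grep -oP '(?<=remove )[^: ]+'
}

printf '%s\n' \
  '2013-09-09 15:57:34 remove activity-log-manager:i386 0.9.4-0ubuntu6.2 <none>' \
  '2013-09-07 08:31:44 remove cifs-utils 2:5.1-1ubuntu2 <none>' | extract_removed
# activity-log-manager
# cifs-utils
```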

You may wish to use awk instead, e.g.

cat /var/log/dpkg.log | awk '$3 ~ /remove/ {sub(":.*", "", $4); print $4}'
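Here is the same awk program as a small sketch run on two sample lines, one of them the cifs-utils line without an architecture suffix. removed_names is a hypothetical wrapper name; in practice you would pass /var/log/dpkg.log as its argument instead of piping in sample text.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk program above.
# $3 is the dpkg action, $4 is "package" or "package:arch";
# sub() deletes everything from the first colon in $4 and leaves $4
# unchanged when it contains no colon (as for cifs-utils).
removed_names() {
    awk '$3 ~ /remove/ {sub(":.*", "", $4); print $4}' "$@"
}

printf '%s\n' \
  '2013-09-09 15:57:38 remove alacarte:all 3.6.1-0ubuntu3 <none>' \
  '2013-09-07 08:31:44 remove cifs-utils 2:5.1-1ubuntu2 <none>' | removed_names
# alacarte
# cifs-utils
```

Usage on the real log: removed_names /var/log/dpkg.log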

As well as BRE and PCRE, GNU grep has a further mode called extended regular expression (ERE) syntax, selected with the -E command-line switch. The man page notes that

In GNU grep, there is no difference in available functionality between basic and extended syntaxes.

However, "no difference in available functionality" does not mean that the syntax is the same. For example, in BRE the + character is normally treated as a literal, and only becomes a modifier meaning "one or more instances of the preceding regular expression" when escaped, i.e.

$ echo "123.456" | grep '[0-9]+\.[0-9]+'
$ echo "123.456" | grep '[0-9]\+\.[0-9]\+'
123.456

whereas in ERE it is exactly the opposite:

$ echo "123.456" | grep -E '[0-9]+\.[0-9]+'
123.456
$ echo "123.456" | grep -E '[0-9]\+\.[0-9]\+'

A similar distinction applies to GNU sed invoked without and with the -r switch.
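For example, the grep demonstration above carries over to sed like this. This is a sketch assuming GNU sed, since \+ in a basic expression is a GNU extension and other sed implementations may behave differently.

```shell
#!/bin/sh
# BRE (default): + is literal unless escaped as \+ (GNU extension).
echo "123.456" | sed 's/[0-9]\+/N/g'      # N.N
echo "123.456" | sed 's/[0-9]+/N/g'       # 123.456 (no literal + to replace)

# ERE (GNU sed -r): the roles are reversed.
echo "123.456" | sed -r 's/[0-9]+/N/g'    # N.N
echo "123.456" | sed -r 's/[0-9]\+/N/g'   # 123.456
```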

From the grep man page:

grep searches the named input FILEs (or standard input if no files are named, or if a single hyphen-minus (-) is given as file name) for lines containing a match to the given PATTERN.

As far as I know, grep has no ability to edit the lines that it matches; I would use sed or possibly tr for that. Any of the following should get what you want:

cat /var/log/dpkg.log | grep 'remove' | sed 's/.*remove \([^:]*\):.*/\1/'
cat /var/log/dpkg.log | grep 'remove' | sed -E 's/.*remove ([^:]*):.*/\1/'
cat /var/log/dpkg.log | sed -n '/remove/s/.*remove \([^:]*\):.*/\1/p'
cat /var/log/dpkg.log | sed -nE '/remove/s/.*remove ([^:]*):.*/\1/p'
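As a quick check, here is a sketch that feeds two lines from the question to the first and last of these commands in place of /var/log/dpkg.log. sample is a hypothetical helper that only prints the test lines.

```shell
#!/bin/sh
# Hypothetical helper: prints two lines copied from the question.
sample() {
    printf '%s\n' \
      '2013-09-09 15:57:34 remove activity-log-manager:i386 0.9.4-0ubuntu6.2 <none>' \
      '2013-09-09 15:57:41 remove deepin-software-center:all 2.1.2.1~precise~NoobsLab.com <none>'
}

# [^:]* stops before the colon, so the :arch suffix is not captured.
sample | grep 'remove' | sed 's/.*remove \([^:]*\):.*/\1/'
sample | sed -nE '/remove/s/.*remove ([^:]*):.*/\1/p'
# both print:
# activity-log-manager
# deepin-software-center
```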

I'm honestly not sure what your (?<=remove)(.*?)(?=:) is trying to accomplish. In regex, parentheses are normally used to define capture groups (in PCRE, the (?<=...) and (?=...) forms are instead lookbehind and lookahead assertions). You can see that I've used capture groups in the sed commands above: there, everything matched is replaced by the contents of \1, the first group to be defined.
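To make the group numbering concrete, here is a small sketch with two capture groups, where \1 refers to the package name and \2 to the architecture after the colon. The "(arch: ...)" output format is just an illustrative choice.

```shell
#!/bin/sh
# Two capture groups: \1 = text before the colon, \2 = text after it up to a space.
# Parentheses in the replacement part are literal characters.
echo '2013-09-09 15:57:38 remove alacarte:all 3.6.1-0ubuntu3 <none>' |
  sed -E 's/.*remove ([^:]*):([^ ]*).*/\1 (arch: \2)/'
# alacarte (arch: all)
```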
