Use of alternation "|" in sed's regex

I am using GNU sed version 4.2.1, and I want to use the alternation symbol "|" in a subexpression. For example:

echo "blia blib bou blf" | sed 's/bl\(ia|f\)//g'

should return

" blib bou "

but it returns

"blia blib bou blf".

How can I get the expected result?


Solution 1:

In GNU sed's default (basic) regex syntax, the "|" also needs a backslash to get its special meaning, just as the parentheses do.

echo "blia blib bou blf" | sed 's/bl\(ia\|f\)//g'

will do what you want.

As you know, if all else fails, read the manual :-).

GNU sed user's manual, section 3.3 Overview of Regular Expression Syntax:

`REGEXP1\|REGEXP2'

Matches either REGEXP1 or REGEXP2.

Note the backslash...

Unfortunately, regex syntax is not really standardized across tools. There are many variants, which differ, among other things, in which special characters need a backslash and which do not. In some tools the dialect is even configurable or depends on switches (GNU grep, for example, lets you switch between three regex dialects with -G, -E and -P).
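
To make the point about switches concrete, here is a small sketch using grep rather than sed. It assumes a grep that accepts \| in basic mode (GNU grep does; this is an extension, not something POSIX guarantees). The variable names are only for illustration.

```shell
# Same match in two dialects: basic (-G) needs backslashes, extended (-E) does not.
words=$(printf 'blia\nblib\nbou\nblf\n')

# Basic regular expressions: \( \| \) carry the special meaning.
bre=$(printf '%s\n' "$words" | grep -G 'bl\(ia\|f\)')

# Extended regular expressions: ( | ) are special without a backslash.
ere=$(printf '%s\n' "$words" | grep -E 'bl(ia|f)')

printf '%s\n' "$bre"
printf '%s\n' "$ere"
```

Both commands should select the same two lines, blia and blf.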

This answer is specifically for GNU sed. The \| operator is a GNU extension to basic regular expressions, so other sed implementations, for example the one shipped with the BSDs, may behave differently.
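
If the script has to run on a sed without any alternation support, one workaround that avoids the operator entirely is a separate substitution per alternative. This is only a sketch for the example in the question; it gets unwieldy when there are many alternatives.

```shell
# One s command per alternative; uses nothing beyond plain basic regex syntax.
result=$(echo "blia blib bou blf" | sed -e 's/blia//g' -e 's/blf//g')

# Brackets make the leading and trailing spaces visible.
echo "[$result]"
```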

Solution 2:

Since there are several comments regarding non-GNU sed implementations: at least on OS X, you can use the -E argument to sed. From the man page:

Interpret regular expressions as extended (modern) regular expressions rather than basic regular expressions (BRE's). The re_format(7) manual page fully describes both formats.

Then you can use regular expression metacharacters without escaping them. Example:

$ echo "blia blib bou blf" | sed -E 's/bl(ia|f)//g'
 blib bou 

Solution 3:

GNU sed also supports the -r option (extended regular expressions). This means you don't have to escape the metacharacters:

echo foohello barhello | sed -re "s/(foo|bar)hello/hi/g"

Output:

hi hi
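
Since the extended-regex flag is spelled -E on some systems and -r on others, a script can probe for it at run time. The following is a rough sketch, not a definitive recipe: the variable names ere_flag, input and result are made up for illustration, and the fallback simply applies one substitution per alternative.

```shell
input="blia blib bou blf"

# Try -E first, then -r; leave ere_flag empty if neither is accepted.
if printf 'x\n' | sed -E 's/(x|y)/z/' >/dev/null 2>&1; then
    ere_flag=-E
elif printf 'x\n' | sed -r 's/(x|y)/z/' >/dev/null 2>&1; then
    ere_flag=-r
else
    ere_flag=
fi

if [ -n "$ere_flag" ]; then
    # Extended syntax: parentheses and | need no backslashes.
    result=$(printf '%s\n' "$input" | sed "$ere_flag" 's/bl(ia|f)//g')
else
    # No extended mode available: one substitution per alternative.
    result=$(printf '%s\n' "$input" | sed -e 's/blia//g' -e 's/blf//g')
fi

printf '[%s]\n' "$result"
```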

Solution 4:

The \| does not work with sed on Solaris 10 either. What I did instead was use Perl, where the alternation needs no backslashes:

echo "blia blib bou blf" | perl -p -e 's/bl(ia|f)//g'