How can I output only captured groups with sed?
Solution 1:
The key to getting this to work is to tell sed to exclude what you don't want in the output, as well as specifying what you do want.
string='This is a sample 123 text and some 987 numbers'
echo "$string" | sed -rn 's/[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+([[:digit:]]+)[^[:digit:]]*/\1 \2/p'
This says:
- don't default to printing each line (-n)
- exclude zero or more non-digits
- include one or more digits
- exclude one or more non-digits
- include one or more digits
- exclude zero or more non-digits
- print the substitution (p)
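To see what -n and p contribute, here is a minimal sketch that feeds the same expression two lines, only one of which contains two runs of digits. The second input line and the variable names are made up for illustration, and -E is used in place of -r on the assumption that your sed accepts it (current GNU and BSD sed both do).

```shell
# Two input lines: only the first contains two runs of digits.
# The second line is a hypothetical addition for this example.
input='This is a sample 123 text and some 987 numbers
no digits on this line'

# -n suppresses the default print; the trailing p prints only lines
# where the substitution actually happened.
result=$(printf '%s\n' "$input" |
  sed -En 's/[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+([[:digit:]]+)[^[:digit:]]*/\1 \2/p')

printf '%s\n' "$result"   # prints: 123 987
```

Without -n, the non-matching line would be echoed unchanged and the matching line would be printed twice (once by default, once by p).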
In general, in sed you capture groups using parentheses and output what you capture using a back reference:
echo "foobarbaz" | sed 's/^foo\(.*\)baz$/\1/'
will output "bar". If you use -r (-E on OS X and other BSDs; recent GNU sed accepts -E as well) for extended regex, you don't need to escape the parentheses:
echo "foobarbaz" | sed -r 's/^foo(.*)baz$/\1/'
There can be up to 9 capture groups and their back references. The back references are numbered in the order the groups appear, but they can be used in any order and can be repeated:
echo "foobarbaz" | sed -r 's/^foo(.*)b(.)z$/\2 \1 \2/'
outputs "a bar a".
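As a slightly more practical sketch of reordering back references, the following rearranges an ISO-style date into day/month/year. The sample date and variable names are hypothetical, and -E is assumed to be available in your sed.

```shell
# Hypothetical sample value, not from the original question.
date_iso='2024-01-31'

# Three groups: year, month, day. Using | as the s-command delimiter
# avoids having to escape the slashes in the replacement.
date_dmy=$(printf '%s\n' "$date_iso" |
  sed -E 's|^([0-9]{4})-([0-9]{2})-([0-9]{2})$|\3/\2/\1|')

printf '%s\n' "$date_dmy"   # prints: 31/01/2024
```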
If you have GNU grep (it may also work in BSD, including OS X):
echo "$string" | grep -Po '\d+'
or variations such as:
echo "$string" | grep -Po '(?<=\D )(\d+)'
The -P option enables Perl Compatible Regular Expressions. See man 3 pcrepattern or man 3 pcresyntax.
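If your grep was built without -P support, the simple "every run of digits" case can be sketched with -o and extended regex instead. This is an alternative to the -P commands above, not an equivalent: it has no lookbehind, so it does not cover the second variation.

```shell
string='This is a sample 123 text and some 987 numbers'

# -o prints each match on its own line; -E enables extended regex.
# [0-9]+ stands in for the PCRE shorthand \d+.
numbers=$(printf '%s\n' "$string" | grep -oE '[0-9]+')

printf '%s\n' "$numbers"
# prints:
# 123
# 987
```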
Solution 2:
Sed has up to nine remembered patterns, but in basic (non-extended) regex mode you need to use escaped parentheses to remember portions of the regular expression.
See here for examples and more detail
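A small sketch of why the escaping matters in basic regex mode: unescaped parentheses match literal parenthesis characters, while escaped ones form a group. The sample input and variable names are made up for illustration.

```shell
# Hypothetical input containing literal parentheses.
text='f(oo)'

# Basic regex: ( and ) are ordinary characters, so this matches the
# literal text "(oo)" and replaces it.
literal=$(printf '%s\n' "$text" | sed 's/(oo)/X/')

# Basic regex: \( and \) create a group, so only "oo" is matched and
# remembered as \1; the surrounding literal parentheses are untouched.
grouped=$(printf '%s\n' "$text" | sed 's/\(oo\)/[\1]/')

printf '%s\n' "$literal"   # prints: fX
printf '%s\n' "$grouped"   # prints: f([oo])
```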