Search and replace in bash using regular expressions
I've seen this example:
hello=ho02123ware38384you443d34o3434ingtod38384day
echo ${hello//[0-9]/}
Which follows this syntax: ${variable//pattern/replacement}
Unfortunately the pattern
field doesn't seem to support full regex syntax (if I use .
or \s
, for example, it tries to match the literal characters).
How can I search/replace a string using full regex syntax?
Solution 1:
Use sed:
MYVAR=ho02123ware38384you443d34o3434ingtod38384day
echo "$MYVAR" | sed -e 's/[a-zA-Z]/X/g' -e 's/[0-9]/N/g'
# prints XXNNNNNXXXXNNNNNXXXNNNXNNXNNNNXXXXXXNNNNNXXX
Note that the subsequent -e
's are processed in order. Also, the g
flag for the expression will match all occurrences in the input.
You can also pick your favorite tool using this method, i.e. perl, awk, e.g.:
echo "$MYVAR" | perl -pe 's/[a-zA-Z]/X/g and s/[0-9]/N/g'
This may allow you to do more creative matches... For example, in the snip above, the numeric replacement would not be used unless there was a match on the first expression (due to lazy and
evaluation). And of course, you have the full language support of Perl to do your bidding...
Solution 2:
This actually can be done in pure bash:
hello=ho02123ware38384you443d34o3434ingtod38384day
re='(.*)[0-9]+(.*)'
while [[ $hello =~ $re ]]; do
hello=${BASH_REMATCH[1]}${BASH_REMATCH[2]}
done
echo "$hello"
...yields...
howareyoudoingtodday
Solution 3:
These examples also work in bash no need to use sed:
#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day
MYVAR=${MYVAR//[a-zA-Z]/X}
echo ${MYVAR//[0-9]/N}
you can also use the character class bracket expressions
#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day
MYVAR=${MYVAR//[[:alpha:]]/X}
echo ${MYVAR//[[:digit:]]/N}
output
XXNNNNNXXXXNNNNNXXXNNNXNNXNNNNXXXXXXNNNNNXXX
What @Lanaru wanted to know however, if I understand the question correctly, is why the "full" or PCRE extensions \s\S\w\W\d\D
etc don't work as supported in php ruby python etc. These extensions are from Perl-compatible regular expressions (PCRE) and may not be compatible with other forms of shell based regular expressions.
These don't work:
#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo ${hello//\d/}
#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello | sed 's/\d//g'
output with all literal "d" characters removed
ho02123ware38384you44334o3434ingto38384ay
but the following does work as expected
#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello | perl -pe 's/\d//g'
output
howareyoudoingtodday
Hope that clarifies things a bit more but if you are not confused yet why don't you try this on Mac OS X which has the REG_ENHANCED flag enabled:
#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day;
echo $MYVAR | grep -o -E '\d'
On most flavours of *nix you will only see the following output:
d
d
d
nJoy!
Solution 4:
If you are making repeated calls and are concerned with performance, This test reveals the BASH method is ~15x faster than forking to sed and likely any other external process.
hello=123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X
P1=$(date +%s)
for i in {1..10000}
do
echo $hello | sed s/X//g > /dev/null
done
P2=$(date +%s)
echo $[$P2-$P1]
for i in {1..10000}
do
echo ${hello//X/} > /dev/null
done
P3=$(date +%s)
echo $[$P3-$P2]
Solution 5:
Use [[:digit:]]
(note the double brackets) as the pattern:
$ hello=ho02123ware38384you443d34o3434ingtod38384day
$ echo ${hello//[[:digit:]]/}
howareyoudoingtodday
Just wanted to summarize the answers (especially @nickl-'s https://stackoverflow.com/a/22261334/2916086).