Regex matching in a Bash if statement
There are a couple of important things to know about bash's [[ ]] construction. The first:
Word splitting and pathname expansion are not performed on the words between the [[ and ]]; tilde expansion, parameter and variable expansion, arithmetic expansion, command substitution, process substitution, and quote removal are performed.
The second thing:
An additional binary operator, ‘=~’, is available, ... the string to the right of the operator is considered an extended regular expression and matched accordingly ... Any part of the pattern may be quoted to force it to be matched as a string.
Consequently, $v on either side of the =~ will be expanded to the value of that variable, but the result will not be word-split or pathname-expanded. In other words, it's perfectly safe to leave variable expansions unquoted on the left-hand side, but you need to know that variable expansions will happen on the right-hand side.
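To see both halves of that in action, here is a small sketch. The sample value and the variable names are made up for illustration: the unquoted $x on the left keeps its spaces and its *, and $pat on the right is expanded before the regex is compiled.

```shell
#!/usr/bin/env bash
# Hypothetical sample value containing spaces and a glob character.
x='foo * bar'

# Unquoted $x on the left: no word splitting, no pathname expansion.
if [[ $x =~ ^foo ]]; then
  left_unquoted=match
else
  left_unquoted=no-match
fi

# The right-hand side is expanded too: $pat is replaced by its value
# before the regex is compiled.
pat='bar$'
if [[ $x =~ $pat ]]; then
  right_expanded=match
else
  right_expanded=no-match
fi

echo "left_unquoted=$left_unquoted right_expanded=$right_expanded"
```

Both tests should report a match, even though $x would be split into three words (and the * globbed) in an ordinary command.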
So if you write [[ $x =~ [$0-9a-zA-Z] ]], the $0 inside the regex on the right will be expanded before the regex is interpreted, which will probably cause the regex to fail to compile (unless the expansion of $0 ends with a digit or a punctuation character whose ASCII value is less than that of a digit). If you quote the right-hand side, like so: [[ $x =~ "[$0-9a-zA-Z]" ]], then the right-hand side will be treated as an ordinary string, not a regex (and $0 will still be expanded). What you really want in this case is [[ $x =~ [\$0-9a-zA-Z] ]].
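A quick way to convince yourself, as a sketch only: the function names below are hypothetical, and the quoted variant uses single quotes rather than double quotes so that $0 is not expanded and the result is predictable.

```shell
#!/usr/bin/env bash
# Succeeds if $1 contains at least one '$', digit, or ASCII letter.
# The backslash keeps bash from expanding $0 inside the bracket expression.
has_dollar_or_alnum() {
  [[ $1 =~ [\$0-9a-zA-Z] ]]
}

# Quoted right-hand side: matched as a literal string, not as a regex.
# Succeeds only if $1 contains the text [$0-9a-zA-Z] verbatim.
contains_literal_class() {
  [[ $1 =~ '[$0-9a-zA-Z]' ]]
}

has_dollar_or_alnum 'a$b'    && echo "escaped class: match"
contains_literal_class 'abc' || echo "quoted class: no match for abc"
```

The second function is the trap: it looks like a character class but only ever matches that exact sequence of characters.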
Similarly, the expression between the [[ and ]] is split into words by bash's lexer before the regex is interpreted, so spaces in the regex need to be escaped or quoted. If you wanted to match letters, digits or spaces you could use: [[ $x =~ [0-9a-zA-Z\ ] ]]. Other characters similarly need to be escaped, like #, which would start a comment if not quoted. Of course, you can put the pattern into a variable:
pat="[0-9a-zA-Z ]"
if [[ $x =~ $pat ]]; then ...
For regexes which contain lots of characters which would need to be escaped or quoted to pass through bash's lexer, many people prefer this style. But beware: In this case, you cannot quote the variable expansion:
# This doesn't work:
if [[ $x =~ "$pat" ]]; then ...
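Here is a minimal sketch of the difference, using the same pattern as above and a made-up input string. Unquoted, $pat is used as a regex; quoted, it is compared as literal text.

```shell
#!/usr/bin/env bash
pat='[0-9a-zA-Z ]'   # single-quoted so nothing in it is expanded
x='hello world'      # hypothetical input

# Unquoted expansion: the value of $pat is compiled as a regex.
if [[ $x =~ $pat ]]; then
  unquoted=regex-match
else
  unquoted=no-match
fi

# Quoted expansion: the value of $pat is matched as a plain string,
# so this only succeeds if $x contains the text [0-9a-zA-Z ] itself.
if [[ $x =~ "$pat" ]]; then
  quoted=literal-match
else
  quoted=no-match
fi

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

With this input the unquoted form matches and the quoted form does not.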
Finally, I think what you are trying to do is to verify that the variable only contains valid characters. The easiest way to do this check is to make sure that it does not contain an invalid character. In other words, an expression like this:
valid='0-9a-zA-Z $%&#' # add almost whatever else you want to allow to the list
if [[ ! $x =~ [^$valid] ]]; then ...
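The check above can be wrapped in a small function to try it out. This is only a sketch: is_valid is a hypothetical name, and the allowed list is the one from the snippet.

```shell
#!/usr/bin/env bash
# Allowed characters, as in the snippet above.
valid='0-9a-zA-Z $%&#'

# Hypothetical helper: succeeds when $1 contains no character outside $valid.
is_valid() {
  [[ ! $1 =~ [^$valid] ]]
}

for s in 'abc 123 $%&#' 'abc!' 'a/b'; do
  if is_valid "$s"; then
    echo "ok:  $s"
  else
    echo "bad: $s"
  fi
done
```

Note that an empty string also passes, since it contains no invalid character; add a separate [[ -n $x ]] test if that matters to you.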
! negates the test, turning it into a "does not match" operator, and a [^...] regex character class means "any character other than ...".
The combination of parameter expansion and regex operators can make bash regular expression syntax "almost readable", but there are still some gotchas. (Aren't there always?) One is that you could not put ] into $valid, even if $valid were quoted, except at the very beginning. (That's a POSIX regex rule: if you want to include ] in a character class, it needs to go at the beginning. - can go at the beginning or the end, so if you need both ] and -, you need to start with ] and end with -, leading to the regex "I know what I'm doing" emoticon: [][-])
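As a sketch of that ordering rule, here is an allowed list that includes ], [ and -. The variable and function names are hypothetical; the only thing that matters is that ] comes first and - comes last.

```shell
#!/usr/bin/env bash
# ']' must be first and '-' last for both to be taken literally.
valid_with_brackets='][0-9a-zA-Z-'

# Hypothetical helper: succeeds when $1 has no character outside the list.
# After expansion the regex is [^][0-9a-zA-Z-]
is_valid_with_brackets() {
  [[ ! $1 =~ [^$valid_with_brackets] ]]
}

is_valid_with_brackets 'a[0]-b' && echo "a[0]-b is accepted"
is_valid_with_brackets 'a.b'    || echo "a.b is rejected"
```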
In case someone wanted an example using variables...
#!/bin/bash
# Only continue for 'develop' or 'release/*' branches.
# Single quotes keep the shell from touching '$'; 'release/' is a prefix match.
BRANCH_REGEX='^(develop$|release/)'
if [[ $BRANCH =~ $BRANCH_REGEX ]]; then
    echo "BRANCH '$BRANCH' matches BRANCH_REGEX '$BRANCH_REGEX'"
else
    echo "BRANCH '$BRANCH' DOES NOT MATCH BRANCH_REGEX '$BRANCH_REGEX'"
fi
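If you want to try this without setting BRANCH in your environment, something like the following should do. It is a sketch: branch_allowed is a hypothetical helper, the branch names are invented, and the regex is written to have the same intent as BRANCH_REGEX above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds for 'develop' or any branch starting with 'release/'.
branch_allowed() {
  local re='^(develop$|release/)'
  [[ $1 =~ $re ]]
}

for b in develop developer release/1.2 feature/release/x; do
  if branch_allowed "$b"; then
    echo "$b: continue"
  else
    echo "$b: skip"
  fi
done
```

The ^ applies to both alternatives because it sits outside the parentheses, which is why feature/release/x is skipped.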
I'd prefer to use [:punct:] for that. Also, a-zA-Z0-9 could be just [:alnum:]:
[[ $TEST =~ ^[[:alnum:][:blank:][:punct:]]+$ ]]
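For what it's worth, here is a small sketch of how that pattern behaves on a few inputs. The function name and the sample strings are made up, and what counts as [:alnum:] depends on your locale.

```shell
#!/usr/bin/env bash
# Same pattern as above, kept in a variable so the lexer leaves it alone.
re='^[[:alnum:][:blank:][:punct:]]+$'

# Hypothetical helper: succeeds when every character of $1 is alphanumeric,
# blank (space or tab), or punctuation, and $1 is not empty.
only_printable_basics() {
  [[ $1 =~ $re ]]
}

only_printable_basics 'Hello, world 42!' && echo "plain text: match"
only_printable_basics $'two\nlines'      || echo "embedded newline: no match"
only_printable_basics ''                 || echo "empty string: no match"
```

Unlike the negated-class approach, the + here means an empty string is rejected.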
Or you might be looking at this question because you happened to make a silly typo like I did and reversed the =~ to ~=.