What is the meaning of + in a regex?
What does the plus symbol in regex mean?
+ can actually have two meanings, depending on context.
As the other answers mention, + is usually a repetition operator: it causes the preceding token to match one or more times. a+ would be written as aa* in formal language theory, and can also be written as a{1,} (match at least once, with no upper limit).
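A quick way to check this equivalence is to run the three spellings against a few inputs. The sketch below uses Python's re module; the sample strings are arbitrary and chosen only for illustration.

```python
import re

# Three spellings of "one or more a".
patterns = [r"a+", r"aa*", r"a{1,}"]

# Arbitrary sample inputs (illustrative only).
samples = ["", "a", "aaa", "b", "aab"]

def accepts(pattern, text):
    """True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

# One row of results per pattern; all rows should be identical.
results = [[accepts(p, s) for s in samples] for p in patterns]
for p, row in zip(patterns, results):
    print(p, row)
```

Each pattern should accept exactly the strings made up of one or more a characters and reject everything else.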
However, + can also make another quantifier possessive when it immediately follows that quantifier (i.e. ?+, *+, ++ or {m,n}+). A possessive quantifier is an advanced feature of some regex flavours (PCRE, Java and the JGsoft engine, among others) which tells the engine not to backtrack into the quantified token once it has matched.
To understand how this works, we need to understand two concepts of regex engines: greediness and backtracking. Greediness means that, by default, a quantifier will try to consume as many characters as it can. Let's say our pattern is .* (the dot is a special construct in regexes which means any character [1]; the star means match zero or more times), and our target is aaaaaaaab. The entire string will be consumed, because the entire string is the longest match that satisfies the pattern.
However, let's say we change the pattern to .*b. Now, when the regex engine tries to match against aaaaaaaab, the .* will again consume the entire string. However, since the engine has reached the end of the string and the pattern is not yet satisfied (the .* consumed everything, but the pattern still has to match b afterwards), it will backtrack one character at a time and try to match b. The first backtrack makes the .* give back the final character, so that it consumes only aaaaaaaa; b can then match the remaining b, and the pattern succeeds.
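The effect of backtracking can be made visible by capturing what .* ends up holding. The following is a minimal sketch in Python; the capture group is added purely for illustration and is not part of the pattern discussed above.

```python
import re

target = "aaaaaaaab"

# On its own, greedy .* takes the whole string.
whole = re.match(r".*", target).group(0)

# With a trailing b, .* must give back one character so that b can match.
m = re.match(r"(.*)b", target)
kept_by_star = m.group(1)   # what .* holds after backtracking

print(whole)          # aaaaaaaab
print(kept_by_star)   # aaaaaaaa
```

The overall match is still the full string in both cases; only the share taken by .* changes.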
Possessive quantifiers are also greedy but, as mentioned, once they have matched, the engine can no longer backtrack into them. So if we change our pattern to .*+b (match any character zero or more times, possessively, followed by a b) and try to match aaaaaaaab, the .*+ will again consume the whole string. Because it is possessive, the backtracking information is discarded, so the b cannot be matched and the pattern fails.
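The difference between the greedy and the possessive pattern can be sketched in Python as well. This assumes Python 3.11 or later, the version in which the re module gained possessive quantifiers; on older versions the possessive pattern is rejected as a syntax error, so the sketch skips it there. The helper name first_match is made up for this example.

```python
import re
import sys

TARGET = "aaaaaaaab"

def first_match(pattern, text=TARGET):
    """Return the matched substring, or None if the pattern fails."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# Greedy: .* backtracks one character, so b can match.
print(first_match(r".*b"))        # aaaaaaaab

# Possessive: .*+ never gives characters back, so b has nothing left.
# Assumption: only attempted on Python 3.11+, where re supports *+.
if sys.version_info >= (3, 11):
    print(first_match(r".*+b"))   # None
```

In Java the same two patterns can be passed to java.util.regex.Pattern unchanged, since that engine supports possessive quantifiers natively.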
[1] In most engines, the dot will not match a newline character unless the /s ("singleline" or "dotall") modifier is specified.
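As an illustration of the footnote, here is a small Python sketch. Python has no /s suffix; the equivalent is the re.DOTALL flag (also spelled re.S, or (?s) inline). The sample text is made up for the example.

```python
import re

text = "a\nb"   # an a and a b separated by a newline

# By default the dot refuses to match the newline.
default_match = re.search(r"a.b", text)

# With DOTALL the dot matches any character, newline included.
dotall_match = re.search(r"a.b", text, re.DOTALL)

print(default_match)                    # None
print(dotall_match.group(0) == text)    # True
```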
In most implementations + means "one or more".
In some theoretical writings + is used to mean "or" (most implementations use the | symbol for that).
One or more of the preceding expression. For example, [0-9]+ would match 1234567890 in:
I have 1234567890 dollars
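The same example, written as a short Python check (a sketch using the standard re module):

```python
import re

sentence = "I have 1234567890 dollars"

# [0-9]+ matches the first run of one or more digits.
match = re.search(r"[0-9]+", sentence)

print(match.group(0))   # 1234567890
```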