Regex lazy quantifier behaves greedily
Solution 1:
The \[.*?\]\[2\] pattern works like this:
- \[ - finds the leftmost [ (as the regex engine processes the input string from left to right)
- .*? - matches any 0+ chars other than line break chars, as few as possible, but as many as needed for a successful match, because subsequent patterns follow (see the next item)
- \]\[2\] - matches a ][2] substring.
So, the .*? gets expanded by one char upon each failure until the engine finds the leftmost ][2]. Note that lazy quantifiers do not guarantee the "shortest" match: the match still starts at the leftmost position where a match is possible.
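To see the expansion in practice, here is a minimal sketch using Python's re module. The sample string is an assumption made up for illustration (it is not from the question); it has an earlier [ that the lazy pattern latches onto.

```python
import re

# Hypothetical sample input: two bracket pairs, only the second ends in ][2]
text = "[a][1] [b][2]"

# The match starts at the leftmost "[", then .*? keeps expanding
# one char at a time until "][2]" can finally match.
lazy_match = re.search(r"\[.*?\]\[2\]", text).group()
print(lazy_match)  # [a][1] [b][2]
```

The lazy quantifier only controls how far the match extends to the right; it does not move the starting position forward.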
Solution
Instead of a .*? (or .*), use a negated character class that matches any char but the boundary chars:
\[[^\]\[]*\]\[2\]
See this regex demo.
Here, .*? is replaced with [^\]\[]* - 0 or more chars other than ] and [.
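A short sketch of the negated character class on the same kind of input, again in Python and with a made-up sample string (an assumption for illustration only):

```python
import re

# Hypothetical sample input, same shape as a string with several [..][n] pairs
text = "[a][1] [b][2]"

# [^\]\[]* cannot cross a "[" or "]", so a match that starts at the
# first "[" fails and the engine retries from later positions.
negated_match = re.search(r"\[[^\]\[]*\]\[2\]", text).group()
print(negated_match)  # [b][2]
```

Because the class cannot consume a bracket, the engine is forced to abandon the earlier starting positions instead of stretching the match across them.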
Other examples:
- <[^<>]*> matches <...> with no < and > inside
- \([^()]*\) matches (...) with no ( and ) inside
- "[^"]*" matches "..." with no " inside
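The three patterns above can be tried out as follows. This is only an illustrative sketch in Python; all input strings are invented for the demonstration.

```python
import re

# Angle brackets: the unclosed "<d " is skipped because "<" is excluded
angle = re.findall(r"<[^<>]*>", "a <b> c <d <e>")
print(angle)   # ['<b>', '<e>']

# Parentheses: only the innermost pair contains no "(" or ")"
parens = re.findall(r"\([^()]*\)", "f(a, g(b))")
print(parens)  # ['(b)']

# Double quotes: each quoted chunk is matched separately
quoted = re.findall(r'"[^"]*"', 'say "hi" and "bye"')
print(quoted)  # ['"hi"', '"bye"']
```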
In other situations, when the starting pattern is a multichar string or a complex pattern, use a tempered greedy token, (?:(?!start).)*?. To match abc 1 def in abc 0 abc 1 def, use abc(?:(?!abc).)*?def.
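A minimal Python sketch comparing the plain lazy pattern with the tempered greedy token on the string from the example above (the variable names are illustrative only):

```python
import re

text = "abc 0 abc 1 def"

# Plain lazy dot: starts at the first "abc" and expands until "def"
plain = re.search(r"abc.*?def", text).group()
print(plain)     # abc 0 abc 1 def

# Tempered token: each consumed char must not begin another "abc",
# so the attempt from the first "abc" fails and the second one wins.
tempered = re.search(r"abc(?:(?!abc).)*?def", text).group()
print(tempered)  # abc 1 def
```

The lookahead is checked before every single char, so this construct tends to be slower than a negated character class; it is worth reaching for only when the boundary cannot be expressed as a single char.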