Regex lazy quantifier behaves greedily

Solution 1:

The \[.*?\]\[2\] pattern works like this:

  • \[ - finds the leftmost [ (as the regex engine processes the string input from left to right)
  • .*? - matches zero or more chars other than line break chars, as few as possible, but still as many as the subsequent patterns need in order to match (see below)
  • \]\[2\] - matches the literal ][2] substring.

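So, once the leftmost [ is matched, the .*? is expanded one char at a time upon each failure of the subsequent patterns, until the first ][2] to the right is found. The engine does not move the match start to a later [ as long as a match from the current position is still possible. Note that lazy quantifiers do not guarantee the "shortest" match: they only make the match end as early as possible for a given start position.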
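
The behavior can be reproduced with a minimal Python sketch. The input string here is a hypothetical sample chosen for illustration, not the one from the question:

```python
import re

# Hypothetical sample input (an assumption, not taken from the question).
text = "[a][1] [b][2]"

# The match starts at the leftmost "[", and the lazy .*? keeps expanding
# until the literal "][2]" can finally be matched.
lazy_match = re.search(r"\[.*?\]\[2\]", text).group()
print(lazy_match)  # [a][1] [b][2]
```

Even though the quantifier is lazy, the result spans both bracket pairs because the start position was already fixed at the first [.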

Solution

Instead of .*? (or .*), use a negated character class that matches any char but the boundary chars.

\[[^\]\[]*\]\[2\]

Here, .*? is replaced with [^\]\[]* - 0 or more chars other than ] and [.
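
As a quick check, here is a small Python sketch of the negated character class version, again run on a hypothetical sample string that is only an assumption for illustration:

```python
import re

# Hypothetical sample input (an assumption, not taken from the question).
text = "[a][1] [b][2]"

# [^\]\[]* cannot cross a "]" or "[", so an attempt starting at the first "["
# fails and the engine retries at later positions.
fixed_match = re.search(r"\[[^\]\[]*\]\[2\]", text).group()
print(fixed_match)  # [b][2]
```

Since the class cannot consume a bracket, only the bracket pair immediately followed by [2] is matched.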

Other examples:

  • <[^<>]*> matches <...> with no < and > inside
  • \([^()]*\) matches (...) with no ( and ) inside
  • "[^"]*" matches "..." with no " inside
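
The three patterns above can be tried out with a short Python sketch. The sample strings are made up for illustration:

```python
import re

# All input strings below are hypothetical samples.

# Only the innermost <...> without nested angle brackets is matched.
angle = re.findall(r"<[^<>]*>", "a <b <c> d> e")
print(angle)  # ['<c>']

# Only the innermost parenthesized groups are matched.
parens = re.findall(r"\([^()]*\)", "f(g(x), h(y))")
print(parens)  # ['(x)', '(y)']

# Each double-quoted substring is matched separately.
quoted = re.findall(r'"[^"]*"', 'say "hi" and "bye"')
print(quoted)  # ['"hi"', '"bye"']
```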

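In other situations, when the starting delimiter is a multichar string or a complex pattern, use a tempered greedy token, (?:(?!start).)*?. It matches any char, one at a time, as long as that char does not begin a new occurrence of the start pattern. For example, to match abc 1 def in abc 0 abc 1 def, use abc(?:(?!abc).)*?def.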
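
A minimal Python sketch comparing the plain lazy pattern with the tempered greedy token on the string from the example above (the variable names are arbitrary):

```python
import re

text = "abc 0 abc 1 def"

# Plain lazy dot: the match starts at the first "abc" and expands up to "def".
lazy = re.search(r"abc.*?def", text).group()
print(lazy)  # abc 0 abc 1 def

# Tempered greedy token: the lookahead (?!abc) stops the dot from stepping
# onto another "abc", so the attempt at the first "abc" fails and the match
# starts at the second one.
tempered = re.search(r"abc(?:(?!abc).)*?def", text).group()
print(tempered)  # abc 1 def
```

The lookahead is checked before every single char, so this construct tends to be slower than a negated character class and is best kept for cases where the delimiter is longer than one char.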