Regex only capturing last instance of capture group in match
Solution 1:
Regardless of the problem, ActionScript and JavaScript should always yield the same results, as they both implement ECMAScript (or a superset thereof, but for regular expressions they should not disagree).
But yes, this will be happening in any language (or rather any regex flavor). The reason is that you are repeating the capturing group. Let's take a simpler example: match (.)*
against abc
. So what we are repeating is (.)
. The first time it is tried, the engine enters the group, matches a
with .
, leaves the group and captures a
. Only now does the quantifier kick in and it repeats the whole thing. So we enter the group again, and match and capture b
. This capture overwrites the previous one, hence \1
does now contain b
. Same again for the third repetition: the capture will be overwritten with with c
.
I don't know of a regex flavor that behaves differently, and the only one that lets you access all previous captures (instead of just overwriting them) is .NET.
The solution is the one p.s.w.g. proposed. Make the grouping you need for the repetition non-capturing (this will improve performance, because you don't need all that capturing and overwriting anyway) and wrap the whole thing in a new group. Your expression has one little flaw though: you need to include include the backslash in the negated character class. Otherwise, backtracking could give you a match in [abc\]
. So here is an expression that will work as you expect:
\[((?:\\{2}|\\\]|[^\]\\])*)\]
Working demo. (unfortunately, it doesn't show the captures, but it shows that it gives correct matches in all cases)
Note that your expression does not allow for other escape sequences. In particular a single \
, followed by anything but a ]
will cause your pattern to fail. If this is not what you desire, you can just use:
\[((?:\\.|[^\]\\])*)\]
Working demo.
Performance can further be improved with the "unrolling-the-loop" technique:
\[([^\]\\]*(?:\\.[^\]\\]*)*)\]
Working demo.
Solution 2:
Try including the *
quantifier inside the capture group, like this:
\[((?:\\{2}|\\\]|[^\]])*)\]