JavaScript regular expressions and sub-matches
Using String's match() function won't return captured groups if the global modifier is set, as you found out.
In this case, you would want to use a RegExp object and call its exec() function. String's match() is almost identical to RegExp's exec() function, except in cases like these. If the global modifier is set, the normal match() function won't return captured groups, while RegExp's exec() function will. (Noted here, among other places.)
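To make the difference concrete, here is a small sketch comparing the calls side by side. The input string and the variable names (sample, globalMatch, singleMatch, execMatch) are assumptions chosen for illustration, not taken from the question.

```javascript
// Assumed sample input for illustration only.
const sample = "test test";

// With the g flag, match() returns only the full matches, no groups.
const globalMatch = sample.match(/t(e)(s)t/g);
console.log(globalMatch); // ["test", "test"]

// Without the g flag, match() returns the first match plus its groups.
const singleMatch = Array.from(sample.match(/t(e)(s)t/));
console.log(singleMatch); // ["test", "e", "s"]

// exec() on a global regex still includes the groups for each match.
const execMatch = Array.from(/t(e)(s)t/g.exec(sample));
console.log(execMatch); // ["test", "e", "s"]
```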
Another catch to remember is that exec() doesn't return the matches in one big array: it keeps returning matches until it runs out, in which case it returns null.
So, for example, you could do something like this:
var pattern = /t(e)(s)t/g; // Alternatively, "new RegExp('t(e)(s)t', 'g');"
var match;
while (match = pattern.exec(text)) {
    // Do something with the match (["test", "e", "s"]) here...
}
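If you do want everything in one array, the loop above can be wrapped in a helper. This is only a sketch: collectMatches is a hypothetical name, the input string is an assumed example, and the guards (requiring the g flag, stepping past empty matches) are additions of mine to keep the loop from running forever.

```javascript
// Hypothetical helper: gathers every exec() result into one array.
function collectMatches(regex, input) {
    // Without the g flag, exec() always restarts at 0 and the loop never ends.
    if (!regex.global) {
        throw new TypeError("collectMatches needs a regex with the g flag");
    }
    regex.lastIndex = 0; // start from the beginning regardless of earlier use
    const results = [];
    let match;
    while ((match = regex.exec(input)) !== null) {
        // A zero-length match does not advance lastIndex, so nudge it forward.
        if (match[0] === "") {
            regex.lastIndex++;
        }
        results.push(Array.from(match)); // [fullMatch, group1, group2, ...]
    }
    return results;
}

// Assumed sample input for illustration only.
const collected = collectMatches(/t(e)(s)t/g, "test test");
console.log(collected); // [["test", "e", "s"], ["test", "e", "s"]]
```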
Another thing to note is that RegExp.prototype.exec() and RegExp.prototype.test() execute the regular expression on the provided string and return the first result. When the regex has the global (g) flag, each subsequent call steps through the result set, updating the regex object's lastIndex property based on the current position in the string.
Here's an example:
// remember there are 4 matches in the example and pattern. lastIndex starts at 0
pattern.test(text); // pattern.lastIndex = 4
pattern.test(text); // pattern.lastIndex = 9
pattern.exec(text); // pattern.lastIndex = 14
pattern.exec(text); // pattern.lastIndex = 19
// if we were to call pattern.exec(text) again it would return null and reset the pattern.lastIndex to 0
var match; // a var declaration is a syntax error inside a while condition, so declare it first
while (match = pattern.exec(text)) {
    // never gets run because we already traversed the string
    console.log(match);
}
pattern.test(text); // pattern.lastIndex = 4
pattern.test(text); // pattern.lastIndex = 9
// however we can reset the lastIndex and it will give us the ability to traverse the string from the start again or any specific position in the string
pattern.lastIndex = 0;
while (match = pattern.exec(text)) {
    // outputs all matches
    console.log(match);
}
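A common way this statefulness bites is calling test() repeatedly on the same global regex. The sketch below uses assumed names (wordPattern, firstCall, and so on) and an assumed one-word input to show the result flipping between calls.

```javascript
// Assumed pattern and input for illustration only.
const wordPattern = /test/g;

const firstCall = wordPattern.test("test");  // true, lastIndex is now 4
const indexAfterFirst = wordPattern.lastIndex;

const secondCall = wordPattern.test("test"); // false, search started at index 4
const indexAfterSecond = wordPattern.lastIndex; // reset to 0 after the failed match

const thirdCall = wordPattern.test("test");  // true again, starting from 0

console.log(firstCall, secondCall, thirdCall);
```

Dropping the g flag, or resetting lastIndex to 0 before each call, avoids the alternating result.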
You can find information on how to use RegExp objects on the MDN (specifically, here's the documentation for the exec() function).
I am surprised to see that I am the first person to answer this question with the answer I was looking for 10 years ago (the answer did not exist yet). I also was hoping that the actual spec writers would have answered it before me ;).
.matchAll was standardized in ES2020 and is now available in all current browsers and in Node.js 12 and later.
In modern JavaScript we can accomplish this by just doing the following.
let result = [...text.matchAll(/t(e)(s)t/g)];
.matchAll spec
.matchAll docs
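As a rough illustration of what the spread result holds, the sketch below runs .matchAll on an assumed input and also tries named capture groups. The names input, allMatches, namedMatches, vowel, and consonant are mine, chosen for the example.

```javascript
// Assumed sample input for illustration only.
const input = "test test";

// Each item yielded by matchAll() looks like an exec() result:
// full match first, then the captured groups.
const allMatches = [...input.matchAll(/t(e)(s)t/g)].map((m) => Array.from(m));
console.log(allMatches); // [["test", "e", "s"], ["test", "e", "s"]]

// With named capture groups, each match also carries a groups object.
const namedMatches = [...input.matchAll(/t(?<vowel>e)(?<consonant>s)t/g)]
    .map((m) => ({ ...m.groups }));
console.log(namedMatches);
// [{ vowel: "e", consonant: "s" }, { vowel: "e", consonant: "s" }]
```

Note that matchAll() throws a TypeError if the regex is missing the g flag, and it works on a copy of the regex, so the lastIndex of the original is left untouched.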
I now maintain an isomorphic javascript library that helps with a lot of this type of string parsing. You can check it out here: string-saw. It assists in making .matchAll easier to use when using named capture groups.
An example would be
saw(text).matchAll(/t(e)(s)t/g)
This outputs a more user-friendly array of matches, and if you want to get fancy you can throw in named capture groups and get an array of objects.