Regular Expression Groups in C#
I've inherited a code block that contains the following regex and I'm trying to understand how it's getting its results.
var pattern = @"\[(.*?)\]";
var matches = Regex.Matches(user, pattern);
if (matches.Count > 0 && matches[0].Groups.Count > 1)
...
For the input user == "Josh Smith [jsmith]":
matches.Count == 1
matches[0].Value == "[jsmith]"
... which I understand. But then:
matches[0].Groups.Count == 2
matches[0].Groups[0].Value == "[jsmith]"
matches[0].Groups[1].Value == "jsmith" <=== how?
Looking at this question, from what I understand the Groups collection stores the entire match as well as the previous match. But doesn't the regexp above match only [open square bracket] [text] [close square bracket]? So why would "jsmith" match?
Also, is it always the case that the Groups collection will store exactly 2 groups: the entire match and the last match?
Solution 1:
- match.Groups[0] is always the same as match.Value, which is the entire match.
- match.Groups[1] is the first capturing group in your regular expression.
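As a quick check of the two points above, here is a minimal sketch that runs the pattern from the question against the question's input. It assumes a C# 9+ project with top-level statements; the variable names are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern and input taken from the question.
var pattern = @"\[(.*?)\]";
var match = Regex.Match("Josh Smith [jsmith]", pattern);

// Groups[0] is the whole match, brackets included.
Console.WriteLine(match.Groups[0].Value); // [jsmith]

// Groups[1] is only the text matched by the parenthesized (.*?).
Console.WriteLine(match.Groups[1].Value); // jsmith

// One capturing group in the pattern, plus the implicit group 0.
Console.WriteLine(match.Groups.Count);    // 2
```

So Groups.Count is not fixed at 2: it is the number of capturing groups in the pattern plus one for the whole match.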
Consider this example:
var pattern = @"\[(.*?)\](.*)";
var match = Regex.Match("ignored [john] John Johnson", pattern);
In this case,
- match.Value is "[john] John Johnson".
- match.Groups[0] is always the same as match.Value, "[john] John Johnson".
- match.Groups[1] is the group of captures from the (.*?).
- match.Groups[2] is the group of captures from the (.*).
- match.Groups[1].Captures is yet another dimension.
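To see the concrete values behind that list, the example can be run as below. This is a sketch assuming C# 9+ top-level statements; the single quotes in the output are added only to make the leading space in the second group visible.

```csharp
using System;
using System.Text.RegularExpressions;

var pattern = @"\[(.*?)\](.*)";
var match = Regex.Match("ignored [john] John Johnson", pattern);

// The match starts at the first '[', so "ignored " is not part of it.
Console.WriteLine($"'{match.Value}'");           // '[john] John Johnson'

// First pair of parentheses: the text between the brackets.
Console.WriteLine($"'{match.Groups[1].Value}'"); // 'john'

// Second pair of parentheses: everything after the closing bracket.
Console.WriteLine($"'{match.Groups[2].Value}'"); // ' John Johnson'

// Two capturing groups plus group 0.
Console.WriteLine(match.Groups.Count);           // 3
```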
Consider another example:
var pattern = @"(\[.*?\])+";
var match = Regex.Match("[john][johnny]", pattern);
Note that we are looking for one or more bracketed names in a row. You need to be able to get each name separately. Enter Captures!
- match.Groups[0] is always the same as match.Value, "[john][johnny]".
- match.Groups[1] is the group of captures from the (\[.*?\])+. Its Value is the last capture, "[johnny]", not the whole match.
- match.Groups[1].Captures[0] is "[john]".
- match.Groups[1].Captures[1] is "[johnny]", the same as match.Groups[1].Value.
- match.Groups[1].Captures holds one entry per repetition of the group, so it has exactly two entries here and there is no Captures[2].
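The Captures behaviour is easy to check directly. The sketch below (assuming C# 9+ top-level statements) walks the captures of the repeated group; note that in .NET a repeated group's Value reports its last capture, and Captures is indexed from 0.

```csharp
using System;
using System.Text.RegularExpressions;

var match = Regex.Match("[john][johnny]", @"(\[.*?\])+");
var group = match.Groups[1];

// Group 0 spans both bracketed names.
Console.WriteLine(match.Value);          // [john][johnny]

// A repeated group's Value is whatever it captured last.
Console.WriteLine(group.Value);          // [johnny]

// One Capture per repetition of the group.
Console.WriteLine(group.Captures.Count); // 2

foreach (Capture capture in group.Captures)
{
    // Index is the capture's position in the input string.
    Console.WriteLine($"{capture.Index}: {capture.Value}");
}
// 0: [john]
// 6: [johnny]
```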
Solution 2:
The ( ) acts as a capture group. So the matches array has all of the matches that C# finds in your string, and the sub-array has the values of the capture groups inside those matches. If you don't want that extra level of capture, just remove the ( ).
Solution 3:
Groups[0] is the entire match (not the entire input string).
Groups[1] is the group captured by the parentheses (.*?). You can configure Regex to capture explicit groups only (pass RegexOptions.ExplicitCapture when you create the regex), or use (?:.*?) to create a non-capturing group.
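A small sketch comparing the options, assuming C# 9+ top-level statements. The group name login in the last case is made up for illustration and is not from the original code.

```csharp
using System;
using System.Text.RegularExpressions;

var input = "Josh Smith [jsmith]";

// Default: the parentheses capture, so there are two groups (0 and 1).
var capturing = Regex.Match(input, @"\[(.*?)\]");
Console.WriteLine(capturing.Groups.Count);     // 2

// (?:...) groups without capturing, leaving only group 0.
var nonCapturing = Regex.Match(input, @"\[(?:.*?)\]");
Console.WriteLine(nonCapturing.Groups.Count);  // 1

// ExplicitCapture turns unnamed parentheses into non-capturing groups.
var explicitOnly = Regex.Match(input, @"\[(.*?)\]", RegexOptions.ExplicitCapture);
Console.WriteLine(explicitOnly.Groups.Count);  // 1

// Named groups still capture under ExplicitCapture ("login" is a hypothetical name).
var named = Regex.Match(input, @"\[(?<login>.*?)\]", RegexOptions.ExplicitCapture);
Console.WriteLine(named.Groups["login"].Value); // jsmith
```

In all four cases Groups[0].Value is still "[jsmith]"; only the extra groups change.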
Solution 4:
The parentheses identify a group as well, so group 0 is the entire match, and group 1 is the contents of what was found between the square brackets.
Solution 5:
How? The answer is here:
(.*?)
That is a subgroup of @"\[(.*?)\]".