How to read RegEx Captures in C#
The C# regex API can be quite confusing. There are groups and captures:
- A group represents a capturing group; it is used to extract a substring from the text.
- A group can have several captures if it appears inside a quantifier.
The hierarchy is:
- Match
  - Group
    - Capture
  - Group
(a match can have several groups, and each group can have several captures)
For example:
Subject: aabcabbc
Pattern: ^(?:(a+b+)c)+$
In this example, there is only one group: (a+b+). This group is inside a quantifier and is matched twice. It generates two captures, aab and abb:
aabcabbc
^^^ ^^^
Cap1 Cap2
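The example above can be checked with a short program. This is only a minimal sketch: the class name CaptureDemo and the helper CapturesOfGroup1 are made up for illustration, while the subject and pattern are the ones shown above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Hypothetical helper: returns every capture recorded for group 1.
    public static string[] CapturesOfGroup1(string subject)
    {
        Match match = Regex.Match(subject, @"^(?:(a+b+)c)+$");
        if (!match.Success)
        {
            return new string[0];
        }

        Group group = match.Groups[1];
        string[] values = new string[group.Captures.Count];
        for (int i = 0; i < group.Captures.Count; i++)
        {
            values[i] = group.Captures[i].Value;
        }
        return values;
    }

    public static void Main()
    {
        // The group sits inside the (?:...)+ quantifier, so it captures twice.
        Console.WriteLine(string.Join(", ", CapturesOfGroup1("aabcabbc"))); // aab, abb

        // Group.Value only reports the last capture of the group.
        Match match = Regex.Match("aabcabbc", @"^(?:(a+b+)c)+$");
        Console.WriteLine(match.Groups[1].Value); // abb
    }
}
```

Note that match.Groups[1].Value returns the last capture, so the earlier capture (aab) is only reachable through the Captures collection.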
When a group is not inside a quantifier, it generates only one capture. In your case, you have 3 groups, and each group captures once. You can use match.Groups[1].Value, match.Groups[2].Value and match.Groups[3].Value to extract the 3 substrings you're interested in, without resorting to the capture notion at all.
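As a rough illustration of reading three groups that each capture once, here is a sketch with a made-up pattern. The date pattern, the class GroupDemo and the helper Parts are assumptions for the sake of the example, not the pattern from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Hypothetical pattern: three capturing groups, none inside a quantifier.
    private static readonly Regex DatePattern = new Regex(@"^(\d{4})-(\d{2})-(\d{2})$");

    // Returns the three substrings, or an empty array when there is no match.
    public static string[] Parts(string text)
    {
        Match match = DatePattern.Match(text);
        if (!match.Success)
        {
            return new string[0];
        }

        // Groups[0] is the whole match; numbered groups start at index 1.
        return new[]
        {
            match.Groups[1].Value,
            match.Groups[2].Value,
            match.Groups[3].Value
        };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join("|", Parts("2024-01-15"))); // 2024|01|15
    }
}
```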
Match results can be complicated to understand. I wrote this code to help me see what had been found and where. The intention is that pieces of the output (from the lines marked with //**) can be copied into a program to make use of the values found in the match.
using System;
using System.Text.RegularExpressions;

// The class name is taken from the RegexMatchDisplay.DisplayMatchResults call used later.
public static class RegexMatchDisplay
{
    public static void DisplayMatchResults(Match match)
    {
        // Overview: every group with its capture count and each capture's text.
        Console.WriteLine("Match has {0} captures", match.Captures.Count);
        int groupNo = 0;
        foreach (Group mm in match.Groups)
        {
            Console.WriteLine(" Group {0,2} has {1,2} captures '{2}'", groupNo, mm.Captures.Count, mm.Value);
            int captureNo = 0;
            foreach (Capture cc in mm.Captures)
            {
                Console.WriteLine(" Capture {0,2} '{1}'", captureNo, cc);
                captureNo++;
            }
            groupNo++;
        }

        // Copy-and-paste expressions for each group's value.
        groupNo = 0;
        foreach (Group mm in match.Groups)
        {
            Console.WriteLine(" match.Groups[{0}].Value == \"{1}\"", groupNo, match.Groups[groupNo].Value); //**
            groupNo++;
        }

        // Copy-and-paste expressions for each capture within each group.
        groupNo = 0;
        foreach (Group mm in match.Groups)
        {
            int captureNo = 0;
            foreach (Capture cc in mm.Captures)
            {
                Console.WriteLine(" match.Groups[{0}].Captures[{1}].Value == \"{2}\"", groupNo, captureNo, match.Groups[groupNo].Captures[captureNo].Value); //**
                captureNo++;
            }
            groupNo++;
        }
    }
}
A simple example of using this method, given this input:
Regex regex = new Regex("/([A-Za-z]+)/(\\d+)/");
String text = "some/directory/Pictures/Houses/12/apple/banana/"
            + "cherry/345/damson/elderberry/fig/678/gooseberry";
Match match = regex.Match(text);
RegexMatchDisplay.DisplayMatchResults(match);
The output is:
Match has 1 captures
Group 0 has 1 captures '/Houses/12/'
Capture 0 '/Houses/12/'
Group 1 has 1 captures 'Houses'
Capture 0 'Houses'
Group 2 has 1 captures '12'
Capture 0 '12'
match.Groups[0].Value == "/Houses/12/"
match.Groups[1].Value == "Houses"
match.Groups[2].Value == "12"
match.Groups[0].Captures[0].Value == "/Houses/12/"
match.Groups[1].Captures[0].Value == "Houses"
match.Groups[2].Captures[0].Value == "12"
Suppose that we want to find all matches of the above regular expression in the above text. Then we can use a MatchCollection in code such as:
MatchCollection matches = regex.Matches(text);
for (int ii = 0; ii < matches.Count; ii++)
{
    Console.WriteLine("Match[{0}] // of 0..{1}:", ii, matches.Count-1);
    RegexMatchDisplay.DisplayMatchResults(matches[ii]);
}
The output from this is:
Match[0] // of 0..2:
Match has 1 captures
Group 0 has 1 captures '/Houses/12/'
Capture 0 '/Houses/12/'
Group 1 has 1 captures 'Houses'
Capture 0 'Houses'
Group 2 has 1 captures '12'
Capture 0 '12'
match.Groups[0].Value == "/Houses/12/"
match.Groups[1].Value == "Houses"
match.Groups[2].Value == "12"
match.Groups[0].Captures[0].Value == "/Houses/12/"
match.Groups[1].Captures[0].Value == "Houses"
match.Groups[2].Captures[0].Value == "12"
Match[1] // of 0..2:
Match has 1 captures
Group 0 has 1 captures '/cherry/345/'
Capture 0 '/cherry/345/'
Group 1 has 1 captures 'cherry'
Capture 0 'cherry'
Group 2 has 1 captures '345'
Capture 0 '345'
match.Groups[0].Value == "/cherry/345/"
match.Groups[1].Value == "cherry"
match.Groups[2].Value == "345"
match.Groups[0].Captures[0].Value == "/cherry/345/"
match.Groups[1].Captures[0].Value == "cherry"
match.Groups[2].Captures[0].Value == "345"
Match[2] // of 0..2:
Match has 1 captures
Group 0 has 1 captures '/fig/678/'
Capture 0 '/fig/678/'
Group 1 has 1 captures 'fig'
Capture 0 'fig'
Group 2 has 1 captures '678'
Capture 0 '678'
match.Groups[0].Value == "/fig/678/"
match.Groups[1].Value == "fig"
match.Groups[2].Value == "678"
match.Groups[0].Captures[0].Value == "/fig/678/"
match.Groups[1].Captures[0].Value == "fig"
match.Groups[2].Captures[0].Value == "678"
Hence:
matches[1].Groups[0].Value == "/cherry/345/"
matches[1].Groups[1].Value == "cherry"
matches[1].Groups[2].Value == "345"
matches[1].Groups[0].Captures[0].Value == "/cherry/345/"
matches[1].Groups[1].Captures[0].Value == "cherry"
matches[1].Groups[2].Captures[0].Value == "345"
Similarly for matches[0] and matches[2].
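Once the indexing is clear, the same values can be read in a plain foreach loop over the MatchCollection. The following is a minimal sketch, assuming the pattern ends with a slash, as the '/cherry/345/' values in the output imply. The class MatchesDemo and the helper NameNumberPairs are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchesDemo
{
    // Hypothetical helper: collects "name=number" for every match in the text.
    public static List<string> NameNumberPairs(string text)
    {
        var pairs = new List<string>();
        foreach (Match m in Regex.Matches(text, @"/([A-Za-z]+)/(\d+)/"))
        {
            // Each group captures once per match, so Groups[n].Value is enough.
            pairs.Add(m.Groups[1].Value + "=" + m.Groups[2].Value);
        }
        return pairs;
    }

    public static void Main()
    {
        string text = "some/directory/Pictures/Houses/12/apple/banana/"
                    + "cherry/345/damson/elderberry/fig/678/gooseberry";
        Console.WriteLine(string.Join(", ", NameNumberPairs(text)));
        // Houses=12, cherry=345, fig=678
    }
}
```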