Using RegEx to balance match parenthesis

Regular Expressions can definitely do balanced parentheses matching. It can be tricky, and requires a couple of the more advanced Regex features, but it's not too hard.

Example:

var r = new Regex(@"
    func([a-zA-Z_][a-zA-Z0-9_]*) # The func name

    \(                      # First '('
        (?:                 
        [^()]               # Match all non-braces
        |
        (?<open> \( )       # Match '(', and capture into 'open'
        |
        (?<-open> \) )      # Match ')', and delete the 'open' capture
        )+
        (?(open)(?!))       # Fails if 'open' stack isn't empty!

    \)                      # Last ')'
", RegexOptions.IgnorePatternWhitespace);

Balanced matching groups have a couple of features, but for this example, we're only using the capture deleting feature. The line (?<-open> \) ) will match a ) and delete the previous "open" capture.

The trickiest line is (?(open)(?!)), so let me explain it. (?(open) is a conditional expression that only matches if there is an "open" capture. (?!) is a negative expression that always fails. Therefore, (?(open)(?!)) says "if there is an open capture, then fail".

Microsoft's documentation was pretty helpful too.


Using balanced groups, it is:

Regex rx = new Regex(@"func([a-zA-Z_][a-zA-Z0-9_]*)\(((?<BR>\()|(?<-BR>\))|[^()]*)+\)");

var match = rx.Match("funcPow((3),2) * (9+1)");

var str = match.Value; // funcPow((3),2)

(?<BR>\()|(?<-BR>\)) are a Balancing Group (the BR I used for the name is for Brackets). It's more clear in this way (?<BR>\()|(?<-BR>\)) perhaps, so that the \( and \) are more "evident".

If you really hate yourself (and the world/your fellow co-programmers) enough to use these things, I suggest using the RegexOptions.IgnorePatternWhitespace and "sprinkling" white space everywhere :-)


Regular Expressions only work on Regular Languages. This means that a regular expression can find things of the sort "any combination of a's and b's".(ab or babbabaaa etc) But they can't find "n a's, one b, n a's".(a^n b a^n) Regular expressions can't guarantee that the first set of a's matches the second set of a's.

Because of this, they aren't able to match equal numbers of opening and closing parenthesis. It would be easy enough to write a function that traverses the string one character at a time. Have two counters, one for opening paren, one for closing. increment the pointers as you traverse the string, if opening_paren_count != closing_parent_count return false.