Regular expression to match balanced parentheses
I need a regular expression to select all the text between two outer brackets.
Example: some text(text here(possible text)text(possible text(more text)))end text
Result: (text here(possible text)text(possible text(more text)))
I want to add this answer for quickreference. Feel free to update.
.NET Regex using balancing groups.
\((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\)
Where c
is used as the depth counter.
Demo at Regexstorm.com
- Stack Overflow: Using RegEx to balance match parenthesis
- Wes' Puzzling Blog: Matching Balanced Constructs with .NET Regular Expressions
- Greg Reinacker's Weblog: Nested Constructs in Regular Expressions
PCRE using a recursive pattern.
\((?:[^)(]+|(?R))*+\)
Demo at regex101; Or without alternation:
\((?:[^)(]*(?R)?)*+\)
Demo at regex101; Or unrolled for performance:
\([^)(]*+(?:(?R)[^)(]*)*+\)
Demo at regex101; The pattern is pasted at (?R)
which represents (?0)
.
Perl, PHP, Notepad++, R: perl=TRUE, Python: Regex package with (?V1)
for Perl behaviour.
Ruby using subexpression calls.
With Ruby 2.0 \g<0>
can be used to call full pattern.
\((?>[^)(]+|\g<0>)*\)
Demo at Rubular; Ruby 1.9 only supports capturing group recursion:
(\((?>[^)(]+|\g<1>)*\))
Demo at Rubular (atomic grouping since Ruby 1.9.3)
JavaScript API :: XRegExp.matchRecursive
XRegExp.matchRecursive(str, '\\(', '\\)', 'g');
JS, Java and other regex flavors without recursion up to 2 levels of nesting:
\((?:[^)(]+|\((?:[^)(]+|\([^)(]*\))*\))*\)
Demo at regex101. Deeper nesting needs to be added to pattern.
To fail faster on unbalanced parenthesis drop the +
quantifier.
Java: An interesting idea using forward references by @jaytea.
Reference - What does this regex mean?
- rexegg.com - Recursive Regular Expressions
- Regular-Expressions.info - Regular Expression Recursion
Regular expressions are the wrong tool for the job because you are dealing with nested structures, i.e. recursion.
But there is a simple algorithm to do this, which I described in more detail in this answer to a previous question. The gist is to write code which scans through the string keeping a counter of the open parentheses which have not yet been matched by a closing parenthesis. When that counter returns to zero, then you know you've reached the final closing parenthesis.