Regex for checking if a string has mismatched parentheses?

In a PHP script, what regex should I use to check for mismatched parentheses in a string? Things that I want to allow include:

  • This is (ok)
  • This (is) (ok)

Things I want to prevent:

  • This is )bad(
  • This is also (bad
  • This is (bad (too)

Thanks!

Update: You guys all rock. Doing this with a regex seemed trickier than it should have been, and these kinds of second-level answers are what make Stack Overflow beautiful. Thanks for the links and the pseudocode. I'm not sure who to give the answer to, so I apologize to everyone whose answers I can't accept.


Solution 1:

Regex is not the right tool for the job. Scan the string manually instead.

Pseudo-code:

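depth = 0
for character in some_string:
    # A comparison counts as 1 or 0, so each paren moves depth by one.
    depth += character == '('
    depth -= character == ')'
    if depth < 0:
        # A ')' came before its '('; depth stays negative for the check below.
        break

if depth != 0:
    print("unmatched parentheses")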

Solution 2:

You can do this with a regular expression -- PCRE, as used by PHP, allows recursive patterns. The PHP Manual gives an example that is almost exactly what you want:

\(((?>[^()]+)|(?R))*\)

This matches any correctly parenthesised substring as long as it begins and ends with parentheses. If you want to ensure the entire string is balanced, allowing strings like "wiggedy(wiggedy)(wiggedy(wack))", here's what I came up with:

^((?:[^()]|\((?1)\))*+)$

Here's an explanation of the pattern, which may be more illuminating than obfuscatory:

^             Beginning of the string
(             Start the "balanced substring" group (to be called recursively)
  (?:         Start the "minimal balanced substring" group
    [^()]     Minimal balanced substring is either a non-paren character
    |         or
    \((?1)\)  a set of parens containing a balanced substring
  )           Finish the "minimal balanced substring" group
  *           Our balanced substring is a maximal sequence of minimal
              balanced substrings
  +           Don't backtrack once we've matched a maximal sequence
)             Finish the "balanced substring" pattern
$             End of the string

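Recursive patterns like these raise efficiency and correctness concerns: on long or deeply nested input the engine can hit PHP's pcre.backtrack_limit or pcre.recursion_limit, in which case preg_match() returns false rather than 0. Be careful.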
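
For what it's worth, here is a minimal sketch of how the two patterns above might be wrapped in PHP. The function names isBalanced and findBalancedGroups are made up for illustration, and throwing an exception when preg_match() reports an engine error is just one assumed way to handle that case:

```php
<?php

// Hypothetical helper: true if every parenthesis in $s is matched.
function isBalanced(string $s): bool
{
    // Whole-string pattern from above; (?1) recurses into group 1.
    $result = preg_match('/^((?:[^()]|\((?1)\))*+)$/', $s);
    if ($result === false) {
        // false signals a PCRE engine error (for example a limit was hit),
        // which is not the same thing as "unbalanced".
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }
    return $result === 1;
}

// Hypothetical helper: collect balanced "( ... )" substrings using the
// PHP Manual pattern; (?R) recurses into the whole pattern.
function findBalancedGroups(string $s): array
{
    preg_match_all('/\(((?>[^()]+)|(?R))*\)/', $s, $matches);
    return $matches[0];
}

$samples = [
    'This is (ok)',
    'This (is) (ok)',
    'This is )bad(',
    'This is also (bad',
    'This is (bad (too)',
];

foreach ($samples as $s) {
    echo $s, ' => ', isBalanced($s) ? 'balanced' : 'mismatched', "\n";
}
```

The strict `=== 1` comparison matters here: a loose truthiness check would lump an engine error together with a plain non-match.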

Solution 3:

It is not possible to accomplish this with a regular expression in the strict, formal sense. Brace matching requires a recursive or counting feature that such expressions do not have. You'll need a parser for this.

More details available here: http://blogs.msdn.com/jaredpar/archive/2008/10/15/regular-expression-limitations.aspx

Solution 4:

Your examples don't include any nested parentheses… if you aren't concerned with nesting, then this can be done using the following expression:

^[^()]*(?:\([^()]*\)[^()]*)*$

This will match all the strings in your "allow" list and reject all the strings in your "prevent" list. However, it will also reject any string with nested parentheses, e.g. "this (is (not) ok)".

As others have already pointed out, regular expressions are not the correct tool if you need to handle nesting.
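
As a quick illustration, the non-nested pattern above could be used from PHP roughly like this. The helper name hasFlatMatchedParens is hypothetical, and the last sample string is the nested case mentioned above:

```php
<?php

// Hypothetical helper: true if $s contains only matched, non-nested pairs.
function hasFlatMatchedParens(string $s): bool
{
    // Text without parens, then any number of "(...)" groups that
    // themselves contain no parens, each followed by more plain text.
    return preg_match('/^[^()]*(?:\([^()]*\)[^()]*)*$/', $s) === 1;
}

$samples = [
    'This is (ok)',
    'This (is) (ok)',
    'This is )bad(',
    'This is also (bad',
    'This is (bad (too)',
    'this (is (not) ok)', // balanced but nested, so it is rejected
];

foreach ($samples as $s) {
    echo $s, ' => ', hasFlatMatchedParens($s) ? 'accepted' : 'rejected', "\n";
}
```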

Solution 5:

Agreed that a regex is the wrong tool for this. You could do the following, though:

<?php

// Sample inputs from the question: the first two are balanced, the rest are not.
$testStrings = array(
    'This is (ok)',
    'This (is) (ok)',
    'This is )bad(',
    'This is also (bad',
    'This is (bad (too)',
);

foreach( $testStrings as $string ) {
    $passed = hasMatchedParentheses( $string ) ? 'passed' : 'did not pass';
    echo "The string $string $passed the check for matching parentheses.\n";
}

// Returns true only if every ')' closes an earlier '(' and no '(' is left open.
function hasMatchedParentheses( $string ) {
    $counter = 0; // number of currently open parentheses
    $length = strlen( $string );
    for( $i = 0; $i < $length; $i++ ) {
        $char = $string[ $i ];
        if( $char === '(' ) {
            $counter++;
        } elseif( $char === ')' ) {
            $counter--;
        }
        // A negative count means a ')' appeared before its '('.
        if( $counter < 0 ) {
            return false;
        }
    }
    return $counter === 0;
}

?>