Matching math expression with regular expression?

Use a pushdown automaton for matching paranthesis http://en.wikipedia.org/wiki/Pushdown_automaton (or just a stack ;-) )

Details for the stack solution:

while (chr available)
    if chr == '(' then
      push '('
    else
      if chr == ')' then
        if stack.elements == 0 then
          print('too many or misplaced )')
          exit
        else
          pop //from stack
end while
if (stack.elements != 0)
  print('too many or misplaced(')

Even simple: just keep a counter instead of stack.


Regular expressions can only be used to recognize regular languages. The language of mathematical expressions is not regular; you'll need to implement an actual parser (e.g. LR) in order to do this.


Matching parens with a regex is quite possible.

Here is a Perl script that will parse arbitrary deep matching parens. While it will throw out the non-matching parens outside, I did not design it specifically to validate parens. It will parse arbitrarily deep parens so long as they are balanced. This will get you started however.

The key is recursion both in the regex and the use of it. Play with it, and I am sure that you can get this to also flag non matching prens. I think if you capture what this regex throws away and count parens (ie test for odd parens in the non-match text), you have invalid, unbalanced parens.

#!/usr/bin/perl
$re = qr  /
     (                      # start capture buffer 1
        \(                  #   match an opening paren
        (                   # capture buffer 2
        (?:                 #   match one of:
            (?>             #     don't backtrack over the inside of this group
                [^()]+    #       one or more 
            )               #     end non backtracking group
        |                   #     ... or ...
            (?1)            #     recurse to opening 1 and try it again
        )*                  #   0 or more times.
        )                   # end of buffer 2
        \)                  #   match a closing paren
     )                      # end capture buffer one
    /x;


sub strip {
    my ($str) = @_;
    while ($str=~/$re/g) {
        $match=$1; $striped=$2;
        print "$match\n";
        strip($striped) if $striped=~/\(/;
        return $striped;
    }
}

while(<DATA>) {
    print "start pattern: $_";
    while (/$re/g) { 
        strip($1) ;
    }
}   

__DATA__
"(apple + (-0.5)) * (boy - 1)"
"((((one)two)three)four)x(one(two(three(four))))"
"a) * (b + c) / (d"
"-a * (b / 1.50)"

Output:

start pattern: "(apple + (-0.5)) * (boy - 1)"
(apple + (-0.5))
(-0.5)
(boy - 1)
start pattern: "((((one)two)three)four)x(one(two(three(four))))"
((((one)two)three)four)
(((one)two)three)
((one)two)
(one)
(one(two(three(four))))
(two(three(four)))
(three(four))
(four)
start pattern: "a) * (b + c) / (d"
(b + c)
start pattern: "-a * (b / 1.50)"
(b / 1.50)

I believe you will be better off implementing a real parser to accomplish what you're after.

A parser for simple mathematical expressions is "Parsing 101", and there are several examples to be found online.

Some examples include:

  • ANTLR: Expression Evaluator Sample (ANTLR grammars can target several languages)
  • pyparsing: http://pyparsing.wikispaces.com/file/view/fourFn.py (pyparsing is a Python library)
  • Lex & Yacc: http://epaperpress.com/lexandyacc/ (contains a PDF tutorial and sample code for a calculator)

Note that the grammar you will need for validating expressions is simpler than the examples above, since the examples also implement evaluation of the expression.