Regular expression matching

I want to write a regular expression which matches anything between

()
(())
(()())
((()))
()()()

etc.

Solution 1:

All these answers claiming you can't use patterns to match a string with balanced nested parens are quite wrong. It's not practical to pretend that the patterns matched by modern programming languages are restricted to "regular languages" in the pathological textbook sense. As soon as you permit backreferences, they're not. This allows real-world patterns to match much more than the textbook versions, making them far more practical.

The simplest pattern for matching balanced parens is \((?:[^()]*+|(?0))*\). But you should never write that, because it is too compact to be easily read. You should always write it with /x mode to allow for whitespace and comments. So write it like this:

m{
  \(              # literal open paren
     (?:          # begin alternation group
         [^()]*+  #  match nonparens possessively
       |          # or else
         (?0)     #  recursively match entire pattern
     )*           # repeat alternation group
  \)              # literal close paren
}x
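For engines without recursion support (Python's built-in `re` module has no `(?0)`), the structure of the pattern above can be mirrored by hand. The following is a minimal sketch rather than a regex: `match_parens` is a hypothetical helper name, and each branch is annotated with the part of the commented pattern it stands in for.

```python
def match_parens(s, i=0):
    """Return the index just past the balanced group starting at s[i], or -1."""
    if i >= len(s) or s[i] != "(":      # \(  literal open paren
        return -1
    i += 1
    while i < len(s) and s[i] != ")":   # (?: ... )*  repeat alternation group
        if s[i] == "(":                 # (?0)  recursively match a nested group
            i = match_parens(s, i)
            if i == -1:
                return -1
        else:                           # [^()]*+  skip a non-paren character
            i += 1
    if i >= len(s):                     # input ended before the closing paren
        return -1
    return i + 1                        # \)  literal close paren
```

Like the pattern, this matches a single balanced group: `match_parens("(()())")` returns 6, while `match_parens("()()()")` returns 2 because the first group ends there.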

There's also a lot to be said for naming your abstractions, and decoupling their definition and its ordering from their execution. That leads to this sort of thing:

my $nested_paren_rx = qr{

    (?&nested_parens)

    (?(DEFINE)

        (?<open>       \(       )
        (?<close>       \)      )
        (?<nonparens> [^()]     )

        (?<nested_parens>
            (?&open)
            (?:
                (?&nonparens) *+
              |
                (?&nested_parens)
            ) *
            (?&close)
        )

    )
}x;

The second form is now amenable to inclusion in larger patterns.

Don't ever let anybody tell you that you can't use a pattern to match something that's recursively defined. As I've just demonstrated, you most certainly can.

While you're at it, make sure never to write line-noise patterns. You don't have to, and you shouldn't. No programming language can be maintainable that forbids white space, comments, subroutines, or alphanumeric identifiers. So use all those things in your patterns.

Of course, it does help to pick the right language for this kind of work. ☺

Solution 2:

In case you are stuck with a language whose regular expression syntax does not support recursive matching, here is a simple JavaScript implementation that you should be able to port to the language of your choice:

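function testBraces(s) {
    // j tracks the nesting depth; the loop stops early if it goes negative,
    // which means a ')' appeared before its matching '('.
    for (var i = 0, j = 0; i < s.length && j >= 0; i++)
        switch (s.charAt(i)) {
            case '(': { j++; break; }
            case ')': { j--; break; }
        }

    // Balanced only if every '(' was closed.
    return j === 0;
}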

And here you can play with it: http://jsfiddle.net/BFsn2/

Solution 3:

Such nested structures cannot be handled effectively by regular expressions. What you need is a grammar and a parser for that grammar. In your case the grammar is simple enough. If you are using Python, try pyparsing or funcparserlib.

With pyparsing you can do the following:

from pyparsing import nestedExpr
nestedExpr().parseString( "(some (string you) (want) (to) test)" ).asList()

This will return a list containing the parsed components of the nested string. The default delimiters for nestedExpr are parentheses, so you do not have to do anything extra. If you want to use funcparserlib, you can try the following:

from funcparserlib.parser import forward_decl, many, a
bracketed = forward_decl()
bracketed.define(a('(') + many(bracketed) + a(')'))

After this you can call

bracketed.parse( "( (some) ((test) (string) (you) (want)) (to test))" )

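and, for input made up only of parentheses, it will return the parsed elements in a tuple. As defined, the grammar accepts only '(' and ')' tokens, so the letters and spaces in the sample string above would need additional rules.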
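If neither library is available, the same idea can be sketched with the standard library alone. This is an assumption-laden illustration, not how pyparsing or funcparserlib work internally: `parse_nested` is a hypothetical name, it uses an explicit stack instead of a grammar, and it treats whitespace-separated runs of characters as words.

```python
def parse_nested(text):
    """Parse a parenthesized string into nested lists of words."""
    root = []
    stack = [root]          # stack[-1] is the list currently being filled
    word = ""
    for ch in text:
        if ch in "() \t\n":
            if word:        # a delimiter ends the current word
                stack[-1].append(word)
                word = ""
            if ch == "(":   # open a new nested list
                nested = []
                stack[-1].append(nested)
                stack.append(nested)
            elif ch == ")": # close the innermost list
                if len(stack) == 1:
                    raise ValueError("unbalanced ')'")
                stack.pop()
        else:
            word += ch
    if word:
        stack[-1].append(word)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return root
```

For example, `parse_nested("(some (string you) (want) (to) test)")` returns `[['some', ['string', 'you'], ['want'], ['to'], 'test']]`, and unbalanced input raises `ValueError`.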
