Find out number of capture groups in Python regular expressions

Is there a way to determine how many capture groups there are in a given regular expression?

I would like to be able to do the follwing:

def groups(regexp, s):
    """ Returns the first result of re.findall, or an empty default

    >>> groups(r'(\d)(\d)(\d)', '123')
    ('1', '2', '3')
    >>> groups(r'(\d)(\d)(\d)', 'abc')
    ('', '', '')
    """
    import re
    m = re.search(regexp, s)
    if m:
        return m.groups()
    return ('',) * num_of_groups(regexp)

This allows me to do stuff like:

first, last, phone = groups(r'(\w+) (\w+) ([\d\-]+)', 'John Doe 555-3456')

However, I don't know how to implement num_of_groups. (Currently I just work around it.)

EDIT: Following the advice from rslite, I replaced re.findall with re.search.

sre_parse seems like the most robust and comprehensive solution, but requires tree traversal and appears to be a bit heavy.

MizardX's regular expression seems to cover all bases, so I'm going to go with that.


Solution 1:

def num_groups(regex):
    return re.compile(regex).groups

Solution 2:

f_x = re.search(...)
len_groups = len(f_x.groups())

Solution 3:

Something from inside sre_parse might help.

At first glance, maybe something along the lines of:

>>> import sre_parse
>>> sre_parse.parse('(\d)\d(\d)')
[('subpattern', (1, [('in', [('category', 'category_digit')])])), 
('in', [('category', 'category_digit')]), 
('subpattern', (2, [('in', [('category', 'category_digit')])]))]

I.e. count the items of type 'subpattern':

import sre_parse

def count_patterns(regex):
    """
    >>> count_patterns('foo: \d')
    0
    >>> count_patterns('foo: (\d)')
    1
    >>> count_patterns('foo: (\d(\s))')
    1
    """
    parsed = sre_parse.parse(regex)
    return len([token for token in parsed if token[0] == 'subpattern'])

Note that we're only counting root level patterns here, so the last example only returns 1. To change this, tokens would need to searched recursively.

Solution 4:

First of all if you only need the first result of re.findall it's better to just use re.search that returns a match or None.

For the groups number you could count the number of open parenthesis '(' except those that are escaped by '\'. You could use another regex for that:

def num_of_groups(regexp):
    rg = re.compile(r'(?<!\\)\(')
    return len(rg.findall(regexp))

Note that this doesn't work if the regex contains non-capturing groups and also if '(' is escaped by using it as '[(]'. So this is not very reliable. But depending on the regexes that you use it might help.