How to combine multiple regexes into a single one in Python?

I'm learning about regular expressions. I don't know how to combine several regular expressions into a single generic one.

I want to write a single regular expression that works for multiple cases. I know this can be done with the naive approach of using the or operator "|".

I don't like this approach. Can anybody suggest a better one?


Solution 1:

Combine your patterns into a single alternation and compile it once. Check this example:

import re

re1 = r'\d+\.\d*[L][-]\d*\s[A-Z]*[/]\d*'
re2 = r'\d*[/]\d*[A-Z]*\d*\s[A-Z]*\d*[A-Z]*'
re3 = r'[A-Z]*\d+[/]\d+[A-Z]\d+'
re4 = r'\d+[/]\d+[A-Z]*\d+\s\d+[A-Z]\s[A-Z]*'

# Compile the combined pattern once, outside the loop.
generic_re = re.compile("(%s|%s|%s|%s)" % (re1, re2, re3, re4))

# string1..string4 stand for your input strings (not defined here).
sentences = [string1, string2, string3, string4]
for sentence in sentences:
    matches = generic_re.findall(sentence)
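
The same idea can be wrapped in a small helper that accepts any number of patterns. This is only a sketch: `combine_patterns`, the two sample patterns and the sample text are made up for illustration and are not from the question. Each pattern is wrapped in a non-capturing group `(?:...)` so the combined regex adds no capture groups of its own.

```python
import re

def combine_patterns(patterns):
    """Join pattern strings into one compiled alternation.

    Each pattern is wrapped in a non-capturing group, so the
    combined regex introduces no extra capture groups.
    """
    return re.compile("|".join("(?:%s)" % p for p in patterns))

# Hypothetical patterns and input, for illustration only.
date_re = r"\d{4}-\d{2}-\d{2}"
code_re = r"[A-Z]{2}\d{3}"

generic_re = combine_patterns([date_re, code_re])
found = generic_re.findall("Order AB123 shipped on 2020-01-15.")
print(found)
```

Note that if the individual patterns contain capture groups of their own, `findall` returns those groups rather than the whole match, so `finditer` with `m.group()` may be the safer choice in that case.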

Solution 2:

To use findall with an arbitrary series of REs, all you have to do is concatenate the lists of matches that each one returns:

import re

re_list = [
    r'\d+\.\d*[L][-]\d*\s[A-Z]*[/]\d*',  # re1 in question
    ...
    r'\d+[/]\d+[A-Z]*\d+\s\d+[A-Z]\s[A-Z]*',  # re4 in question
]

# string is the text to search (not defined here).
matches = []
for r in re_list:
    matches += re.findall(r, string)

For efficiency it would be better to use a list of compiled REs.
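
A minimal sketch of the compiled-list variant follows. The two patterns, the sample text and the `find_all` helper are hypothetical stand-ins, not the REs from the question. The `re` module keeps an internal cache of recently compiled patterns, so the gain from compiling up front is likely to be modest, but it keeps the compile step out of the loop.

```python
import re

# Hypothetical patterns standing in for re1..re4 from the question.
re_list = [r"\d+/\d+", r"[A-Z]+-\d+"]

# Compile each pattern once, up front.
compiled = [re.compile(p) for p in re_list]

def find_all(text):
    """Return the matches of every pattern, in pattern order."""
    matches = []
    for rx in compiled:
        matches.extend(rx.findall(text))
    return matches

result = find_all("Unit 12/34 uses part AB-7 and 5/6")
print(result)
```

The result is grouped by pattern rather than by position in the text: all matches of the first RE come first, then all matches of the second.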

Alternatively, you could join the individual RE strings into a single pattern using

generic_re = re.compile('|'.join(re_list))
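
The joined pattern behaves a little differently from the loop: it reports non-overlapping matches in order of position, and at each position the first alternative that matches wins. If you also need to know which RE produced each match, one possible approach is to give every alternative a named group and read `lastgroup` from the match object. The labels, patterns and the `tag_matches` helper below are assumptions for illustration only.

```python
import re

# Hypothetical labelled patterns; the names are illustrative.
labelled = {
    "fraction": r"\d+/\d+",
    "part": r"[A-Z]+-\d+",
}

# Build one alternation where each branch is a named group.
generic_re = re.compile(
    "|".join("(?P<%s>%s)" % (name, pat) for name, pat in labelled.items())
)

def tag_matches(text):
    """Return (label, matched text) pairs in order of appearance."""
    return [(m.lastgroup, m.group()) for m in generic_re.finditer(text)]

tagged = tag_matches("Unit 12/34 uses part AB-7 and 5/6")
print(tagged)
```

This relies on the individual patterns having no capture groups of their own; with nested groups, `lastgroup` may not name the outer alternative.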