How to find overlapping matches with a regexp?
>>> import re
>>> match = re.findall(r'\w\w', 'hello')
>>> print(match)
['he', 'll']
Since \w\w matches two consecutive word characters, 'he' and 'll' are expected. But why don't 'el' and 'lo' match the regex?
>>> match1 = re.findall(r'el', 'hello')
>>> print(match1)
['el']
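findall doesn't yield overlapping matches by default: each match consumes the characters it covers, so the search for the next match resumes where the previous one ended. This expression does, however: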
>>> re.findall(r'(?=(\w\w))', 'hello')
['he', 'el', 'll', 'lo']
Here (?=...) is a lookahead assertion:
(?=...) matches if ... matches next, but doesn’t consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.
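To see why the lookahead version overlaps, it can help to look at where each match starts and ends. The following is a minimal sketch using re.finditer; the variable names plain_spans and lookahead_hits are only illustrative.

```python
import re

text = 'hello'

# Plain pattern: each match consumes two characters, so the next
# search starts where the previous match ended.
plain_spans = [m.span() for m in re.finditer(r'\w\w', text)]
print(plain_spans)       # [(0, 2), (2, 4)]

# Lookahead pattern: the match itself is empty (zero width), so the
# scanner moves forward one position at a time. The two characters
# are exposed through capture group 1.
lookahead_hits = [(m.start(), m.group(1))
                  for m in re.finditer(r'(?=(\w\w))', text)]
print(lookahead_hits)    # [(0, 'he'), (1, 'el'), (2, 'll'), (3, 'lo')]
```

Because the pattern contains exactly one capture group, findall returns the contents of that group rather than the empty matches, which is why the result above is a list of two-character strings.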
You can use the third-party regex module (available on PyPI), which supports overlapping matches.
>>> import regex as re
>>> match = re.findall(r'\w\w', 'hello', overlapped=True)
>>> print(match)
['he', 'el', 'll', 'lo']
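If installing an extra module is not an option, a similar result can be approximated with the standard library by attempting a match at every start position. This is only a sketch: find_overlapping is a hypothetical helper, not part of re or regex, and it is not claimed to reproduce overlapped=True in every case. It relies on the optional pos argument of a compiled pattern's match method.

```python
import re

def find_overlapping(pattern, text):
    """Hypothetical helper: try the pattern at every start position."""
    compiled = re.compile(pattern)
    results = []
    for pos in range(len(text) + 1):
        # match() anchors the attempt at pos instead of scanning ahead
        m = compiled.match(text, pos)
        if m:
            results.append(m.group(0))
    return results

print(find_overlapping(r'\w\w', 'hello'))  # ['he', 'el', 'll', 'lo']
```

This yields at most one match per start position, and a pattern that can match the empty string will contribute empty strings to the result.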