RegEx with multiple groups?
I'm getting confused returning multiple groups in Python. My RegEx is this:
lun_q = 'Lun:\s*(\d+\s?)*'
And my string is
s = '''Lun: 0 1 2 3 295 296 297 298'''
I get a match object back and then want to look at the groups, but all it shows is the last number (298):
r.groups()
(u'298',)
Why isn't it returning groups of 0, 1, 2, 3, etc.?
Your regex only contains a single pair of parentheses (one capturing group), so you only get one group in your match. If you use a repetition operator on a capturing group (+ or *), the group gets "overwritten" each time the group is repeated, meaning that only the last match is captured.
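To make the overwriting behaviour concrete, here is a minimal sketch that runs the pattern from the question next to a variant that wraps the whole run of numbers in one capturing group. The variable names are only illustrative.

```python
import re

s = "Lun: 0 1 2 3 295 296 297 298"

# A repeated capturing group keeps only its final iteration.
repeated = re.search(r"Lun:\s*(\d+\s?)*", s)
last_only = repeated.groups()

# Capturing the whole run of numbers as a single group keeps all of them;
# the inner group is non-capturing, so it is not reported separately.
whole_run = re.search(r"Lun:\s*((?:\d+\s?)*)", s)
all_numbers = whole_run.group(1).split()

print(last_only)    # ('298',)
print(all_numbers)  # ['0', '1', '2', '3', '295', '296', '297', '298']
```

The number of groups in a match is fixed by the number of parentheses in the pattern, not by how many times a group repeats.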
In your example here, you're probably better off using .split(), in combination with a regex:
import re

lun_q = r'Lun:\s*(\d+(?:\s+\d+)*)'
s = '''Lun: 0 1 2 3 295 296 297 298'''
r = re.search(lun_q, s)
if r:
    luns = r.group(1).split()
    # optionally, also convert luns from strings to integers
    luns = [int(lun) for lun in luns]
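If you need this in more than one place, the same idea could be wrapped in a small function. This is only a sketch: parse_luns and LUN_PATTERN are hypothetical names, and returning an empty list when nothing matches is an assumption about what the caller wants.

```python
import re

# Same pattern as in the snippet above, compiled once for reuse.
LUN_PATTERN = re.compile(r"Lun:\s*(\d+(?:\s+\d+)*)")


def parse_luns(text):
    """Return the LUN numbers found in text as ints, or [] if there are none."""
    match = LUN_PATTERN.search(text)
    if match is None:
        return []
    # group(1) holds the whole whitespace-separated run of numbers.
    return [int(lun) for lun in match.group(1).split()]


print(parse_luns("Lun: 0 1 2 3 295 296 297 298"))
print(parse_luns("no luns here"))
```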
Another approach would be to use the regex you have to validate your data and then use a more specific regex that targets each item you wish to extract using a match iterator.
import re

s = '''Lun: 0 1 2 3 295 296 297 298'''
# First validate that the string starts with the expected format.
lun_validate_regex = re.compile(r'Lun:\s*((\d+)(\s\d+)*)')
match = lun_validate_regex.match(s)
if match:
    # Then pull each number out of the validated portion.
    token_regex = re.compile(r"\d+")
    match_iterator = token_regex.finditer(match.group(1))
    for token_match in match_iterator:
        # do something brilliant
        print(token_match.group())
Sometimes, it's easier without regex.
>>> s = '''Lun: 0 1 2 3 295 296 297 298'''
>>> if "Lun: " in s:
...     items = s.replace("Lun: ","").split()
...     for n in items:
...         if n.isdigit():
...             print(n)
...
0
1
2
3
295
296
297
298