Splitting a string with repeated characters into a list
I don't have much experience with regex, but I have been reading a lot about it. Assume there's a string s = '111234'.
I want to split it into a list of runs of the same character: L = ['111', '2', '3', '4'].
My approach was to make a group that checks whether a character is a digit, and then check for repetitions of that group. Something like this:
L = re.findall('\d[\1+]', s)
I think that \d[\1+] will match either a single digit or a digit followed by repetitions of that same digit. I think this might do what I want.
Use re.finditer():
>>> import re
>>> s = '111234'
>>> [m.group(0) for m in re.finditer(r"(\d)\1*", s)]
['111', '2', '3', '4']
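For what it's worth, here is a short sketch of why the pattern in the question finds nothing while this one works. Inside [...] a numeric escape such as \1 is treated as a character (here chr(1)) rather than a backreference, and the question's pattern has no capturing group to refer back to anyway. The names attempt and fixed are only illustrative, and the question's pattern is written as a raw string here (without the r prefix, '\1' is also chr(1), so the result is the same).

```python
import re

s = '111234'

# The question's pattern: [\1+] is a character class matching chr(1)
# or a literal '+', not "one or more repetitions of group 1".
attempt = re.findall(r'\d[\1+]', s)

# (\d) captures one digit; \1* then matches zero or more copies of it.
fixed = [m.group(0) for m in re.finditer(r'(\d)\1*', s)]

print(attempt)  # []
print(fixed)    # ['111', '2', '3', '4']
```

Using group(0) returns the whole match (the full run), whereas group(1) would return only the single captured digit.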
If you want to group all the repeated characters, you can also use itertools.groupby, like this:
from itertools import groupby
print(["".join(grp) for num, grp in groupby('111234')])
# ['111', '2', '3', '4']
If you want to keep only the groups of digits, filter on the group key:
print(["".join(grp) for num, grp in groupby('111aaa234') if num.isdigit()])
# ['111', '2', '3', '4']
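To see what groupby is doing here: it yields one (key, group) pair per run of consecutive equal items, where the group is an iterator over that run. A small sketch that prints each key with its run length (the name runs is made up for illustration):

```python
from itertools import groupby

s = '111aaa234'

# Each run of equal adjacent characters becomes one (key, iterator) pair;
# list(grp) consumes the iterator so its length can be measured.
runs = [(key, len(list(grp))) for key, grp in groupby(s)]
print(runs)
# [('1', 3), ('a', 3), ('2', 1), ('3', 1), ('4', 1)]
```

Note that groupby only merges adjacent items, so '1121' would give three groups ('11', '2', '1'), which matches the behaviour of the regex approach.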
Try this one:
import re
s = '111234'
l = re.findall(r'((.)\2*)', s)
# at this stage l is [('111', '1'), ('2', '2'), ('3', '3'), ('4', '4')]
# now keep only the first value (the whole run) from each tuple
lst = [x[0] for x in l]
print(lst)
Output:
['111', '2', '3', '4']
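The tuples appear because re.findall returns one tuple per match when the pattern contains more than one group. A sketch of the same idea with the unpacking done directly in the comprehension (the names pairs, whole and _char are illustrative):

```python
import re

s = '111234'

# Group 1 is the whole run, group 2 is the single repeated character.
pairs = re.findall(r'((.)\2*)', s)
runs = [whole for whole, _char in pairs]
print(runs)  # ['111', '2', '3', '4']
```

Since . matches any character except a newline, this version also splits runs of letters or punctuation, not only digits.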