Empty string instead of unmatched group error
Solution 1:
Root cause
Before Python 3.5, backreferences to failed capture groups in Python re.sub
were not populated with an empty string. Here is Bug 1519638 description at bugs.python.org. Thus, when using a backreference to a group that did not participate in the match resulted in an error.
There are two ways to fix that issue.
Solution 1: Adding empty alternatives to make optional groups obligatory
You can replace all optional capturing groups (those constructs like (\d+)?
) with obligatory ones with an empty alternative (i.e. (\d+|)
).
Here is an example of the failure:
import re
old = 'regexregex'
new = re.sub(r'regex(group)?regex', r'something\1something', old)
print(new)
Replacing one line with
new = re.sub(r'regex(group|)regex', r'something\1something', old)
It works.
Solution 2: Using lambda expression in the replacement and checking if the group is not None
This approach is necessary if you have optional groups inside another optional group.
You can use a lambda in the replacement part to check if the group is initialized, not None
, with lambda m: m.group(n) or ''
. Use this solution in your case, because you have two backreferences - #3 and #4 - in the replacement pattern, but some matches (see Match 1 and 3) do not have Capture group 3 initialized. It happens because the whole first part - (\s*\{{2}funcA(ka|)\s*\|\s*([^}]*)\s*\}{2}\s*|)
- is not participating in the match, and the inner Capture group 3 (i.e. ([^}]*)
) just does not get populated even after adding an empty alternative.
re.sub(r'(?i)(\s*\{{2}funcA(ka|)\s*\|\s*([^\}]*)\s*\}{2}\s*|)\{{2}funcB\s*\|\s*([^\}]*)\s*\}{2}\s*',
r"\n | funcA"+str(n)+r" = \3\n | funcB"+str(n)+r" = \4\n | string"+str(n)+r" = \n",
text,
count=1)
should be re-written with
re.sub(r'(?i)(\s*{{funcA(ka|)\s*\|\s*([^}]*)\s*}}\s*|){{funcB\s*\|\s*([^}]*)\s*}}\s*',
lambda m: r"\n | funcA"+str(n)+r" = " + (m.group(3) or '') + "\n | funcB" + str(n) + r" = " + (m.group(4) or '') + "\n | string" + str(n) + r" = \n",
text,
count=1)
See IDEONE demo
import re
text = r'''
{{funcB|param1}}
*some string*
{{funcA|param2}}
{{funcB|param3}}
*some string2*
{{funcB|param4}}
*some string3*
{{funcAka|param5}}
{{funcB|param6}}
*some string4*
'''
for n in (range(1,(text.count('funcB')+1))):
text = re.sub(r'(?i)(\s*\{{2}funcA(ka|)\s*\|\s*([^\}]*)\s*\}{2}\s*|)\{{2}funcB\s*\|\s*([^\}]*)\s*\}{2}\s*',
lambda m: r"\n | funcA"+str(n)+r" = "+(m.group(3) or '')+"\n | funcB"+str(n)+r" = "+(m.group(4) or '')+"\n | string"+str(n)+r" = \n",
text,
count=1)
assert text == r'''
| funcA1 =
| funcB1 = param1
| string1 =
*some string*
| funcA2 = param2
| funcB2 = param3
| string2 =
*some string2*
| funcA3 =
| funcB3 = param4
| string3 =
*some string3*
| funcA4 = param5
| funcB4 = param6
| string4 =
*some string4*
'''
print 'ok'