Can't use '\1' backreference to capture-group in a function call in re.sub() repl expression
I have a string S = '02143' and a list A = ['a','b','c','d','e']. I want to replace all the digits in S with their corresponding elements in list A.
For example, replace 0 with A[0], 2 with A[2], and so on. The final output should be S = 'acbed'.
I tried:
S = re.sub(r'([0-9])', A[int(r'\g<1>')], S)
However, this gives an error: ValueError: invalid literal for int() with base 10: '\\g<1>'. I guess it is treating the backreference '\g<1>' as a plain string. How can I solve this, preferably using re.sub and capture groups, or otherwise by some alternative?
The reason re.sub(r'([0-9])', A[int(r'\g<1>')], S) does not work is that the \g<1> backreference (an unambiguous way of writing the first backreference, otherwise written as \1) only works when it is used in the string replacement pattern. If you pass it to another function, that function will "see" just the literal string \g<1>, since the re module has no chance to evaluate it at that point. The re engine only evaluates it during a match, but the A[int(r'\g<1>')] part is evaluated before the re engine attempts to find a match.
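The order of evaluation can be checked directly. The following is a minimal sketch, not part of the original question; the variable names placeholder, error_message and wrapped are made up for illustration. It shows that int() receives the raw characters before re.sub is ever involved, while the same \g<1> text does work when it is left inside a plain replacement string.

```python
import re

# The raw string is just ordinary characters: backslash, g, <, 1, >.
# No regex engine has looked at it yet.
placeholder = r'\g<1>'

# Python evaluates function arguments before calling the function,
# so int() fails here without re.sub being called at all.
try:
    int(placeholder)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Inside a replacement *string*, re.sub itself expands the backreference
# once for every match.
wrapped = re.sub(r'([0-9])', r'<\g<1>>', '02143')
print(wrapped)  # <0><2><1><4><3>
```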
That is why re.sub accepts a callback function as the replacement argument: you can pass the matched group values to any external function for advanced manipulation.
See the re documentation:
re.sub(pattern, repl, string, count=0, flags=0)
If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string.
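As a sketch of what the documentation describes, the callback can also be an ordinary named function. The name digit_to_letter is hypothetical and chosen only for this example; the capture group from the question is kept so that group(1) can be used.

```python
import re

S = '02143'
A = ['a', 'b', 'c', 'd', 'e']

def digit_to_letter(match):
    # re.sub calls this once per match and passes a match object.
    # group(1) is the text captured by the first pair of parentheses.
    return A[int(match.group(1))]

result = re.sub(r'([0-9])', digit_to_letter, S)
print(result)  # acbed
```

Note that this assumes every digit in S is a valid index into A; a digit such as 9 with a five-element list would raise IndexError inside the callback.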
Use
import re
S = '02143'
A = ['a', 'b', 'c', 'd', 'e']
# The lambda receives a match object; x.group() is the matched digit
print(re.sub(r'[0-9]', lambda x: A[int(x.group())], S))  # acbed
Note that you do not need to capture the whole pattern with parentheses: you can access the whole match with x.group().
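To illustrate the point about parentheses, here is a small sketch comparing the whole match with a capture group. The sample string 'x7y' and the variable names are made up for this example.

```python
import re

# With a capture group around the whole pattern
with_group = re.search(r'([0-9])', 'x7y')
# Without any capture group
without_group = re.search(r'[0-9]', 'x7y')

# group() with no argument is the same as group(0): the entire match
whole_a = with_group.group()
whole_b = without_group.group()

# group(1) exists only when the pattern has a first capture group
first_group = with_group.group(1)

print(whole_a, whole_b, first_group)  # 7 7 7
```

Since the pattern here consists of nothing but the digit class, the whole match and the first group are identical, so the parentheses add nothing.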