How to use Python regex to replace using a captured group? [duplicate]
Suppose I want to change "the blue dog and blue cat wore blue hats" to "the gray dog and gray cat wore blue hats".
With sed I could accomplish this as follows:
$ echo 'the blue dog and blue cat wore blue hats' | sed 's/blue \(dog\|cat\)/gray \1/g'
How can I do a similar replacement in Python? I've tried:
>>> import re
>>> s = "the blue dog and blue cat wore blue hats"
>>> p = re.compile(r"blue (dog|cat)")
>>> p.sub('gray \1',s)
'the gray \x01 and gray \x01 wore blue hats'
You need to escape your backslash:
p.sub('gray \\1', s)
Alternatively, you can use a raw string, as you already did for the regex:
p.sub(r'gray \1', s)
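To see why the original attempt fails, here is a minimal sketch using the names from the question (`broken` and `result` are just illustrative variable names). In a regular string literal, `'\1'` is an octal escape that Python converts to the character `\x01` before `re` ever sees it, so no backreference reaches `sub`.

```python
import re

s = "the blue dog and blue cat wore blue hats"
p = re.compile(r"blue (dog|cat)")

# In a regular string literal, '\1' is the single character chr(1).
assert 'gray \1' == 'gray \x01'

# Escaping the backslash and using a raw string give the same replacement text.
assert 'gray \\1' == r'gray \1'

broken = p.sub('gray \1', s)    # replacement contains chr(1), not a backreference
result = p.sub(r'gray \1', s)   # replacement contains a real backreference

print(repr(broken))  # 'the gray \x01 and gray \x01 wore blue hats'
print(result)        # the gray dog and gray cat wore blue hats
```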
I was looking for a similar answer, but wanted to use named groups in the replacement, so I thought I'd add the code for others:
p = re.compile(r'blue (?P<animal>dog|cat)')
p.sub(r'gray \g<animal>',s)
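A self-contained version of the named-group approach, as a sketch reusing the string from the question (`by_name` and `by_number` are illustrative names). Named groups are still numbered, so `\1` refers to the same group as `\g<animal>` here.

```python
import re

s = "the blue dog and blue cat wore blue hats"
p = re.compile(r"blue (?P<animal>dog|cat)")

# Refer to the group by name in the replacement string.
by_name = p.sub(r"gray \g<animal>", s)

# The named group is also group 1, so a numbered reference works too.
by_number = p.sub(r"gray \1", s)

print(by_name)    # the gray dog and gray cat wore blue hats
print(by_number)  # the gray dog and gray cat wore blue hats
```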
Slightly off topic, but for numbered capture groups:
#!/usr/bin/env python
import re

# print() is needed to see the result when this runs as a script
print(re.sub(
    pattern=r'(\d)(\w+)',
    repl='word: \\2, digit: \\1',
    string='1asdf'
))
word: asdf, digit: 1
Python uses a literal backslash plus a one-based index for numbered capture group replacements, as shown in this example. So \1, entered as '\\1', references the first capture group (\d), and \2 references the second capture group.
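Because numbering starts at 1, the whole match is not available as `\0`; the `re` module exposes it as `\g<0>` instead. A short sketch using the same pattern and input as above (`swapped` and `whole` are illustrative names):

```python
import re

pattern = r"(\d)(\w+)"
text = "1asdf"

# \1 is the first group (the digit), \2 the second (the word characters).
swapped = re.sub(pattern, r"word: \2, digit: \1", text)

# \g<0> stands for the entire match rather than a single group.
whole = re.sub(pattern, r"[\g<0>]", text)

print(swapped)  # word: asdf, digit: 1
print(whole)    # [1asdf]
```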
Try this:
p.sub(r'gray \g<1>', s)
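The `\g<1>` form matters most when the group reference is followed directly by a digit. The sketch below uses a hypothetical pattern and input (not from the question) to show the difference: `\g<1>0` is group 1 followed by a literal `0`, while `\10` is read as a reference to group 10.

```python
import re

# Hypothetical pattern and input, chosen only to illustrate the ambiguity.
p = re.compile(r"(\d+) items")
s = "3 items"

# \g<1>0 means "group 1, then a literal 0".
appended = p.sub(r"\g<1>0 items", s)
print(appended)  # 30 items

# \10 is parsed as group 10, which this pattern does not have.
try:
    p.sub(r"\10 items", s)
    ambiguous_failed = False
except re.error:
    ambiguous_failed = True
print(ambiguous_failed)  # True
```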