python .replace() regex [duplicate]
I am trying to grab everything after the '</html>' tag and delete it, but my code doesn't seem to be doing anything. Does .replace() not support regex?
z.write(article.replace('</html>.+', '</html>'))
Solution 1:
No. str.replace() only matches literal text; regular expressions in Python are handled by the re module.
import re
article = re.sub(r'(?is)</html>.+', '</html>', article)
In general:
text_after = re.sub(regex_search_term, regex_replacement, text_before)
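To make the difference concrete, here is a minimal sketch. The sample article text and the names unchanged, trimmed and trimmed_kw are made up for illustration and are not from the question. It shows that str.replace() leaves the text alone because it looks for the literal characters '</html>.+', and what the inline (?is) flags in the pattern do.

```python
import re

# Hypothetical sample text; `article` stands in for the asker's variable.
article = "<html>\n<body>Hello</body>\n</HTML>\ntrailing junk\nmore junk\n"

# str.replace() treats its first argument as literal text, so nothing matches.
unchanged = article.replace('</html>.+', '</html>')

# (?is) turns on IGNORECASE and DOTALL inline: the tag matches in any case,
# and '.' also matches newlines, so '.+' runs to the end of the string.
trimmed = re.sub(r'(?is)</html>.+', '</html>', article)

# The same flags passed through the flags keyword argument instead.
trimmed_kw = re.sub(r'</html>.+', '</html>', article,
                    flags=re.IGNORECASE | re.DOTALL)

print(repr(unchanged == article))
print(repr(trimmed))
```

Note that '.+' needs at least one character after the tag, so a string that already ends in '</html>' is returned unchanged.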
Solution 2:
To replace text using a regular expression, use the re.sub function:
re.sub(pattern, repl, string, count=0, flags=0)
It replaces non-overlapping occurrences of pattern in string with the text passed as repl. If you need to inspect each match, for instance to extract specific capture groups, you can pass a function as the repl argument; it is called with the match object and must return the replacement string. See the re.sub documentation for more details.
Examples
>>> import re
>>> re.sub(r'a', 'b', 'banana')
'bbnbnb'
>>> re.sub(r'/\d+', '/{id}', '/andre/23/abobora/43435')
'/andre/{id}/abobora/{id}'
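The examples above only use a plain replacement string. As a rough sketch of the other two options in the signature, here is re.sub with a function as the replacement and with the count argument. The helper name double_number and the variables result and first_only are hypothetical, chosen just for this illustration.

```python
import re

def double_number(match):
    # match.group(1) holds the digits captured after the slash.
    return '/' + str(int(match.group(1)) * 2)

# The function is called once per match and its return value is substituted.
result = re.sub(r'/(\d+)', double_number, '/andre/23/abobora/43435')
print(result)  # /andre/46/abobora/86870

# count limits how many matches are replaced, scanning from the left.
first_only = re.sub(r'a', 'b', 'banana', count=1)
print(first_only)  # bbnana
```

Passing a function is the usual choice when the replacement depends on what was matched, since a plain string can only refer to groups, not compute on them.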