How to remove foreign escaped quotes from string? Python
So the website in question renders like this:
Those curly quotes point in different directions for opening and closing. They are so-called "smart quotes", and their Unicode code points are U+201C (left) and U+201D (right).
So to remove them, you can use those codes instead of r'\"':
.replace('\u201c', '').replace('\u201d', '')
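As a minimal sketch of the same idea in a runnable form: the sample string below is made up (the actual page text is not reproduced here), and the helper name strip_smart_quotes is hypothetical. It uses str.translate so that several quote characters can be dropped in one pass; the single curly quotes U+2018 and U+2019 are included as an assumption that you would want those removed too.

```python
# Hypothetical sample; stands in for text scraped from the page.
text = '\u201cHello, world\u201d'

# The chained-replace approach from the answer above.
cleaned = text.replace('\u201c', '').replace('\u201d', '')

# Alternative: a translation table mapping each code point to None,
# which tells str.translate to delete that character.
# U+2018/U+2019 (single curly quotes) are an added assumption.
SMART_QUOTES = dict.fromkeys(map(ord, '\u201c\u201d\u2018\u2019'))


def strip_smart_quotes(s):
    """Return s with curly single and double quotes removed."""
    return s.translate(SMART_QUOTES)


print(cleaned)
print(strip_smart_quotes(text))
```

Both calls should give the same result for this sample; the translate version is mainly convenient when the list of characters to drop grows.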
But how can a problem like this be solved in general?
You can copy the text directly from the site and save it in a text file with UTF-16 encoding. Then look at the binary contents of the file, e.g. using the hexdump command on Linux/macOS, find the character codes, and convert them to Python string escapes like this: '\u<4-character hex unicode sequence>'.
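If you would rather stay inside Python than use a hex dump, the lookup can be sketched roughly as follows. The function name describe_non_ascii is hypothetical, and the sample string is made up; the code only relies on the built-in ord and the standard unicodedata module.

```python
import unicodedata


def describe_non_ascii(s):
    """Return (char, escape, name) for each distinct non-ASCII character in s."""
    seen = []
    for ch in s:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)

    result = []
    for ch in seen:
        cp = ord(ch)
        # Code points above U+FFFF need the 8-digit \U form in Python.
        escape = f'\\U{cp:08x}' if cp > 0xFFFF else f'\\u{cp:04x}'
        result.append((ch, escape, unicodedata.name(ch, 'UNKNOWN')))
    return result


# Hypothetical sample pasted from a page.
for ch, escape, name in describe_non_ascii('\u201cquoted\u201d text'):
    print(ch, escape, name)
```

The escape column can be pasted straight into a .replace() call, and the name column helps confirm which character you are actually looking at.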