Encountering a problem while i try to scrape text websites and save it in a .txt file [duplicate]

I was getting the same UnicodeEncodeError when saving scraped web content to a file. To fix it I replaced this code:

with open(fname, "w") as f:
    f.write(html)

with this:

with open(fname, "w", encoding="utf-8") as f:
    f.write(html)

If you need to support Python 2, then use this:

import io
with io.open(fname, "w", encoding="utf-8") as f:
    f.write(html)

If your file is encoded in something other than UTF-8, specify whatever your actual encoding is for encoding.

I fixed it by adding .encode("utf-8") to soup.

That means that print(soup) becomes print(soup.encode("utf-8")).

In Python 3.7, and running Windows 10 this worked (I am not sure whether it will work on other platforms and/or other versions of Python)

Replacing this line:

with open('filename', 'w') as f:

With this:

with open('filename', 'w', encoding='utf-8') as f:

The reason why it is working is because the encoding is changed to UTF-8 when using the file, so characters in UTF-8 are able to be converted to text, instead of returning an error when it encounters a UTF-8 character that is not suppord by the current encoding.

set PYTHONIOENCODING=utf-8
set PYTHONLEGACYWINDOWSSTDIO=utf-8

You may or may not need to set that second environment variable PYTHONLEGACYWINDOWSSTDIO.

Alternatively, this can be done in code (although it seems that doing it through env vars is recommended):

sys.stdin.reconfigure(encoding='utf-8')
sys.stdout.reconfigure(encoding='utf-8')

Additionally: Reproducing this error was a bit of a pain, so leaving this here too in case you need to reproduce it on your machine:

set PYTHONIOENCODING=windows-1252
set PYTHONLEGACYWINDOWSSTDIO=windows-1252

Encountering a problem while i try to scrape text websites and save it in a .txt file [duplicate]

Related

Recent Posts