UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to <undefined>
I'm trying to get a Python 3 program to do some manipulations with a text file filled with information. However, when trying to read the file I get the following error:
Traceback (most recent call last):
  File "SCRIPT LOCATION", line NUMBER, in <module>
    text = file.read()
  File "C:\Python31\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 2907500: character maps to <undefined>
Solution 1:
The file in question is not using the CP1252 encoding. It's using another encoding, and you have to figure out which one yourself. Common ones are Latin-1 and UTF-8. Since 0x90 is only an unprintable control character in Latin-1, UTF-8 (where 0x90 is a continuation byte) is more likely.
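To make the reasoning about byte 0x90 concrete, here is a small sketch that decodes it under the three encodings mentioned. The variable names are only illustrative, and the two-byte sequence used for UTF-8 is just one example of 0x90 appearing as a continuation byte.

```python
raw = b"\x90"

# cp1252 leaves 0x90 unassigned, so decoding fails just like in the traceback.
try:
    raw.decode("cp1252")
    cp1252_ok = True
except UnicodeDecodeError:
    cp1252_ok = False

# Latin-1 maps every byte to a code point; 0x90 becomes the control character U+0090.
latin1_char = raw.decode("latin-1")

# In UTF-8, 0x90 is only valid as a continuation byte, for example as the
# second byte of the encoding of U+0410 (Cyrillic capital A).
utf8_char = b"\xd0\x90".decode("utf-8")

print(cp1252_ok, repr(latin1_char), repr(utf8_char))
```

Latin-1 never raises, which is why a successful Latin-1 decode says little about whether the text is really Latin-1.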
You specify the encoding when you open the file:
file = open(filename, encoding="utf8")
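A slightly fuller sketch of the same idea, using a with block so the file is closed automatically. The sample file and its contents are made up for the example; in your program you would pass your own filename and whatever encoding the file really uses.

```python
import os
import tempfile

# Hypothetical sample file: UTF-8 text whose encoded form contains the byte 0x90.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write("\u0410 is U+0410\n".encode("utf-8"))

# Passing encoding explicitly means the platform default
# (cp1252 on many Windows setups) is not used.
with open(path, encoding="utf-8") as f:
    text = f.read()

print(repr(text))
```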
Solution 2:
If file = open(filename, encoding="utf-8") doesn't work, try file = open(filename, errors="ignore"), if you want to remove unneeded characters. Note that this silently drops every byte that cannot be decoded. (docs)
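As a rough illustration of what the errors argument does, the sketch below reads the same bytes with errors="ignore" and with errors="replace". The sample bytes are invented, and encoding="cp1252" is passed explicitly only so that the result does not depend on the platform default.

```python
import os
import tempfile

# Hypothetical sample file containing one byte (0x90) that cp1252 cannot decode.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write(b"caf\x90 ok")

# errors="ignore" drops the undecodable byte without any warning.
with open(path, encoding="cp1252", errors="ignore") as f:
    ignored = f.read()

# errors="replace" keeps a visible marker (U+FFFD) where the byte was.
with open(path, encoding="cp1252", errors="replace") as f:
    replaced = f.read()

print(repr(ignored), repr(replaced))
```

If you need to see where data was lost, "replace" is usually the safer of the two.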
Solution 3:
Alternatively, if you don't need to decode the file, such as when uploading the file to a website, use:
open(filename, 'rb')
where r means reading and b means binary.
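A minimal sketch of binary mode, assuming a made-up sample file. Nothing is decoded, so read() returns bytes and no UnicodeDecodeError can occur.

```python
import os
import tempfile

# Hypothetical sample file with bytes that are not valid text in cp1252.
path = os.path.join(tempfile.mkdtemp(), "sample.bin")
with open(path, "wb") as f:
    f.write(b"\x90\x00abc")

# 'rb' reads the raw bytes; no codec is involved.
with open(path, "rb") as f:
    data = f.read()

print(type(data), data)
```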
Solution 4:
As an extension to @LennartRegebro's answer:
If you can't tell what encoding your file uses, the solution above does not work (the file is not utf8), and you find yourself merely guessing, there are online tools that you can use to identify the encoding. They aren't perfect but usually work just fine. After you figure out the encoding, you should be able to use the solution above.
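If you would rather check locally than use an online tool, one crude approach is to try a few candidate encodings and see which ones decode without an error. This is only a sketch: first_decodable is a hypothetical helper, the candidate list is an assumption, and a decode that succeeds does not prove the encoding is the right one (Latin-1 in particular accepts any bytes, so it goes last).

```python
def first_decodable(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes data without error."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Valid UTF-8 is recognised first.
print(first_decodable("\u0410".encode("utf-8")))
# 0xE9 alone is a truncated UTF-8 sequence, but a valid cp1252 character.
print(first_decodable(b"caf\xe9"))
# 0x90 alone fails in UTF-8 and cp1252, so only Latin-1 accepts it.
print(first_decodable(b"\x90"))
```

To use it on a file, read the bytes with open(filename, 'rb') and pass them in, then look at the decoded text to confirm it is readable.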
EDIT: (Copied from comment)
The quite popular text editor Sublime Text has a command to display the encoding, if it has been set:
- Go to View -> Show Console (or Ctrl+`)
- Type view.encoding() into the field at the bottom and hope for the best (I was unable to get anything but Undefined, but maybe you will have better luck...)