Unpickling a Python 2 object with Python 3
You'll have to tell pickle.load() how to convert Python 2 bytestring data to Python 3 strings, or you can tell pickle to leave them as bytes.
The default is to try to decode all string data as ASCII, and that decoding fails. See the pickle.load() documentation:
Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects.
Setting the encoding to latin1 allows you to import the data directly:
import pickle

with open(mshelffile, 'rb') as f:
    d = pickle.load(f, encoding='latin1')
but you'll need to verify that none of your strings are decoded using the wrong codec; Latin-1 works for any input as it maps the byte values 0-255 to the first 256 Unicode codepoints directly.
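To make the "wrong codec" risk concrete, here is a minimal sketch. The pickle bytes are hand-written to mimic what Python 2 would produce at protocol 0 for a dict holding a UTF-8 encoded byte string; that sample data and the names PY2_PICKLE, loaded and repaired are assumptions for illustration, not something from the question.

```python
import pickle

# Hand-written protocol 0 pickle, mimicking Python 2's output for
# {'name': 'caf\xc3\xa9'} (a UTF-8 encoded str). Illustrative data only.
PY2_PICKLE = b"(dp0\nS'name'\np1\nS'caf\\xc3\\xa9'\np2\ns."

# The default ASCII decoding cannot handle the non-ASCII bytes.
try:
    pickle.loads(PY2_PICKLE)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# Latin-1 never fails, but the text was really UTF-8, so the result is mojibake.
loaded = pickle.loads(PY2_PICKLE, encoding="latin1")

# Latin-1 is a lossless byte-to-codepoint mapping, so the original bytes can be
# recovered and decoded again with the codec that was actually used.
repaired = loaded["name"].encode("latin1").decode("utf-8")
```

Because the Latin-1 round trip is lossless, a string decoded with the wrong codec can be fixed after loading, provided you know which codec the Python 2 code actually used.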
The alternative would be to load the data with encoding='bytes', and decode all bytes keys and values afterwards.
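A rough sketch of that decode-afterwards step follows. The helper name decode_bytes, the UTF-8 default, and the hand-written sample pickle are all assumptions for illustration; real data may mix codecs or contain genuinely binary values that should stay as bytes, in which case you would decode selectively instead.

```python
import pickle

def decode_bytes(obj, encoding="utf-8"):
    """Recursively convert bytes to str inside dicts, lists, tuples and sets."""
    if isinstance(obj, bytes):
        return obj.decode(encoding)
    if isinstance(obj, dict):
        return {decode_bytes(k, encoding): decode_bytes(v, encoding)
                for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        # Rebuild the same container type from the decoded items.
        return type(obj)(decode_bytes(item, encoding) for item in obj)
    return obj  # numbers, None and other objects are left untouched

# Hand-written protocol 0 pickle, mimicking Python 2's output for
# {'name': 'caf\xc3\xa9'}. Illustrative data only.
PY2_PICKLE = b"(dp0\nS'name'\np1\nS'caf\\xc3\\xa9'\np2\ns."

raw = pickle.loads(PY2_PICKLE, encoding="bytes")  # keys and values are bytes
decoded = decode_bytes(raw)                       # keys and values are str
```

This only walks built-in containers; attributes of custom class instances would need their own handling.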
Note that in Python versions before 3.6.8, 3.7.2 and 3.8.0, unpickling of Python 2 datetime object data is broken unless you use encoding='bytes'.
Using encoding='latin1' can cause issues when your object contains numpy arrays. Using encoding='bytes' is the better choice in that case.
Please see this answer for a complete explanation of using encoding='bytes'.