Python pickling after changing a module's directory

I've recently changed my program's directory layout: before, I had all my modules inside the "main" folder. Now, I've moved them into a directory named after the program, and placed an __init__.py there to make a package.

Now I have a single .py file in my main directory that is used to launch my program, which is much neater.

Anyway, loading pickled files from previous versions of my program now fails with "ImportError: No module named tools". I guess that's because my module was previously in the main folder, and now it's whyteboard.tools rather than plain tools. However, the code that imports the tools module lives in the same directory as it, so I doubt there's a need to specify a package.

So, my program directory looks something like this:

whyteboard-0.39.4/
    whyteboard.py
    README.txt
    CHANGELOG.txt
    whyteboard/
        __init__.py
        gui.py
        tools.py

whyteboard.py launches a block of code from whyteboard/gui.py that fires up the GUI. This pickling problem definitely wasn't happening before the directory reorganization.


Solution 1:

As pickle's docs say, in order to save and restore a class instance (actually a function, too), you must respect certain constraints:

pickle can save and restore class instances transparently, however the class definition must be importable and live in the same module as when the object was stored

whyteboard.tools is not "the same module as" tools. Even though other modules in the same package can import it with a plain import tools (an implicit relative import, which Python 2 allows), it ends up in sys.modules as sys.modules['whyteboard.tools']. This is absolutely crucial: otherwise the same module, imported once from inside its package and once from another package, would end up with multiple and possibly conflicting entries!
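To see that a pickle really stores a module path and a name rather than the class itself, the stream can be inspected without loading it. The following is a minimal sketch, not part of the original answer: the helper name referenced_globals is made up for illustration, and it only looks at the GLOBAL opcode, which is what protocols 0 to 3 use for class references. fractions.Fraction stands in for a class from tools.

```python
import pickle
import pickletools
from fractions import Fraction


def referenced_globals(data):
    """List the "module name" pairs a pickle stream refers to via GLOBAL opcodes.

    pickletools.genops only parses the stream; it does not import anything.
    """
    return [arg for opcode, arg, _pos in pickletools.genops(data)
            if opcode.name == "GLOBAL"]


# Fraction is only a stand-in for a class that lives in some module.
payload = pickle.dumps(Fraction(1, 3), protocol=2)
print(referenced_globals(payload))
```

Running the same helper on one of the old save files should show which module names (such as tools) the file expects to import.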

If your pickle files are in a good/advanced (binary) format, as opposed to the old ASCII format that's the default only for compatibility reasons, migrating them after such a change may in fact not be quite as trivial as "editing the file" (which is binary, after all), despite what another answer suggests. I suggest that you instead write a little "pickle-migrating script" that patches sys.modules like this:

import sys
from whyteboard import tools

sys.modules['tools'] = tools

Then cPickle.load each file, del sys.modules['tools'], and cPickle.dump each loaded object back to its file. The temporary extra entry in sys.modules should let the pickles load successfully, and dumping them again should then use the right module name for the instances' classes (removing the extra entry makes sure of that).
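The steps above could be wrapped into a small function along these lines. This is a sketch under assumptions rather than a finished tool: the name migrate_pickles and its parameters are hypothetical, it uses pickle instead of cPickle so that it also runs on Python 3, and it overwrites each file in place, so it is safer to run it on copies of the save files.

```python
import pickle
import sys


def migrate_pickles(paths, old_name, new_module):
    """Re-save each pickle so it refers to new_module's real name, not old_name."""
    for path in paths:
        # Temporary alias: lets the old module name resolve while loading.
        sys.modules[old_name] = new_module
        try:
            with open(path, "rb") as f:
                obj = pickle.load(f)
        finally:
            # Drop the alias before dumping, as the answer recommends.
            sys.modules.pop(old_name, None)
        # The classes' __module__ is the new path, so that is what gets written.
        with open(path, "wb") as f:
            pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)
```

For the layout in the question, a call might look like migrate_pickles(saved_files, 'tools', whyteboard.tools) after import whyteboard.tools, where saved_files is a placeholder for the list of old save files.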

Solution 2:

This can be done with a custom "unpickler" that uses find_class():

import io
import pickle


class RenameUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        renamed_module = module
        if module == "tools":
            renamed_module = "whyteboard.tools"

        return super(RenameUnpickler, self).find_class(renamed_module, name)


def renamed_load(file_obj):
    return RenameUnpickler(file_obj).load()


def renamed_loads(pickled_bytes):
    file_obj = io.BytesIO(pickled_bytes)
    return renamed_load(file_obj)

Then you'd need to use renamed_load() instead of pickle.load() and renamed_loads() instead of pickle.loads().
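Since the question moved more than one module into the package, the same idea can be driven by a lookup table instead of a single if. The following is only a sketch: TableUnpickler, table_loads and MODULE_RENAMES are names invented here, the "gui" entry is an assumption based on the directory listing in the question, and the zero-argument super() call assumes Python 3.

```python
import io
import pickle

# Hypothetical rename table: old module path -> new module path.
# The "gui" entry is assumed from the question's directory listing.
MODULE_RENAMES = {
    "tools": "whyteboard.tools",
    "gui": "whyteboard.gui",
}


class TableUnpickler(pickle.Unpickler):
    """Unpickler that redirects moved modules through a lookup table."""

    def __init__(self, file_obj, renames=None):
        super().__init__(file_obj)
        self._renames = MODULE_RENAMES if renames is None else renames

    def find_class(self, module, name):
        # Unknown modules fall through unchanged.
        return super().find_class(self._renames.get(module, module), name)


def table_loads(pickled_bytes, renames=None):
    """Like pickle.loads(), but applies the rename table while loading."""
    return TableUnpickler(io.BytesIO(pickled_bytes), renames).load()
```

Unlike patching sys.modules, this leaves global interpreter state alone, at the cost of having to route every load through the custom unpickler.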

Solution 3:

Happened to me, solved it by adding the new location of the module to sys.path before loading pickle:

import pickle
import sys

# Make the moved modules importable under their old top-level names
sys.path.append('path/to/whyteboard')
with open("pickled_file", "rb") as f:
    obj = pickle.load(f)

Solution 4:

pickle serializes classes by reference, so if you change where the class lives, the pickle will not load because the class cannot be found. If you use dill instead of pickle, you can serialize classes either by reference or directly (by serializing the class itself instead of its import path). You can simulate this pretty easily by changing the class definition after a dump and before a load.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> 
>>> class Foo(object):
...   def bar(self):
...     return 5
... 
>>> f = Foo()
>>> 
>>> _f = dill.dumps(f)
>>> 
>>> class Foo(object):
...   def bar(self, x):
...     return x
... 
>>> g = Foo()
>>> f_ = dill.loads(_f)
>>> f_.bar()
5
>>> g.bar(4)
4

Solution 5:

This is the normal behavior of pickle: unpickled objects need to have their defining module importable.

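You should be able to change the module's path (i.e. from tools to whyteboard.tools) by editing the pickled files, as they are normally simple text files. That holds for the ASCII protocol 0, the default in Python 2; files written with a binary protocol are not plain text and cannot safely be edited by hand.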
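Rather than editing by hand, the same substitution can be scripted so that only actual class references are touched and not string data that happens to contain the word tools. This is a rough sketch, not a hardened tool: rewrite_module is a name made up here, it assumes ASCII module names, and it only handles the GLOBAL and INST opcodes used by protocols 0 to 3, not the framed format of protocol 4 and later.

```python
import pickletools


def rewrite_module(data, old, new):
    """Return pickle bytes with GLOBAL/INST references to module `old` renamed to `new`."""
    out = bytearray()
    last = 0
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name not in ("GLOBAL", "INST"):
            continue
        # For these opcodes, arg is "module name" joined by a single space.
        module, _, name = arg.partition(" ")
        if module != old:
            continue
        # On disk: opcode byte, module + newline, name + newline.
        end = data.index(b"\n", data.index(b"\n", pos) + 1) + 1
        out += data[last:pos + 1]  # copy up to and including the opcode byte
        out += new.encode("ascii") + b"\n" + name.encode("ascii") + b"\n"
        last = end
    out += data[last:]
    return bytes(out)
```

The function works on bytes read from a file opened in "rb" mode and returns new bytes; writing them to a separate file keeps the original save intact in case the rewrite misses something.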