How to pickle a python function with its dependencies?
Updated Sep 2020: See the comment by @ogrisel below. The developers of PiCloud moved to Dropbox shortly after I wrote the original version of this answer in 2013, though a lot of folks are still using the cloudpickle module seven years later. The module made its way to Apache Spark, where it has continued to be maintained and improved. I'm updating the example and background text below accordingly.
Cloudpickle
The cloudpickle package can pickle a function, method, class, or even a lambda, along with any dependencies. To try it out, just `pip install cloudpickle` and then:
```python
import cloudpickle

def foo(x):
    return x * 3

def bar(z):
    return foo(z) + 1

x = cloudpickle.dumps(bar)
del foo
del bar

import pickle

f = pickle.loads(x)
print(f(3))  # displays "10"
```
In other words, just call `cloudpickle.dump()` or `cloudpickle.dumps()` the same way you'd use `pickle.*`, then later use the native `pickle.load()` or `pickle.loads()` to thaw.
Background
PiCloud.com released the `cloud` python package under the LGPL, and other open-source projects quickly started using it (google for `cloudpickle.py` to see a few). The folks at picloud.com had an incentive to put the effort into making general-purpose code pickling work -- their whole business was built around it. The idea was that if you had `cpu_intensive_function()` and wanted to run it on Amazon's EC2 grid, you just replaced:
```python
cpu_intensive_function(some, args)
```

with:

```python
cloud.call(cpu_intensive_function, some, args)
```
The latter used `cloudpickle` to pickle up any dependent code and data, shipped it to EC2, ran it, and returned the results to you when you called `cloud.result()`.
PiCloud billed in millisecond increments; it was cheap as heck, and I used it all the time for Monte Carlo simulations and financial time series analysis when I needed hundreds of CPU cores for just a few seconds each. Years later, I still can't say enough good things about it, and I didn't even work there.
I have tried basically the same approach to sending g over as f, but f still can't see g. How do I get g into the global namespace so that it can be used by f in the receiving process?
Assign it to the global name `g`. (I see you are assigning `f` to `func2` rather than to `f`. If you are doing something like that with `g`, then it is clear why `f` can't find `g`. Remember that name resolution happens at runtime -- `g` isn't looked up until you call `f`.)
Of course, I'm guessing since you didn't show the code you're using to do this.
It might be best to create a separate dictionary to use for the global namespace for the functions you're unpickling -- a sandbox. That way all their global variables will be separate from the module you're doing this in. So you might do something like this:
```python
import marshal
import types

sandbox = {}
with open("functions.pickle", "rb") as funcfile:
    while True:
        try:
            code = marshal.load(funcfile)
        except EOFError:
            break
        # Use the sandbox dict itself as the new function's globals
        sandbox[code.co_name] = types.FunctionType(code, sandbox, code.co_name)
```
In this example I assume that you've put the code objects from all your functions in one file, one after the other, and when reading them in, I get the code object's name and use it as the basis for both the function object's name and the name under which it's stored in the sandbox dictionary.
Inside the unpickled functions, the sandbox dictionary is their `globals()`, and so inside `f()`, `g` gets its value from `sandbox["g"]`. To call `f`, you would then write: `sandbox["f"]("blah")`
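Putting the save and restore halves together, here's a minimal stdlib-only sketch of this approach (using `__code__`, the Python 3 spelling of Python 2's `func_code`; the filename is illustrative):

```python
import marshal
import types

def f(x):
    return x + 1

def g(x):
    return f(x) ** 2  # "f" is resolved at call time from g's globals

# Save: marshal the raw code objects, one after the other
with open("functions.pickle", "wb") as funcfile:
    marshal.dump(f.__code__, funcfile)
    marshal.dump(g.__code__, funcfile)

# Restore into a sandbox dict that acts as the functions' globals()
sandbox = {}
with open("functions.pickle", "rb") as funcfile:
    while True:
        try:
            code = marshal.load(funcfile)
        except EOFError:
            break
        sandbox[code.co_name] = types.FunctionType(code, sandbox, code.co_name)

print(sandbox["g"](3))  # displays "16", i.e. f(3) ** 2
```

Note that `marshal` is tied to the interpreter version, so this only works between matching Python versions.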
Every module has its own globals; there are no universal globals. We can "implant" the restored functions into a module and use it like a normal module.
-- save --

```python
import marshal

def f(x):
    return x + 1

def g(x):
    return f(x) ** 2

with open("functions.pickle", "wb") as funcfile:
    marshal.dump(f.__code__, funcfile)  # f.func_code on Python 2
    marshal.dump(g.__code__, funcfile)
```
-- restore --

```python
import marshal
import types

open('sandbox.py', 'w').write('')  # create an empty module 'sandbox'
import sandbox

with open("functions.pickle", "rb") as funcfile:
    while True:
        try:
            code = marshal.load(funcfile)
        except EOFError:
            break
        func = types.FunctionType(code, sandbox.__dict__, code.co_name)
        setattr(sandbox, code.co_name, func)  # or sandbox.f = ... if the name is fixed

assert sandbox.g(3) == 16  # f(3) ** 2

# it is also possible to import them from other modules
from sandbox import g
```
Edited:
You can also import a module, e.g. `sys`, into the `sandbox` namespace from outside:

```python
sandbox.sys = __import__('sys')
```

or equivalently:

```python
exec('import sys', sandbox.__dict__)  # Python 2: exec 'import sys' in sandbox.__dict__
assert hasattr(sandbox, 'sys'), 'Verify imported into sandbox'
```
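Both injection styles can be checked against a plain dict standing in for the sandbox module's `__dict__` (Python 3 `exec` syntax; `os` is just a second example module):

```python
namespace = {}

# Direct assignment from outside
namespace['os'] = __import__('os')

# Or let an exec'd import statement populate the namespace itself
exec('import sys', namespace)

assert 'os' in namespace and 'sys' in namespace
print(sorted(k for k in namespace if not k.startswith('__')))  # displays "['os', 'sys']"
```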
Your original code would work if you ran it in a regular python program or the normal python interactive interpreter, rather than in ipython!
Ipython executes user code in a namespace that is not the `__dict__` of any module in `sys.modules`. Normal python, or any main program, uses `sys.modules['__main__'].__dict__` as `globals()`. Any module uses `that_module.__dict__`, which is also OK; only the ipython interactive namespace is a problem.