Understanding Pickling in Python

I have recently got an assignment where I need to put a dictionary (where each key refers to a list) in pickled form. The only problem is I have no idea what pickled form is. Could anyone point me in the right direction of some good resources to help me learn this concept?


The pickle module implements a fundamental, but powerful algorithm for serializing and de-serializing a Python object structure.

Pickling - is the process whereby a Python object hierarchy is converted into a byte stream, and Unpickling - is the inverse operation, whereby a byte stream is converted back into an object hierarchy.

Pickling (and unpickling) is alternatively known as serialization, marshalling, or flattening.

import pickle

data1 = {'a': [1, 2.0, 3, 4+6j],
         'b': ('string', u'Unicode string'),
         'c': None}

selfref_list = [1, 2, 3]
selfref_list.append(selfref_list)

output = open('data.pkl', 'wb')

# Pickle dictionary using protocol 0.
pickle.dump(data1, output)

# Pickle the list using the highest protocol available.
pickle.dump(selfref_list, output, -1)

output.close()

To read from a pickled file -

import pprint, pickle

pkl_file = open('data.pkl', 'rb')

data1 = pickle.load(pkl_file)
pprint.pprint(data1)

data2 = pickle.load(pkl_file)
pprint.pprint(data2)

pkl_file.close()

source - https://docs.python.org/2/library/pickle.html


Pickling is a mini-language that can be used to convert the relevant state from a python object into a string, where this string uniquely represents the object. Then (un)pickling can be used to convert the string to a live object, by "reconstructing" the object from the saved state founding the string.

>>> import pickle
>>> 
>>> class Foo(object):
...   y = 1
...   def __init__(self, x):
...     self.x = x
...     return
...   def bar(self, y):
...     return self.x + y
...   def baz(self, y):
...     Foo.y = y  
...     return self.bar(y)
... 
>>> f = Foo(2)
>>> f.baz(3)
5
>>> f.y
3
>>> pickle.dumps(f)
"ccopy_reg\n_reconstructor\np0\n(c__main__\nFoo\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n(dp5\nS'x'\np6\nI2\nsb."

What you can see here is that pickle doesn't save the source code for the class, but does store a reference to the class definition. Basically, you can almost read the picked string… it says (roughly translated) "call copy_reg's reconstructor where the arguments are the class defined by __main__.Foo and then do other stuff". The other stuff is the saved state of the instance. If you look deeper, you can extract that "string x" is set to "the integer 2" (roughly: S'x'\np6\nI2). This is actually a clipped part of the pickled string for a dictionary entry… the dict being f.__dict__, which is {'x': 2}. If you look at the source code for pickle, it very clearly gives a translation for each type of object and operation from python to pickled byte code.

Note also that there are different variants of the pickling language. The default is protocol 0, which is more human-readable. There's also protocol 2, shown below (and 1,3, and 4, depending on the version of python you are using).

>>> pickle.dumps([1,2,3])
'(lp0\nI1\naI2\naI3\na.'
>>> 
>>> pickle.dumps([1,2,3], -1)
'\x80\x02]q\x00(K\x01K\x02K\x03e.'

Again, it's still a dialect of the pickling language, and you can see that the protocol 0 string says "get a list, include I1, I2, I3", while the protocol 2 is harder to read, but says the same thing. The first bit \x80\x02 indicates that it's protocol 2 -- then you have ] which says it's a list, then again you can see the integers 1,2,3 in there. Again, check the source code for pickle to see the exact mapping for the pickling language.

To reverse the pickling to a string, use load/loads.

>>> p = pickle.dumps([1,2,3])
>>> pickle.loads(p)
[1, 2, 3]

Pickling is just serialization: putting data into a form that can be stored in a file and retrieved later. Here are the docs on the pickle module:

http://docs.python.org/release/2.7/library/pickle.html


http://docs.python.org/library/pickle.html#example

import pickle

data1 = {'a': [1, 2.0, 3, 4+6j],
         'b': ('string', u'Unicode string'),
         'c': None}

selfref_list = [1, 2, 3]
selfref_list.append(selfref_list)

output = open('data.pkl', 'wb')

# Pickle dictionary using protocol 0.
pickle.dump(data1, output)

# Pickle the list using the highest protocol available.
pickle.dump(selfref_list, output, -1)

output.close()

Pickling in Python is used to serialize and de-serialize Python objects, like dictionary in your case. I usually use cPickle module as it can be much faster than the Pickle module.

import cPickle as pickle    

def serializeObject(pythonObj):
    return pickle.dumps(pythonObj, pickle.HIGHEST_PROTOCOL)

def deSerializeObject(pickledObj):
    return pickle.loads(pickledObj)