Computing an md5 hash of a data structure
Solution 1:
json.dumps() can sort dictionary keys, so the standard library is enough and no extra dependencies are needed:
import hashlib
import json
data = ['only', 'lists', [1,2,3], 'dictionaries', {'a':0,'b':1}, 'numbers', 47, 'strings']
data_md5 = hashlib.md5(json.dumps(data, sort_keys=True).encode('utf-8')).hexdigest()
print(data_md5)
Prints:
87e83d90fc0d03f2c05631e2cd68ea02
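As a quick check of the claim above, the following sketch hashes two dictionaries that hold the same items in a different insertion order. The helper name json_md5 is an assumption made for illustration; it is not part of any library.

```python
import hashlib
import json


def json_md5(data):
    """Return the md5 hex digest of the JSON form of `data`, with keys sorted.

    `json_md5` is a hypothetical helper name used only for this example.
    """
    encoded = json.dumps(data, sort_keys=True).encode('utf-8')
    return hashlib.md5(encoded).hexdigest()


# Same items, different insertion order.
first = {'a': 0, 'b': 1}
second = {'b': 1, 'a': 0}

print(json_md5(first) == json_md5(second))  # True: sort_keys removes the ordering difference

# Values that JSON cannot represent, such as a set, raise TypeError.
try:
    json_md5({'tags': {1, 2, 3}})
except TypeError as exc:
    print('not JSON serializable:', exc)
```

Keep in mind that tuples and lists both serialize as JSON arrays, so with this approach they produce the same hash.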
Solution 2:
The third-party bencode module sorts dictionary keys when it encodes, so its output is deterministic:
import hashlib
import bencode
data = ['only', 'lists', [1,2,3],
'dictionaries', {'a':0,'b':1}, 'numbers', 47, 'strings']
data_md5 = hashlib.md5(bencode.bencode(data)).hexdigest()
print(data_md5)
Prints:
af1b88ca9fd8a3e828b40ed1b9a2cb20
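To see why the encoding is order-independent, here is a minimal sketch of the bencode format written from scratch. It is an illustration only, not the bencode library's implementation; the name bencode_sketch is hypothetical, and the sketch handles only ints, strings, bytes, lists, tuples and dicts.

```python
import hashlib


def bencode_sketch(value):
    """Encode a value in bencode format (simplified, hypothetical helper)."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; this sketch rejects it explicitly.
        raise TypeError('bool is not handled in this sketch')
    if isinstance(value, int):
        # Integers are written as i<digits>e.
        return b'i' + str(value).encode('ascii') + b'e'
    if isinstance(value, str):
        value = value.encode('utf-8')
    if isinstance(value, bytes):
        # Byte strings are written as <length>:<data>.
        return str(len(value)).encode('ascii') + b':' + value
    if isinstance(value, (list, tuple)):
        # Lists are written as l<items>e.
        return b'l' + b''.join(bencode_sketch(item) for item in value) + b'e'
    if isinstance(value, dict):
        # Dictionaries are written as d<key><value>...e with keys in sorted
        # order, which is what makes the output independent of insertion order.
        parts = []
        for key in sorted(value):
            parts.append(bencode_sketch(key))
            parts.append(bencode_sketch(value[key]))
        return b'd' + b''.join(parts) + b'e'
    raise TypeError('unsupported type: %r' % type(value))


print(bencode_sketch({'b': 1, 'a': 0}))  # b'd1:ai0e1:bi1ee'

# Two insertion orders give the same bytes and therefore the same md5.
same = (hashlib.md5(bencode_sketch({'a': 0, 'b': 1})).hexdigest()
        == hashlib.md5(bencode_sketch({'b': 1, 'a': 0})).hexdigest())
print(same)  # True
```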
Solution 3:
I ended up writing it myself as I thought I would have to:
import inspect
from hashlib import md5

class Hasher(object):
    """Hashes Python data into md5."""
    def __init__(self):
        self.md5 = md5()
    def update(self, v):
        """Add `v` to the hash, recursively if needed."""
        # Hash the type first so that, for example, 1 and '1' differ.
        self.md5.update(str(type(v)).encode('utf-8'))
        if isinstance(v, str):
            self.md5.update(v.encode('utf-8'))
        elif isinstance(v, bytes):
            self.md5.update(v)
        elif isinstance(v, (int, float)):
            self.update(str(v))
        elif isinstance(v, (tuple, list)):
            for e in v:
                self.update(e)
        elif isinstance(v, dict):
            # Sort the keys so insertion order does not affect the digest.
            keys = v.keys()
            for k in sorted(keys):
                self.update(k)
                self.update(v[k])
        else:
            # Any other object: hash its non-dunder, non-callable attributes.
            for k in dir(v):
                if k.startswith('__'):
                    continue
                a = getattr(v, k)
                if inspect.isroutine(a):
                    continue
                self.update(k)
                self.update(a)
    def digest(self):
        """Retrieve the digest of the hash."""
        return self.md5.digest()
Solution 4:
You could use the built-in pprint module, which covers some more cases than the proposed json.dumps() solution. For example, datetime objects will be handled correctly, and pprint.pformat() sorts dictionary keys before formatting.
Your example rewritten to use pprint instead of json:
>>> import hashlib, random, pprint
>>> for i in range(10):
...     k = [i*i for i in range(1000)]
...     random.shuffle(k)
...     d = dict.fromkeys(k, 1)
...     print(hashlib.md5(pprint.pformat(d).encode('utf-8')).hexdigest())
...
b4e5de6e1c4f3c6540e962fd5b1891db
b4e5de6e1c4f3c6540e962fd5b1891db
b4e5de6e1c4f3c6540e962fd5b1891db
b4e5de6e1c4f3c6540e962fd5b1891db
b4e5de6e1c4f3c6540e962fd5b1891db
b4e5de6e1c4f3c6540e962fd5b1891db
b4e5de6e1c4f3c6540e962fd5b1891db
b4e5de6e1c4f3c6540e962fd5b1891db
b4e5de6e1c4f3c6540e962fd5b1891db
b4e5de6e1c4f3c6540e962fd5b1891db
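The datetime point can be illustrated with a small sketch. The record below is made-up sample data; the example only shows that json.dumps() rejects a datetime value while pprint.pformat() formats it through repr().

```python
import datetime
import hashlib
import json
import pprint

# Hypothetical sample data containing a datetime value.
record = {'when': datetime.datetime(2020, 1, 2, 3, 4, 5), 'count': 3}
reordered = {'count': 3, 'when': datetime.datetime(2020, 1, 2, 3, 4, 5)}

# json.dumps() has no default encoding for datetime objects.
try:
    json.dumps(record, sort_keys=True)
except TypeError as exc:
    print('json failed:', exc)

# pprint.pformat() falls back to repr() for such values and sorts dict keys,
# so both insertion orders give the same text and the same digest.
text = pprint.pformat(record)
print(text == pprint.pformat(reordered))  # True
print(hashlib.md5(text.encode('utf-8')).hexdigest())
```

Because this relies on repr(), it is only stable for objects whose repr() is itself stable. Objects that use the default repr() include a memory address, so their hash would change between runs.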