What hash algorithm does Python's dictionary mapping use?
The hash that it uses depends on the object being used as a key -- each class can define its own __hash__() method, and the value that it returns for a particular instance is what is used for the dictionary.
Python itself provides the hash implementation for str and tuple types. A quick look at the source should reveal the exact algorithm for those.
The hash of a tuple is based on the hashes of its contents. The algorithm is essentially this (simplified slightly):
def hash(tuple):
mult = 1000003
x = 0x345678
for index, item in enumerate(tuple):
x = ((x ^ hash(item)) * mult) & (1<<32)
mult += (82520 + (len(tuple)-index)*2)
return x + 97531
For strings, the interpreter also iterates over every character, combining them with this (again, slightly simplified) algorithm:
def hash(string):
x = string[0] << 7
for chr in string[1:]:
x = ((1000003 * x) ^ chr) & (1<<32)
return x
A bigger issue to worry about is avoiding hash collisions. Colliding hash keys will cause a linear search as the dictionary tries to find a place to store the new object (this is now being recognized as a security issue, and tha behavior may be changing in upcoming python versions)