Explanation of HashMap#hash(int) method
Solution 1:
>>> is the logical right shift (no sign-extension; JLS 15.19, Shift Operators), and ^ is the bitwise exclusive-or (JLS 15.22.1, Integer Bitwise Operators).
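For a quick feel for the two operators, here is a small standalone snippet (my own illustration, not code from HashMap):

// Demo of >>> versus the sign-extending >> and of ^ (not JDK code)
public class BitOps {
    public static void main(String[] args) {
        int h = 0xFFFF0000;              // a negative int: only the high 16 bits are set

        System.out.println(h >> 16);     // -1     (>> copies the sign bit into the vacated positions)
        System.out.println(h >>> 16);    // 65535  (>>> shifts in zeros, so the result is non-negative)

        // ^ produces a 1 bit exactly where the two operands differ:
        System.out.println(Integer.toHexString(0xF0F0 ^ 0xFF00));   // ff0 (i.e. 0x0FF0)
    }
}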
As to why this is done, the documentation offers a hint: HashMap
uses power-of-two length tables, and hashes keys by masking away the higher bits and taking only the lower bits of their hash code.
// HashMap.java -- edited for conciseness
static int indexFor(int h, int length) {
return h & (length-1);   // keep only the low-order bits of h
}
public V put(K key, V value) {
int hash = hash(key.hashCode());
int index = indexFor(hash, table.length);
// ...
}
So hash() attempts to bring relevancy to the higher bits, which would otherwise get masked away (indexFor basically discards the higher bits of h and takes only the lower k bits, where length == (1 << k)).
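To make the masking concrete, here is a small hypothetical snippet: with a 16-slot table only the low four bits of the hash code matter, so hash codes that differ only in their higher bits all land in the same slot.

// Hypothetical demo (not JDK code): indexFor keeps only the low bits of h
public class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);                          // length is a power of two
    }

    public static void main(String[] args) {
        int length = 16;                                  // 1 << 4, so only 4 bits survive the mask
        System.out.println(indexFor(0x00000005, length)); // 5
        System.out.println(indexFor(0x7FFF0005, length)); // 5 -- the high bits are discarded
        System.out.println(indexFor(0xABCD0005, length)); // 5 -- same slot yet again
    }
}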
Contrast this with the way Hashtable (which should NOT have a power-of-two length table) uses a key's hash code.
// Hashtable.java -- edited for conciseness
public synchronized V get(Object key) {
int hash = key.hashCode();
int index = (hash & 0x7FFFFFFF) % table.length;   // clear the sign bit so the modulus is non-negative
// ...
}
By doing the more expensive % operation (instead of simple bit masking), Hashtable's performance is less sensitive to hash codes with a poor distribution in the lower bits (especially if table.length is a prime number).
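As a hypothetical comparison of the two indexing strategies (the table sizes below are just illustrative), consider hash codes whose low four bits are always zero:

// Hypothetical comparison (not JDK code): bit masking vs. prime modulus for
// hash codes that are all multiples of 16, i.e. with no information in the low bits
public class PrimeVsMask {
    public static void main(String[] args) {
        int maskedLength = 16;            // power of two, HashMap-style
        int primeLength  = 11;            // a small prime, Hashtable-style

        for (int i = 1; i <= 5; i++) {
            int hash = i * 16;            // the low 4 bits are always zero
            int maskedIndex = hash & (maskedLength - 1);          // always 0: every key collides
            int modIndex    = (hash & 0x7FFFFFFF) % primeLength;  // 5, 10, 4, 9, 3: spread out
            System.out.println(maskedIndex + " vs " + modIndex);
        }
    }
}

This is the weakness that the supplemental hash() compensates for in the power-of-two case.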
Solution 2:
I don't know how all the shifting works, but the motivation is laid out in the comments:
The way the HashMap is implemented relies on the hashCode function being sufficiently well implemented. In particular, the lower bits of the hash value should be distributed evenly. If you have many collisions on the lower bits, the HashMap will not perform well.
Because the implementation of hashCode is outside the control of HashMap (every class can implement its own), they supply an additional hash function that shifts the object's hashCode around a little to ensure that the lower bits are distributed more evenly. Again, I have no idea exactly how this works (or how effective it is), but I assume it depends on at least the higher bits being well distributed (it seems to mesh the higher bits into the lower bits).
So what this does is to try to minimize collisions (and thus improve performance) in the presence of poorly implemented hashCode methods.
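To put a rough number on that, here is a hypothetical experiment; the mix() below is a sketch of the kind of shifting being described, not necessarily the exact JDK source:

import java.util.HashSet;
import java.util.Set;

// Hypothetical experiment (not JDK code): count how many slots of a 64-slot table
// are hit by 1000 hash codes that vary only in their upper bits, before and after mixing
public class LowBitSpread {
    // a sketch of a supplemental hash that folds higher bits into lower ones
    static int mix(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        int mask = 64 - 1;                           // power-of-two table, as in HashMap
        Set<Integer> rawSlots   = new HashSet<Integer>();
        Set<Integer> mixedSlots = new HashSet<Integer>();

        for (int i = 0; i < 1000; i++) {
            int h = i << 16;                         // "poor" hash codes: the low 16 bits are always zero
            rawSlots.add(h & mask);
            mixedSlots.add(mix(h) & mask);
        }
        System.out.println(rawSlots.size());         // 1  -- every key would collide in slot 0
        System.out.println(mixedSlots.size());       // 64 -- the shifts spread the keys over the whole table
    }
}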