In Java, why must equals() and hashCode() be consistent?
Solution 1:
Sure:
public class Test {
private final int m, n;
public Test(int m, int n) {
this.m = m;
this.n = n;
}
public int hashCode() { return n * m; }
public boolean equals(Object ob) {
if (ob.getClass() != Test.class) return false;
Test other = (Test)ob;
return m == other.m;
}
}
with:
Set<Test> set = new HashSet<Test>();
set.put(new Test(3,4));
boolean b = set.contains(new Test(3, 10)); // false
Technically that should be true because m == 3 in both cases.
In general a HashMap works like this: it has a variable number of what are commonly called "buckets". The number of buckets can change over time (as entries are added and removed) but it is always a power of 2.
Let's say a given HashMap
has 16 buckets. When you call put() to add an entry, the hashCode() of the key is calculated and then a mask is taken depending on the size of the buckets. If you (bitwise) AND the hashCode() with 15 (0x0F) you will get the last 4 bits, equaling a number between 0 and 15 inclusive:
int factor = 4;
int buckets = 1 << (factor-1) - 1; // 16
int mask = buckets - 1; // 15
int code = key.hashCode();
int dest = code & mask; // a number from 0 to 15 inclusive
Now if there is already an entry in that bucket you have what's called a collision. There are multiple ways of dealing with this but the one used by HashMap (and is probably the most common overall) is bucketing. All the entries with the same masked hashCode are put in a list of some kind.
So to find if a given key is in the map already:
- Calculate the masked hash code;
- Find the appropriate bucket;
- If it's empty, key not found;
- If is isn't empty, loop through all entries in the bucket checking equals().
Looking through a bucket is a linear (O(n)) operation but it's on a small subset. The hashcode bucket determination is essentially constant (O(1)). If buckets are sufficiently small then access to a HashMap is usually described as "near O(1)".
You can make a couple of observations about this.
Firstly, if you have a bunch of objects that all return 42 as their hash code a HashMap
will still work but it will operate as an expensive list. Access will be O(n) (as everything will be in the same bucket regardless of the number of buckets). I've actually been asked this in an interview.
Secondly, returning to your original point, if two objects are equal (meaning a.equals(b) == b.equals(a) == true
) but have different hash codes then the HashMap
will go looking in (probably) the wrong bucket resulting in unpredictable and undefined behaviour.
Solution 2:
This is discussed in the Item 8: Always override hashCode when you override equals of Joshua Bloch's Effective Java:
A common source of bugs is the failure to override the hashCode method. You must override hashCode in every class that overrides equals. Failure to do so will result in a violation of the general contract for Object.hashCode, which will pre- vent your class from functioning properly in conjunction with all hash-based collec- tions, including HashMap, HashSet, and Hashtable.
Here is the contract, copied from the java.lang.Object specification:
Whenever it is invoked on the same object more than once during an execution of an application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
The key provision that is violated when you fail to override hashCode is the second one: Equal objects must have equal hash codes. Two distinct instances may be logically equal according to the class’s equals method, but to the Object class’s hashCode method, they’re just two objects with nothing much in common. Therefore object’s hashCode method returns two seemingly random numbers instead of two equal numbers as required by the contract.
For example, consider the following simplistic PhoneNumber class, whose equals method is constructed according to the recipe in Item 7:
public final class PhoneNumber { private final short areaCode; private final short exchange; private final short extension; public PhoneNumber(int areaCode, int exchange, int extension) { rangeCheck(areaCode, 999, "area code"); rangeCheck(exchange, 999, "exchange"); rangeCheck(extension, 9999, "extension"); this.areaCode = (short) areaCode; this.exchange = (short) exchange; this.extension = (short) extension; } private static void rangeCheck(int arg, int max, String name) { if (arg < 0 || arg > max) throw new IllegalArgumentException(name +": " + arg); } public boolean equals(Object o) { if (o == this) return true; if (!(o instanceof PhoneNumber)) return false; PhoneNumber pn = (PhoneNumber)o; return pn.extension == extension && pn.exchange == exchange && pn.areaCode == areaCode; } // No hashCode method! ... // Remainder omitted }
Suppose you attempt to use this class with a HashMap:
Map m = new HashMap(); m.put(new PhoneNumber(408, 867, 5309), "Jenny");
At this point, you might expect
m.get(new PhoneNumber(408 , 867, 5309))
to return"Jenny"
, but it returnsnull
. Notice that two PhoneNumber instances are involved: One is used for insertion into the HashMap, and a second, equal, instance is used for (attempted) retrieval. The PhoneNumber class’s failure to override hashCode causes the two equal instances to have unequal hash codes, in violation of the hashCode contract. Therefore the get method looks for the phone number in a different hash bucket from the one in which it was stored by the put method. Fixing this problem is as simple as providing a proper hashCode method for the PhoneNumber class. [...]
See the Chapter 3 for the full content.
Solution 3:
Containers like HashSet rely on the hash function to determine where to put it, and where to get it from when asked for it. If A.equals(B)
, then a HashSet is expecting A to be in the same place as B
. If you put A
in with value V
, and look up B
, you should expect to get V
back (since you've said A.equals(B)
). But if A.hashcode() != B.hashcode(), then the hashset may not find where you put it.
Solution 4:
Here's a little example:
Set<Foo> myFoos = new HashSet<Foo>();
Foo firstFoo = new Foo(123,"Alpha");
myFoos.add(firstFoo);
// later in the processing you get another Foo from somewhere
Foo someFoo = //use imagination here...;
// maybe you get it from a database... and it's equal to Foo(123,"Alpha)
if (myFoos.contains(someFoo)) {
// maybe you win a million bucks.
}
So, imagine that the hashCode that gets created for firstFoo
is 99999
and it winds up at a specific spot in the myFoos
HashSet. Later when you get the someFoo
and you look for it in the myFoos
HashSet, it needs to generate the same hashCode so you can find it.
Solution 5:
It's exactly because of hash tables.
Because of the possibility of hash code collisions, hash tables need to check identity as well, otherwise the table can't determine if it found the object it was looking for, or one with the same hash code. So every get()
in a hash table calls key.equals(potentialMatch)
before returning a value.
If equals()
and hashCode()
are inconsistent you can get very inconsistent behavior. Say for two objects, a
and b
, a.equals(b)
returns true, but a.hashCode() != b.hashCode()
. Insert a and a HashSet will return false for .contains(b)
, but a List created from that set will return true (because the list doesn't use hash codes).
HashSet set = new HashSet();
set.add(a);
set.contains(b); // false
new ArrayList(set).contains(b); // true
Obviously, that could be bad.