Best practices for overriding isEqual: and hash

How do you properly override isEqual: in Objective-C? The "catch" seems to be that if two objects are equal (as determined by the isEqual: method), they must have the same hash value.

The Introspection section of the Cocoa Fundamentals Guide does have an example on how to override isEqual:, copied as follows, for a class named MyWidget:

- (BOOL)isEqual:(id)other {
    if (other == self)
        return YES;
    if (!other || ![other isKindOfClass:[self class]])
        return NO;
    return [self isEqualToWidget:other];
}

- (BOOL)isEqualToWidget:(MyWidget *)aWidget {
    if (self == aWidget)
        return YES;
    if (![(id)[self name] isEqual:[aWidget name]])
        return NO;
    if (![[self data] isEqualToData:[aWidget data]])
        return NO;
    return YES;
}

It checks pointer equality, then class equality, and finally compares the objects using isEqualToWidget:, which only checks the name and data properties. What the example doesn't show is how to override hash.

Let's assume there are other properties that do not affect equality, say age. Shouldn't the hash method be overridden such that only name and data affect the hash? And if so, how would you do that? Just add the hashes of name and data? For example:

- (NSUInteger)hash {
    NSUInteger hash = 0;
    hash += [[self name] hash];
    hash += [[self data] hash];
    return hash;
}

Is that sufficient? Is there a better technique? What if you have primitives, like int? Convert them to NSNumber to get their hash? Or structs like NSRect?

(Brain fart: Originally wrote "bitwise OR" them together with |=. Meant add.)


Solution 1:

Start with

 NSUInteger prime = 31;
 NSUInteger result = 1;

Then for every primitive you do

 result = prime * result + var

For objects you use 0 for nil and otherwise their hashcode.

 result = prime * result + [var hash];

For booleans you use two different values

 result = prime * result + ((var)?1231:1237);

Explanation and Attribution

This is not tcurdt's work, and comments were asking for more explanation, so I believe an edit for attribution is fair.

This algorithm was popularized in the book "Effective Java", and the relevant chapter can currently be found online here. That book popularized the algorithm, which is now a default in a number of Java applications (including Eclipse). It derived, however, from an even older implementation which is variously attributed to Dan Bernstein or Chris Torek. That older algorithm originally floated around on Usenet, and certain attribution is difficult. For example, there is some interesting commentary in this Apache code (search for their names) that references the original source.

Bottom line is, this is a very old, simple hashing algorithm. It is not the most performant, and it is not even proven mathematically to be a "good" algorithm. But it is simple, and a lot of people have used it for a long time with good results, so it has a lot of historical support.

Solution 2:

I'm just picking up Objective-C myself, so I can't speak for that language specifically, but in the other languages I use if two instances are "Equal" they must return the same hash - otherwise you are going to have all sorts of problems when trying to use them as keys in a hashtable (or any dictionary-type collections).

On the other hand, if 2 instances are not equal, they may or may not have the same hash - it is best if they don't. This is the difference between an O(1) search on a hash table and an O(N) search - if all your hashes collide you may find that searching your table is no better than searching a list.

In terms of best practices your hash should return a random distribution of values for its input. This means that, for example, if you have a double, but the majority of your values tend to cluster between 0 and 100, you need to make sure that the hashes returned by those values are evenly distributed across the entire range of possible hash values. This will significantly improve your performance.

There are a number of hashing algorithms out there, including several listed here. I try to avoid creating new hash algorithms as it can have large performance implications, so using the existing hash methods and doing a bitwise combination of some sort as you do in your example is a good way to avoid it.

Solution 3:

A simple XOR over the hash values of critical properties is sufficient 99% of the time.

For example:

- (NSUInteger)hash
{
    return [self.name hash] ^ [self.data hash];
}

Solution found at http://nshipster.com/equality/ by Mattt Thompson (which also referred this question in his post!)