Implementing -hash / -isEqual: / -isEqualTo...: for Objective-C collections

I think trying to come up with some generally useful hash function that will generate unique hash values for collections is an exercise in futility. U62's suggestion of combining the hashes of all the contents will not scale well, as it makes the hash function O(n). Hash functions should really be O(1) to ensure good performance, otherwise the purpose of the hash is defeated. (Consider the common Cocoa construct of plists, which are dictionaries containing arrays and other dictionaries, potentially ad nauseum. Attempting to take the hash of the top-level dictionary of a large plist would be excruciatingly slow if the collections' hash functions were O(n).)

My suggestion would be not to worry a great deal about a collection's hash. As you stated, -isEqual: implies equal -hash values. On the other hand, equal -hash values do not imply -isEqual:. That fact gives you a lot of leeway to create a simple hash.

If you're really worried about collisions though (and you have proof in concrete measurements of real-world situations that confirm it is something to be worried about), you could still follow U62's advice to some degree. For example, you could take the hash of, say, the first and/or last element in the collection, and combine that with, say, the -count of the collection. That be enough to provide a decent hash.

I hope that answers at least one of your questions.

As for No. 1: Implementing -isEqual: is pretty cut and dry. You enumerate the contents, and check isEqual: on each of the elements.

There is one thing to be careful of that may affect what you decide to do for your collections' -hash functions. Clients of your collections must also understand the rules governing -isEqual: and -hash. If you use the contents' -hash in your collection's -hash, your collection will break if the contents' isEqual: and -hash don't agree. It's the client's fault, of course, but that's another argument against basing your -hash off of the collection's contents.

No. 2 is kind of vague. Not sure what you have in mind there.

Two collections should be considered equal if they contain the same elements, and further if the collections are ordered, that the elements are in the same order.

On the subject of hashes for collections, it should be enough to combine the hashes of the elements in some way (XOR them or modulo add them). Note that while the rules state that two objects that are equal according to IsEqual need to return the same hash, the opposite does not hold : Although uniqueness of hashes is desireable, it is not necessary for correctness of the solution. Thus an ordered collection need not take account of the order of the elements.

The excerpt from the Apple documentation is a necessary restriction by the way. An object could not maintain the same hash value under mutation while also ensuring that objects with the same value have the same hash. That applies for the simplest of objects as well as collections. Of course it only usually matters that an object's hash changes when it is inside a container that uses the hash to organise it's elements. The upshot of all this is that mutable collections shouldn't mutate when placed inside another container, but then neither should any object that has a true hash function.

I have done some investigation into the NSArray and NSMutableArray default hash implementation and (unless I have misunderstood something) it seams like Apple do not follow thier own rules:

If a mutable object is added to a collection that uses hash values to determine the object's position in the collection, the value returned by the hash method of the object must not change while the object is in the collection. Therefore, either the hash method must not rely on any of the object's internal state information or you must make sure the object's internal state information does not change while the object is in the collection. Thus, for example, a mutable dictionary can be put in a hash table but you must not change it while it is in there. (Note that it can be difficult to know whether or not a given object is in a collection.)

Here is my test code

NSMutableArray* myMutableArray = [NSMutableArray arrayWithObjects:@"a", @"b", @"c", nil];
NSMutableArray* containerForMutableArray = [NSMutableArray arrayWithObject:myMutableArray];

NSUInteger hashBeforeMutation = [[containerForMutableArray objectAtIndex:0] hash];
[[containerForMutableArray objectAtIndex:0] removeObjectAtIndex:1];
NSUInteger hashAfterMutation = [[containerForMutableArray objectAtIndex:0] hash];

NSLog(@"Hash Before: %d", hashBeforeMutation);
NSLog(@"Hash After : %d", hashAfterMutation);

The output is:

Hash Before: 3
Hash After : 2

So it seams like the default implementation for the Hash method on both NSArray and NSMutableArray is the count of the array and it dosn't care if its inside a collection or not.

Implementing -hash / -isEqual: / -isEqualTo...: for Objective-C collections

Related

Recent Posts