What's the role of GetHashCode in the IEqualityComparer<T> in .NET?
I'm trying to understand the role of the GetHashCode method of the interface IEqualityComparer.
The following example is taken from MSDN:
using System;
using System.Collections.Generic;
class Example {
static void Main() {
try {
BoxEqualityComparer boxEqC = new BoxEqualityComparer();
Dictionary<Box, String> boxes = new Dictionary<Box,
string>(boxEqC);
Box redBox = new Box(4, 3, 4);
Box blueBox = new Box(4, 3, 4);
boxes.Add(redBox, "red");
boxes.Add(blueBox, "blue");
Console.WriteLine(redBox.GetHashCode());
Console.WriteLine(blueBox.GetHashCode());
}
catch (ArgumentException argEx) {
Console.WriteLine(argEx.Message);
}
}
}
public class Box {
public Box(int h, int l, int w) {
this.Height = h;
this.Length = l;
this.Width = w;
}
public int Height { get; set; }
public int Length { get; set; }
public int Width { get; set; }
}
class BoxEqualityComparer : IEqualityComparer<Box> {
public bool Equals(Box b1, Box b2) {
if (b1.Height == b2.Height & b1.Length == b2.Length
& b1.Width == b2.Width) {
return true;
}
else {
return false;
}
}
public int GetHashCode(Box bx) {
int hCode = bx.Height ^ bx.Length ^ bx.Width;
return hCode.GetHashCode();
}
}
Shouldn't the Equals method implementation be enough to compare two Box objects? That is where we tell the framework the rule used to compare the objects. Why is the GetHashCode needed?
Thanks.
Lucian
A bit of background first...
Every object in .NET has an Equals method and a GetHashCode method.
The Equals method is used to compare one object with another object - to see if the two objects are equivalent.
The GetHashCode method generates a 32-bit integer representation of the object. Since there is no limit to how much information an object can contain, certain hash codes are shared by multiple objects - so the hash code is not necessarily unique.
A dictionary is a really cool data structure that trades a higher memory footprint in return for (more or less) constant costs for Add/Remove/Get operations. It is a poor choice for iterating over though. Internally, a dictionary contains an array of buckets, where values can be stored. When you add a Key and Value to a dictionary, the GetHashCode method is called on the Key. The hashcode returned is used to determine the index of the bucket in which the Key/Value pair should be stored.
When you want to access the Value, you pass in the Key again. The GetHashCode method is called on the Key, and the bucket containing the Value is located.
When an IEqualityComparer is passed into the constructor of a dictionary, the IEqualityComparer.Equals and IEqualityComparer.GetHashCode methods are used instead of the methods on the Key objects.
Now to explain why both methods are necessary, consider this example:
BoxEqualityComparer boxEqC = new BoxEqualityComparer();
Dictionary<Box, String> boxes = new Dictionary<Box, string>(boxEqC);
Box redBox = new Box(100, 100, 25);
Box blueBox = new Box(1000, 1000, 25);
boxes.Add(redBox, "red");
boxes.Add(blueBox, "blue");
Using the BoxEqualityComparer.GetHashCode method in your example, both of these boxes have the same hashcode - 100^100^25 = 1000^1000^25 = 25 - even though they are clearly not the same object. The reason that they are the same hashcode in this case is because you are using the ^ (bitwise exclusive-OR) operator so 100^100 cancels out leaving zero, as does 1000^1000. When two different objects have the same key, we call that a collision.
When we add two Key/Value pairs with the same hashcode to a dictionary, they are both stored in the same bucket. So when we want to retrieve a Value, the GetHashCode method is called on our Key to locate the bucket. Since there is more than one value in the bucket, the dictionary iterates over all of the Key/Value pairs in the bucket calling the Equals method on the Keys to find the correct one.
In the example that you posted, the two boxes are equivalent, so the Equals method returns true. In this case the dictionary has two identical Keys, so it throws an exception.
TLDR
So in summary, the GetHashCode method is used to generate an address where the object is stored. So a dictionary doesn't have to search for it. It just computes the hashcode and jumps to that location. The Equals method is a better test of equality, but cannot be used to map an object into an address space.
GetHashCode is used in Dictionary colections and it creates hash for storing objects in it. Here is a nice article why and how to use IEqualtyComparer and GetHashCode http://dotnetperls.com/iequalitycomparer
While it would be possible for a Dictionary<TKey,TValue>
to have its GetValue
and similar methods call Equals
on every single stored key to see whether it matches the one being sought, that would be very slow. Instead, like many hash-based collections, it relies upon GetHashCode
to quickly exclude most non-matching values from consideration. If calling GetHashCode
on an item being sought yields 42, and a collection has 53,917 items, but calling GetHashCode
on 53,914 of the items yielded a value other than 42, then only 3 items will have to be compared to the ones being sought. The other 53,914 may safely be ignored.
The reason a GetHashCode
is included in an IEqualityComparer<T>
is to allow for the possibility that a dictionary's consumer might want to regard as equal objects that would normally not regard each other as equal. The most common example would be a caller that wants to use strings as keys but use case-insensitive comparisons. In order to make that work efficiently, the dictionary will need to have some form of hash function that will yield the same value for "Fox" and "FOX", but hopefully yield something else for "box" or "zebra". Since the GetHashCode
method built into String
doesn't work that way, the dictionary will need to get such a method from somewhere else, and IEqualityComparer<T>
is the most logical place since the need for such a hash code would be very strongly associated with an Equals
method that considers "Fox" and "FOX" identical to each other, but not to "box" or "zebra".