When do we do GetHashCode() for a Dictionary?
I have used Dictionary(TKey, TValue) for many purposes. But I haven't encountered any scenario to implement GetHashCode() which I believe is because my keys were of primary types like int and string. I am curious to know the scenarios (real world examples) when one should use a custom object for key and thus implement methods GetHashCode() Equals() etc.
And, does using a custom object for key necessitate implementing these functions?
Solution 1:
You should override Equals
and GetHashCode
whenever the default Object.Equals
(tests for reference equality) will not suffice. This happens, for example, when the type of your key is a custom type and you want two keys to be considered equal even in cases when they are not the same instance of the custom type.
For example, if your key is as simple as
class Point {
public int X { get; set; }
public int Y { get; set; }
}
and you want two Point
s two be considered equal if their X
s are equal and their Y
s are equal then you will need to override Equals
and GetHashCode
.
Solution 2:
Just to make it clear: There is one important thing about Dictionary<TKey, TValue>
and GetHashCode()
: Dictionary uses GetHashCode to determine if two keys are equal i.e. if <TKey>
is of custom type you should care about implementing GetHashCode()
carefully. As Andrew Hare pointed out this is easy, if you have a simple type that identifies your custom object unambiguously. In case you have a combined identifier, it gets a little more complicated.
As example consider a complex number as TKey
. A complex number is determined by its real and its imaginary part. Both are of simple type e.g. double
. But how would you identify if two complex numbers are equal? You implement GetHashCode()
for your custom complex type and combine both identifying parts.
You find further reading on the latter here.
UPDATE
Based on Ergwun's comment I checked the behavior of Dictionary<TKey, TValue>.Add
with special respect to TKey
's implementation of Equals(object)
and GetHashCode()
. I
must confess that I was rather surprised by the results.
Given two objects k1
and k2
of type TKey
, two arbitrary objects v1
and v2
of type TValue
, and an empty dictionary d
of type Dictionary<TKey, TValue>
, this is what happens when adding v1
with key k1
to d
first and v2
with key k2
second (depending on the implementation of TKey.Equals(object)
and TKey.GetHashCode()
):
k1.Equals(k2) k1.GetHashCode() == k2.GetHashCode() d.Add(k2, v2)
false false ok
false true ok
true false ok
true true System.ArgumentException
Conclusion: I was wrong as I originally thought the second case (where Equals
returns false
but both key objects have same hash code) would raise an ArgumentException
. But as the third case shows dictionary in some way does use GetHashCode()
. Anyway it seems to be good advice that two objects that are the same type and are equal must return the same hash code to ensure that instances Dictionary<TKey, TValue>
work correctly.
Solution 3:
You have two questions here.
- When do you need to implement GetHashCode()
- Would you ever use an object for a dictionary key.
Lets start with 1. If you are writing a class that might possibly be used by someone else, you will want to define GetHashCode() and Equals(), when reference Equals() is not enough. If you're not planning on using it in a dictionary, and it's for your own usage, then I see no reason to skip GetHashCode() etc.
For 2), you should use an object anytime you have a need to have a constant time lookup from an object to some other type. Since GetHashCode() returns a numeric value, and collections store references, there is no penalty for using an Object over an Int or a string (remember a string is an object).
Solution 4:
One example is when you need to create a composite key (that is a key comprised of more that one piece of data). That composite key would be a custom type that would need to override those methods.
For example, let's say that you had an in-memory cache of address records and you wanted to check to see if an address was in cache to save an expensive trip to the database to retrieve it. Let's also say that addresses are unique in terms of their street 1 and zip code fields. You would implement your cache with something like this:
class AddressCacheKey
{
public String StreetOne { get; set; }
public String ZipCode { get; set; }
// overrides for Equals and GetHashCode
}
and
static Dictionary<AddressCacheKey,Address> cache;
Since your AddressCacheKey
type overrides the Equals
and GetHashCode
methods they would be a good candidate for a key in the dictionary and you would be able to determine whether or not you needed to take a trip to the database to retrieve a record based on more than one piece of data.