Good GetHashCode() override for a List of Foo objects, respecting the order

EnumerableObject : IEnumerable<Foo>

wraps a List<Foo>

If a.SequenceEqual(b) for two EnumerableObject instances a and b, then they are equal.

Therefore, GetHashCode must be implemented. The problem is that XORing each element's hash code returns the same hash code for any list containing exactly the same elements, regardless of order. That still works correctly, but it produces many collisions, which slows down retrieval, etc.
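For example, because XOR is commutative and associative, reordering the elements can never change the result (hypothetical hash codes below):

int h1 = 0;
foreach (var code in new[] { 17, 23, 42 })   // hypothetical element hash codes
    h1 ^= code;

int h2 = 0;
foreach (var code in new[] { 42, 17, 23 })   // same codes, different order
    h2 ^= code;

Console.WriteLine(h1 == h2); // True - every permutation of the same elements collides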

What is a good, fast GetHashCode method for lists of objects that is order dependent?


I'd do it the same way I normally combine hash codes - with an addition and a multiplication:

public override int GetHashCode()
{
    unchecked
    {
        int hash = 19;
        foreach (var foo in foos)
        {
            hash = hash * 31 + foo.GetHashCode();
        }
        return hash;
    }
}

(Note that you shouldn't add anything to the list after this has been used for the key in a hash table of any description, as the hash will change. This also assumes that there are no null entries - if there could be, you need to take account of that.)
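As a rough sketch of that failure mode (assuming, hypothetically, that EnumerableObject has a constructor taking the list and an Add that mutates it):

var key = new EnumerableObject(foos);   // foos is the wrapped List<Foo>
var map = new Dictionary<EnumerableObject, string> { { key, "cached" } };

key.Add(anotherFoo);                    // hypothetical mutation - the hash code changes
bool stillThere = map.ContainsKey(key); // almost certainly false: the entry is stranded in its old bucket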


Firstly, double-check that you need a hashcode at all. Are you going to be putting these lists into a hash-mapped structure (e.g. dictionary, hashset, etc)? If not, forget about it.

Now, assuming that you mean that EnumerableObject already overrides Equals(object) (and hopefully therefore also implements IEquatable<EnumerableObject>) for some reason, then this is indeed necessary. You want to balance speed versus bit distribution.
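For reference, the equality side of that contract typically looks something like this (a sketch, not necessarily your exact code; SequenceEqual comes from System.Linq):

public bool Equals(EnumerableObject other)
{
    if (ReferenceEquals(other, null)) return false;
    if (ReferenceEquals(other, this)) return true;
    return this.SequenceEqual(other); // order-sensitive, element-by-element comparison
}

public override bool Equals(object obj)
{
    return Equals(obj as EnumerableObject);
}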

A good starting point is a mult+add or a shift+xor like:

public override int GetHashCode()
{
    int res = 0x2D2816FE;
    foreach(var item in this)
    {
        res = res * 31 + (item == null ? 0 : item.GetHashCode());
    }
    return res;
}

(This assumes that you are using item.Equals() for your sequence-equality comparison; if you're using an IEqualityComparer's Equals, you'll need to call into its GetHashCode instead.)
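For example, if the equality goes through a comparer, the hash should come from that same comparer (a sketch; the comparer field here is hypothetical):

private readonly IEqualityComparer<Foo> comparer; // whatever comparer drives the sequence equality

public override int GetHashCode()
{
    int res = 0x2D2816FE;
    foreach (var item in this)
    {
        res = res * 31 + (item == null ? 0 : comparer.GetHashCode(item));
    }
    return res;
}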

From there we can optimise.

If null items are disallowed, remove the null-check (be careful, this will make the code throw if it ever does find a null).

If very large lists are common, we need to reduce the number of items examined, while trying not to cause lots of collisions. Compare the following different implementations:

public override int GetHashCode()
{
    int res = 0x2D2816FE;
    int max = Math.Min(Count, 16);
    for(int i = 0; i != max; ++i)
    {
        var item = this[i];
        res = res * 31 + (item == null ? 0 : item.GetHashCode());
    }
    return res;
}

public override int GetHashCode()
{
    int res = 0x2D2816FE;
    int min = Math.Max(-1, Count - 16);
    for(int i = Count - 1; i != min; --i)
    {
        var item = this[i];
        res = res * 31 + (item == null ? 0 : item.GetHashCode());
    }
    return res;
}

public override int GetHashCode()
{
    int res = 0x2D2816FE;
    int step = Count / 16 + 1;
    for(int i = 0; i < Count; i += step)
    {
        var item = this[i];
        res = res * 31 + (item == null ? 0 : item.GetHashCode());
    }
    return res;
}

Each of these restricts the total number of items examined, which speeds up execution but risks poorer-quality hashes. Which (if any) is best depends on whether collections with the same start or the same end are more likely.

Changing the number 16 above adjusts the balance: a smaller value is faster, while a larger value gives better hash quality and a lower risk of collisions.

Edit: And now you can use my implementation of SpookyHash v. 2:

public override int GetHashCode()
{
    var hasher = new SpookyHash(); // use methods with seeds if you need to prevent HashDoS
    foreach (var item in this)
        hasher.Update(item.GetHashCode()); // or relevant feeds of item, etc.
    return hasher.Final().GetHashCode();
}
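(As with the earlier snippets, this assumes there are no null entries; if there could be, feed the hasher a constant such as 0 for them instead of calling item.GetHashCode().)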

This will create a much better distribution than mult+add or shift+xor, while also being particularly fast (especially in 64-bit processes as the algorithm is optimised for that, though it works well on 32-bit too).