Wrap a delegate in an IEqualityComparer
On the importance of GetHashCode
Others have already commented on the fact that any custom IEqualityComparer<T>
implementation should really include a GetHashCode
method; but nobody's bothered to explain why in any detail.
Here's why. Your question specifically mentions the LINQ extension methods; nearly all of these rely on hash codes to work properly, because they utilize hash tables internally for efficiency.
Take Distinct
, for example. Consider the implications of this extension method if all it utilized were an Equals
method. How do you determine whether an item's already been scanned in a sequence if you only have Equals
? You enumerate over the entire collection of values you've already looked at and check for a match. This would result in Distinct
using a worst-case O(N2) algorithm instead of an O(N) one!
Fortunately, this isn't the case. Distinct
doesn't just use Equals
; it uses GetHashCode
as well. In fact, it absolutely does not work properly without an IEqualityComparer<T>
that supplies a proper GetHashCode
. Below is a contrived example illustrating this.
Say I have the following type:
class Value
{
public string Name { get; private set; }
public int Number { get; private set; }
public Value(string name, int number)
{
Name = name;
Number = number;
}
public override string ToString()
{
return string.Format("{0}: {1}", Name, Number);
}
}
Now say I have a List<Value>
and I want to find all of the elements with a distinct name. This is a perfect use case for Distinct
using a custom equality comparer. So let's use the Comparer<T>
class from Aku's answer:
var comparer = new Comparer<Value>((x, y) => x.Name == y.Name);
Now, if we have a bunch of Value
elements with the same Name
property, they should all collapse into one value returned by Distinct
, right? Let's see...
var values = new List<Value>();
var random = new Random();
for (int i = 0; i < 10; ++i)
{
values.Add("x", random.Next());
}
var distinct = values.Distinct(comparer);
foreach (Value x in distinct)
{
Console.WriteLine(x);
}
Output:
x: 1346013431 x: 1388845717 x: 1576754134 x: 1104067189 x: 1144789201 x: 1862076501 x: 1573781440 x: 646797592 x: 655632802 x: 1206819377
Hmm, that didn't work, did it?
What about GroupBy
? Let's try that:
var grouped = values.GroupBy(x => x, comparer);
foreach (IGrouping<Value> g in grouped)
{
Console.WriteLine("[KEY: '{0}']", g);
foreach (Value x in g)
{
Console.WriteLine(x);
}
}
Output:
[KEY = 'x: 1346013431'] x: 1346013431 [KEY = 'x: 1388845717'] x: 1388845717 [KEY = 'x: 1576754134'] x: 1576754134 [KEY = 'x: 1104067189'] x: 1104067189 [KEY = 'x: 1144789201'] x: 1144789201 [KEY = 'x: 1862076501'] x: 1862076501 [KEY = 'x: 1573781440'] x: 1573781440 [KEY = 'x: 646797592'] x: 646797592 [KEY = 'x: 655632802'] x: 655632802 [KEY = 'x: 1206819377'] x: 1206819377
Again: didn't work.
If you think about it, it would make sense for Distinct
to use a HashSet<T>
(or equivalent) internally, and for GroupBy
to use something like a Dictionary<TKey, List<T>>
internally. Could this explain why these methods don't work? Let's try this:
var uniqueValues = new HashSet<Value>(values, comparer);
foreach (Value x in uniqueValues)
{
Console.WriteLine(x);
}
Output:
x: 1346013431 x: 1388845717 x: 1576754134 x: 1104067189 x: 1144789201 x: 1862076501 x: 1573781440 x: 646797592 x: 655632802 x: 1206819377
Yeah... starting to make sense?
Hopefully from these examples it's clear why including an appropriate GetHashCode
in any IEqualityComparer<T>
implementation is so important.
Original answer
Expanding on orip's answer:
There are a couple of improvements that can be made here.
- First, I'd take a
Func<T, TKey>
instead ofFunc<T, object>
; this will prevent boxing of value type keys in the actualkeyExtractor
itself. - Second, I'd actually add a
where TKey : IEquatable<TKey>
constraint; this will prevent boxing in theEquals
call (object.Equals
takes anobject
parameter; you need anIEquatable<TKey>
implementation to take aTKey
parameter without boxing it). Clearly this may pose too severe a restriction, so you could make a base class without the constraint and a derived class with it.
Here's what the resulting code might look like:
public class KeyEqualityComparer<T, TKey> : IEqualityComparer<T>
{
protected readonly Func<T, TKey> keyExtractor;
public KeyEqualityComparer(Func<T, TKey> keyExtractor)
{
this.keyExtractor = keyExtractor;
}
public virtual bool Equals(T x, T y)
{
return this.keyExtractor(x).Equals(this.keyExtractor(y));
}
public int GetHashCode(T obj)
{
return this.keyExtractor(obj).GetHashCode();
}
}
public class StrictKeyEqualityComparer<T, TKey> : KeyEqualityComparer<T, TKey>
where TKey : IEquatable<TKey>
{
public StrictKeyEqualityComparer(Func<T, TKey> keyExtractor)
: base(keyExtractor)
{ }
public override bool Equals(T x, T y)
{
// This will use the overload that accepts a TKey parameter
// instead of an object parameter.
return this.keyExtractor(x).Equals(this.keyExtractor(y));
}
}
When you want to customize equality checking, 99% of the time you're interested in defining the keys to compare by, not the comparison itself.
This could be an elegant solution (concept from Python's list sort method).
Usage:
var foo = new List<string> { "abc", "de", "DE" };
// case-insensitive distinct
var distinct = foo.Distinct(new KeyEqualityComparer<string>( x => x.ToLower() ) );
The KeyEqualityComparer
class:
public class KeyEqualityComparer<T> : IEqualityComparer<T>
{
private readonly Func<T, object> keyExtractor;
public KeyEqualityComparer(Func<T,object> keyExtractor)
{
this.keyExtractor = keyExtractor;
}
public bool Equals(T x, T y)
{
return this.keyExtractor(x).Equals(this.keyExtractor(y));
}
public int GetHashCode(T obj)
{
return this.keyExtractor(obj).GetHashCode();
}
}
I'm afraid there is no such wrapper out-of-box. However it's not hard to create one:
class Comparer<T>: IEqualityComparer<T>
{
private readonly Func<T, T, bool> _comparer;
public Comparer(Func<T, T, bool> comparer)
{
if (comparer == null)
throw new ArgumentNullException("comparer");
_comparer = comparer;
}
public bool Equals(T x, T y)
{
return _comparer(x, y);
}
public int GetHashCode(T obj)
{
return obj.ToString().ToLower().GetHashCode();
}
}
...
Func<int, int, bool> f = (x, y) => x == y;
var comparer = new Comparer<int>(f);
Console.WriteLine(comparer.Equals(1, 1));
Console.WriteLine(comparer.Equals(1, 2));
Ordinarily, I'd get this resolved by commenting @Sam on the answer (I've done some editing on the original post to clean it up a bit without altering the behavior.)
The following is my riff of @Sam's answer, with a [IMNSHO] critical fix to the default hashing policy:-
class FuncEqualityComparer<T> : IEqualityComparer<T>
{
readonly Func<T, T, bool> _comparer;
readonly Func<T, int> _hash;
public FuncEqualityComparer( Func<T, T, bool> comparer )
: this( comparer, t => 0 ) // NB Cannot assume anything about how e.g., t.GetHashCode() interacts with the comparer's behavior
{
}
public FuncEqualityComparer( Func<T, T, bool> comparer, Func<T, int> hash )
{
_comparer = comparer;
_hash = hash;
}
public bool Equals( T x, T y )
{
return _comparer( x, y );
}
public int GetHashCode( T obj )
{
return _hash( obj );
}
}