C# Distinct on IEnumerable<T> with custom IEqualityComparer
Here's what I'm trying to do. I'm querying an XML file using LINQ to XML, which gives me an IEnumerable<T
> object, where T is my "Village" class, filled with the results of this query. Some results are duplicated, so I would like to perform a Distinct() on the IEnumerable object, like so:
public IEnumerable<Village> GetAllAlliances()
{
try
{
IEnumerable<Village> alliances =
from alliance in xmlDoc.Elements("Village")
where alliance.Element("AllianceName").Value != String.Empty
orderby alliance.Element("AllianceName").Value
select new Village
{
AllianceName = alliance.Element("AllianceName").Value
};
// TODO: make it work...
return alliances.Distinct(new AllianceComparer());
}
catch (Exception ex)
{
throw new Exception("GetAllAlliances", ex);
}
}
As the default comparer would not work for the Village object, I implemented a custom one, as seen here in the AllianceComparer class:
public class AllianceComparer : IEqualityComparer<Village>
{
#region IEqualityComparer<Village> Members
bool IEqualityComparer<Village>.Equals(Village x, Village y)
{
// Check whether the compared objects reference the same data.
if (Object.ReferenceEquals(x, y))
return true;
// Check whether any of the compared objects is null.
if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
return false;
return x.AllianceName == y.AllianceName;
}
int IEqualityComparer<Village>.GetHashCode(Village obj)
{
return obj.GetHashCode();
}
#endregion
}
The Distinct() method doesn't work, as I have exactly the same number of results with or without it. Another thing, and I don't know if it's usually possible, but I cannot step into AllianceComparer.Equals() to see what could be the problem.
I've found examples of this on the Internet, but I can't seem to make my implementation work.
Hopefully, someone here might see what could be wrong here! Thanks in advance!
The problem is with your GetHashCode
. You should alter it to return the hash code of AllianceName
instead.
int IEqualityComparer<Village>.GetHashCode(Village obj)
{
return obj.AllianceName.GetHashCode();
}
The thing is, if Equals
returns true
, the objects should have the same hash code which is not the case for different Village
objects with same AllianceName
. Since Distinct
works by building a hash table internally, you'll end up with equal objects that won't be matched at all due to different hash codes.
Similarly, to compare two files, if the hash of two files are not the same, you don't need to check the files themselves at all. They will be different. Otherwise, you'll continue to check to see if they are really the same or not. That's exactly what the hash table that Distinct
uses behaves.
return alliances.Select(v => v.AllianceName).Distinct();
That would return an IEnumerable<string>
instead of IEnumerable<Village>
.
Or change the line
return alliances.Distinct(new AllianceComparer());
to
return alliances.Select(v => v.AllianceName).Distinct();