Why is there no Linq method to return distinct values by a predicate?
I want to get the distinct values in a list, but not by the standard equality comparison.
What I want to do is something like this:
return myList.Distinct( (x, y) => x.Url == y.Url );
I can't, there's no extension method in Linq that will do this - just one that takes an IEqualityComparer.
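For reference, the IEqualityComparer route looks roughly like the sketch below. Page and PageUrlComparer are hypothetical names standing in for whatever myList actually holds; only the Url property comes from the code above. Note that the interface asks for GetHashCode as well as Equals, which a bare (x, y) lambda cannot supply, and that may be part of why no such overload exists.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type; only Url is taken from the question.
public sealed class Page
{
    public string Url { get; set; }
    public string Title { get; set; }
}

// Treats two pages as equal when their Url values match.
public sealed class PageUrlComparer : IEqualityComparer<Page>
{
    public bool Equals(Page x, Page y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(x.Url, y.Url, StringComparison.Ordinal);
    }

    // Must agree with Equals: equal Urls have to produce equal hash codes.
    public int GetHashCode(Page obj)
    {
        return obj?.Url == null ? 0 : StringComparer.Ordinal.GetHashCode(obj.Url);
    }
}
```

With that in place the call becomes myList.Distinct(new PageUrlComparer()), which works but costs a whole class per key.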
I can hack around it with this:
return myList.GroupBy( x => x.Url ).Select( g => g.First() );
But that seems messy. It also doesn't quite do the same thing - I can only use it here because I have a single key.
I could also add my own:
public static IEnumerable<T> Distinct<T>(
this IEnumerable<T> input, Func<T,T,bool> compare )
{
//write my own here
}
But that does seem rather like writing something that should be there in the first place.
Anyone know why this method isn't there?
Am I missing something?
Solution 1:
It's annoying, certainly. It's also part of my "MoreLINQ" project which I must pay some attention to at some point :) There are plenty of other operations which make sense when acting on a projection, but returning the original - MaxBy and MinBy spring to mind.
As you say, it's easy to write - although I prefer the name "DistinctBy" to match OrderBy etc. Here's my implementation if you're interested:
public static IEnumerable<TSource> DistinctBy<TSource, TKey>
(this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector)
{
return source.DistinctBy(keySelector,
EqualityComparer<TKey>.Default);
}
public static IEnumerable<TSource> DistinctBy<TSource, TKey>
(this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
IEqualityComparer<TKey> comparer)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
if (keySelector == null)
{
throw new ArgumentNullException("keySelector");
}
if (comparer == null)
{
throw new ArgumentNullException("comparer");
}
return DistinctByImpl(source, keySelector, comparer);
}
private static IEnumerable<TSource> DistinctByImpl<TSource, TKey>
(IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
IEqualityComparer<TKey> comparer)
{
HashSet<TKey> knownKeys = new HashSet<TKey>(comparer);
foreach (TSource element in source)
{
if (knownKeys.Add(keySelector(element)))
{
yield return element;
}
}
}
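A quick usage sketch of the comparer overload, since that is where the extra parameter earns its keep. This assumes .NET 6 or later, where Enumerable.DistinctBy ships in System.Linq with the same shape as the method above; on older frameworks the same call should bind to the extension method shown here instead. The tuple data and the DistinctByDemo class are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctByDemo
{
    // Keeps one entry per Url, comparing Urls case-insensitively,
    // and returns the titles of the entries that survive.
    public static List<string> FirstTitlePerUrl()
    {
        var pages = new[]
        {
            (Url: "http://example.com/a", Title: "first"),
            (Url: "http://example.com/b", Title: "second"),
            (Url: "HTTP://EXAMPLE.COM/A", Title: "third"),
        };

        return pages
            .DistinctBy(p => p.Url, StringComparer.OrdinalIgnoreCase)
            .Select(p => p.Title)
            .ToList();
    }
}
```

Passing StringComparer.OrdinalIgnoreCase makes the first and third entries collide on their key, so only two titles come back.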
Solution 2:
But that seems messy.
It's not messy, it's correct.
- If you want Distinct Programmers by FirstName and there are four Amy's, which one do you want?
- If you Group programmers By FirstName and take the First one, then it is clear what you want to do in the case of four Amy's.
I can only use it here because I have a single key.
You can do a multiple key "distinct" with the same pattern:
return myList
.GroupBy( x => new { x.Url, x.Age } )
.Select( g => g.First() );
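The multiple key version works because anonymous types get compiler-generated Equals and GetHashCode that compare property by property, so two keys with the same Url and Age land in the same group. A small sketch with made-up data (the Name property and the MultiKeyDemo class are only there for illustration):

```csharp
using System;
using System.Linq;

public static class MultiKeyDemo
{
    // Counts how many elements remain after a "distinct" on (Url, Age).
    public static int CountDistinctPairs()
    {
        var people = new[]
        {
            new { Url = "a", Age = 30, Name = "Amy" },
            new { Url = "a", Age = 30, Name = "Amy again" },  // same key as above
            new { Url = "a", Age = 31, Name = "Another Amy" }, // Age differs
        };

        return people
            .GroupBy(x => new { x.Url, x.Age })
            .Select(g => g.First())
            .Count();
    }
}
```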
Solution 3:
Jon, your solution is pretty good. One minor change, though: I don't think we need EqualityComparer.Default in there. Here is my solution (of course, the starting point was Jon Skeet's solution):
public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> source, Func<T, TKey> keySelector)
{
//TODO All arg checks
HashSet<TKey> keys = new HashSet<TKey>();
foreach (T item in source)
{
TKey key = keySelector(item);
// HashSet<T>.Add returns false when the key is already present,
// so a separate Contains check is not needed.
if (keys.Add(key))
{
yield return item;
}
}
}
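One caveat on the argument checks left as a TODO: because this method contains yield return, anything in its body, checks included, only runs once the caller starts enumerating. That is presumably why Jon's version validates in a plain wrapper and keeps the iterator in a separate private method. A minimal sketch of the difference, with all names here invented for the demo:

```csharp
using System;
using System.Collections.Generic;

public static class DeferredCheckDemo
{
    // Check lives inside the iterator body: it runs on the first MoveNext().
    public static IEnumerable<T> LazyChecked<T>(IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        foreach (T item in source) yield return item;
    }

    // Check lives in a non-iterator wrapper: it runs at the call site.
    public static IEnumerable<T> EagerChecked<T>(IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        return Iterate(source);
    }

    private static IEnumerable<T> Iterate<T>(IEnumerable<T> source)
    {
        foreach (T item in source) yield return item;
    }

    // Reports whether merely calling the method (without enumerating) throws.
    public static bool ThrowsAtCall(Func<IEnumerable<int>> call)
    {
        try { call(); return false; }
        catch (ArgumentNullException) { return true; }
    }
}
```

Calling LazyChecked<int>(null) returns quietly and only fails later in some foreach far from the bug, whereas EagerChecked<int>(null) throws straight away.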
Solution 4:
Using AmyB's answer, I've written a small DistinctBy extension method, to allow a predicate to be passed:
/// <summary>
/// Distinct method that accepts a predicate
/// </summary>
/// <typeparam name="TSource">The type of the source elements.</typeparam>
/// <typeparam name="TKey">The type of the key.</typeparam>
/// <param name="source">The source.</param>
/// <param name="predicate">The predicate.</param>
/// <returns>IEnumerable{TSource}.</returns>
/// <exception cref="System.ArgumentNullException">source</exception>
public static IEnumerable<TSource> DistinctBy<TSource, TKey>
(this IEnumerable<TSource> source,
Func<TSource, TKey> predicate)
{
if (source == null)
throw new ArgumentNullException("source");
return source
.GroupBy(predicate)
.Select(x => x.First());
}
You can now pass a predicate to group the list by:
var distinct = myList.DistinctBy(x => x.Id);
Or group by multiple properties:
var distinct = myList.DistinctBy(x => new { x.Id, x.Title });
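One trade-off worth knowing with the GroupBy approach: GroupBy has to read the whole source before it can hand back its first group, while a HashSet-based loop can yield each new key as soon as it sees it. The sketch below counts how many source items get pulled before the first result is available; StreamingDemo, Source and HashSetDistinct are names invented for this demo.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreamingDemo
{
    // Number of items pulled from Source() since the last reset.
    public static int Pulled;

    private static IEnumerable<int> Source()
    {
        for (int i = 0; i < 5; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    // HashSet-based distinct: yields an item the first time its key is seen.
    private static IEnumerable<T> HashSetDistinct<T, TKey>(
        IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        foreach (T item in source)
        {
            if (seen.Add(keySelector(item))) yield return item;
        }
    }

    // Items pulled before GroupBy + First() can produce its first result.
    public static int PulledByGroupBy()
    {
        Pulled = 0;
        Source().GroupBy(x => x % 2).Select(g => g.First()).First();
        return Pulled;
    }

    // Items pulled before the HashSet version can produce its first result.
    public static int PulledByHashSet()
    {
        Pulled = 0;
        HashSetDistinct(Source(), x => x % 2).First();
        return Pulled;
    }
}
```

For a short in-memory list the difference hardly matters, but for a large or lazily produced sequence the GroupBy version holds every element in memory before returning anything.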