C# Determine Duplicate in List [duplicate]
Requirement: in an unsorted List, determine whether a duplicate exists. The typical way I would do this is with an O(n²) nested loop. I'm wondering how others solve this. Is there an elegant, high-performance method in LINQ? Something generic that takes a lambda or a comparer would be nice.
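For reference, the O(n²) nested loop mentioned in the question might look like the sketch below. It compares each pair of items once using EqualityComparer<T>.Default. The class and method names are made up for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical helper: the naive O(n^2) pairwise check described in the question.
public static class NaiveDuplicateCheck
{
    public static bool HasDuplicatesNested<T>(IList<T> items)
    {
        var comparer = EqualityComparer<T>.Default;
        for (int i = 0; i < items.Count; i++)
        {
            // Only compare against later elements, so each pair is checked once.
            for (int j = i + 1; j < items.Count; j++)
            {
                if (comparer.Equals(items[i], items[j]))
                    return true; // stop at the first matching pair
            }
        }
        return false;
    }
}
```

It needs no extra memory, but the number of comparisons grows quadratically with the list size. The hash-based answers below avoid that cost.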
Solution 1:
Unless I'm missing something, you should be able to get away with something simple using Distinct(). Granted, it won't be the most sophisticated implementation you could come up with, but it will tell you whether any duplicates were removed:
var list = new List<string>();
// Fill the list
if(list.Count != list.Distinct().Count())
{
// Duplicates exist
}
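Distinct also has an overload that takes an IEqualityComparer<T>, which covers the "takes a comparer" part of the question. The sketch below is a minimal example. It assumes you want the same count comparison with a custom notion of equality, such as case-insensitive strings via StringComparer.OrdinalIgnoreCase. The helper name is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DistinctDuplicateCheck
{
    // Hypothetical helper: compares the collection's count with the count of distinct items.
    // Distinct(comparer) decides equality using the supplied comparer.
    public static bool HasDuplicates<T>(IReadOnlyCollection<T> items, IEqualityComparer<T> comparer)
    {
        return items.Count != items.Distinct(comparer).Count();
    }
}
```

For example, `new[] { "apple", "Apple" }` counts as containing a duplicate under StringComparer.OrdinalIgnoreCase but not under StringComparer.Ordinal. Note that Count() has to consume the whole Distinct sequence, so this approach always walks the entire list.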
Solution 2:
According to Eric White's article on how to Find Duplicates using LINQ:
An easy way to find duplicates is to write a query that groups by the identifier, and then filter for groups that have more than one member. In the following example, we want to know that 4 and 3 are duplicates:
int[] listOfItems = new[] { 4, 2, 3, 1, 6, 4, 3 };
var duplicates = listOfItems
    .GroupBy(i => i)
    .Where(g => g.Count() > 1)
    .Select(g => g.Key);
foreach (var d in duplicates)
    Console.WriteLine(d); // 4,3
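The same grouping idea generalizes to a key-selector lambda, which the question also asked about. The sketch below assumes you want to find items that share a key. The Person type and the DuplicateKeys helper are hypothetical, used only for illustration. Note that GroupBy reads the whole source before it yields its first group, so this approach does not short-circuit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type used only for illustration.
public record Person(string Name, int Age);

public static class GroupByDuplicateCheck
{
    // Returns the keys that occur more than once, in order of first appearance.
    public static IEnumerable<TKey> DuplicateKeys<T, TKey>(
        IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        return source
            .GroupBy(keySelector)
            .Where(g => g.Count() > 1)
            .Select(g => g.Key);
    }
}
```

For example, given people named "Ann", "Bob" and "Ann", `DuplicateKeys(people, p => p.Name)` yields only "Ann".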
Solution 3:
To allow short-circuiting when a duplicate appears early in the list, you can add each item to a HashSet<T> and check the return value of its .Add method, which returns false when the item is already in the set.
By using .Any, you can short-circuit the enumeration as soon as you find a duplicate.
Here's a LINQ extension method in both C# and VB:
CSharp:
public static bool ContainsDuplicates<T>(this IEnumerable<T> enumerable)
{
var knownKeys = new HashSet<T>();
return enumerable.Any(item => !knownKeys.Add(item)); // Add returns false if the item is already in the set
}
Visual Basic:
<Extension>
Public Function ContainsDuplicates(Of T)(ByVal enumerable As IEnumerable(Of T)) As Boolean
Dim knownKeys As New HashSet(Of T)
Return enumerable.Any(Function(item) Not knownKeys.Add(item))
End Function
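To see the short-circuiting in action, the sketch below runs the C# extension method against an infinite iterator. The method is reproduced so the snippet compiles on its own, and Cycle is a hypothetical helper. Any stops at the first element whose Add fails, so the call returns instead of looping forever.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ShortCircuitDemo
{
    // Same logic as the ContainsDuplicates extension method in this answer.
    public static bool ContainsDuplicates<T>(this IEnumerable<T> enumerable)
    {
        var knownKeys = new HashSet<T>();
        // HashSet<T>.Add returns false when the item is already in the set.
        return enumerable.Any(item => !knownKeys.Add(item));
    }

    // Hypothetical infinite sequence: 0, 1, 2, 0, 1, 2, ...
    public static IEnumerable<int> Cycle()
    {
        for (int i = 0; ; i++)
            yield return i % 3;
    }
}
```

`ShortCircuitDemo.Cycle().ContainsDuplicates()` returns true after reading four elements. A `list.Count != list.Distinct().Count()` style check could not be used on such a sequence at all, because counting it never finishes.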
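Note: to check that there are no duplicates, change Any to All and remove the negation, i.e. `return enumerable.All(item => knownKeys.Add(item));`. This version also stops at the first duplicate.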
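Going one step further toward the generic comparer/lambda version the question asks for, here is a possible sketch. It uses All with a non-negated Add, so it returns true only when every item is new to the set. The method names AllUnique and AllUniqueBy are assumptions for illustration and are not part of LINQ. HashSet<T> does provide a constructor that takes an IEqualityComparer<T>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UniquenessExtensions
{
    // True if no two items are equal under the given comparer (null means the default comparer).
    // All stops at the first item whose Add fails, i.e. at the first duplicate.
    public static bool AllUnique<T>(this IEnumerable<T> source, IEqualityComparer<T>? comparer = null)
    {
        var seen = new HashSet<T>(comparer ?? EqualityComparer<T>.Default);
        return source.All(item => seen.Add(item));
    }

    // True if no two items produce the same key from keySelector.
    public static bool AllUniqueBy<T, TKey>(
        this IEnumerable<T> source,
        Func<T, TKey> keySelector,
        IEqualityComparer<TKey>? comparer = null)
    {
        var seenKeys = new HashSet<TKey>(comparer ?? EqualityComparer<TKey>.Default);
        return source.All(item => seenKeys.Add(keySelector(item)));
    }
}
```

For example, `new[] { "a", "B", "A" }.AllUnique()` is true, while `.AllUnique(StringComparer.OrdinalIgnoreCase)` on the same array is false.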
Solution 4:
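Place all items in a set; if the set ends up with fewer items than the list, there is a duplicate. The version below goes one step further and stops as soon as HashSet<T>.Add returns false, which happens at the first duplicate: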
bool hasDuplicates<T>(List<T> myList) {
var hs = new HashSet<T>();
for (var i = 0; i < myList.Count; ++i) {
if (!hs.Add(myList[i])) return true; // Add returns false if the item is already in the set
}
return false;
}
This should be more efficient than Distinct, because it does not need to go through the rest of the list once a duplicate has been found.
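For comparison, the plain count-based check can be written as a one-liner using the HashSet<T>(IEnumerable<T>) constructor. It is simple, but like the Distinct approach it always processes the entire list. The helper name is hypothetical.

```csharp
using System.Collections.Generic;

public static class SetCountCheck
{
    // Builds a set from the whole list; duplicates collapse, so a smaller set means a duplicate exists.
    public static bool HasDuplicatesByCount<T>(List<T> myList)
    {
        return new HashSet<T>(myList).Count != myList.Count;
    }
}
```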
Solution 5:
Something along these lines is relatively simple and gives you a count of how many times each item occurs. Any entry with a count greater than 1 is a duplicate.
var something = new List<string>() { "One", "One", "Two", "Three" };
var dictionary = new Dictionary<string, int>();
something.ForEach(s =>
{
if (dictionary.ContainsKey(s))
{
dictionary[s]++;
}
else
{
dictionary[s] = 1;
}
});
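This is similar in spirit to Distinct, which also uses a hash-based set internally to remember the items it has already seen. The difference is that Distinct only tracks whether an item has appeared, not how many times.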
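To turn those counts into an answer to the original question, you can filter the dictionary for entries with a count above 1. Below is a generic sketch of the same counting idea. It uses Dictionary.TryGetValue in place of the separate ContainsKey check, and the helper names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class OccurrenceCounter
{
    // Counts how many times each item occurs.
    public static Dictionary<T, int> CountOccurrences<T>(IEnumerable<T> items) where T : notnull
    {
        var counts = new Dictionary<T, int>();
        foreach (var item in items)
        {
            // TryGetValue leaves count at 0 when the key is not present yet.
            counts.TryGetValue(item, out int count);
            counts[item] = count + 1;
        }
        return counts;
    }

    // Returns the items that occur more than once.
    public static List<T> Duplicates<T>(IEnumerable<T> items) where T : notnull
    {
        return CountOccurrences(items)
            .Where(pair => pair.Value > 1)
            .Select(pair => pair.Key)
            .ToList();
    }
}
```

With the list from above, `OccurrenceCounter.Duplicates(new List<string> { "One", "One", "Two", "Three" })` returns a list containing only "One".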