Union Vs Concat in Linq
I have a question on Union
and Concat
.
var a1 = (new[] { 1, 2 }).Union(new[] { 1, 2 }); // O/P : 1 2
var a2 = (new[] { 1, 2 }).Concat(new[] { 1, 2 }); // O/P : 1 2 1 2
var a3 = (new[] { "1", "2" }).Union(new[] { "1", "2" }); // O/P : "1" "2"
var a4 = (new[] { "1", "2" }).Concat(new[] { "1", "2" }); // O/P : "1" "2" "1" "2"
The above result are expected, but in the case of List<T>
I am getting the same result from both Union
and Concat
.
class X
{
public int ID { get; set; }
}
class X1 : X
{
public int ID1 { get; set; }
}
class X2 : X
{
public int ID2 { get; set; }
}
var lstX1 = new List<X1> { new X1 { ID = 10, ID1 = 10 }, new X1 { ID = 10, ID1 = 10 } };
var lstX2 = new List<X2> { new X2 { ID = 10, ID2 = 10 }, new X2 { ID = 10, ID2 = 10 } };
var a5 = lstX1.Cast<X>().Union(lstX2.Cast<X>()); // O/P : a5.Count() = 4
var a6 = lstX1.Cast<X>().Concat(lstX2.Cast<X>()); // O/P : a6.Count() = 4
But both are behaving the same incase of List<T>
.
Any suggestions please?
Union returns Distinct
values. By default it will compare references of items. Your items have different references, thus they all are considered different. When you cast to base type X
, reference is not changed.
If you will override Equals
and GetHashCode
(used to select distinct items), then items will not be compared by reference:
class X
{
public int ID { get; set; }
public override bool Equals(object obj)
{
X x = obj as X;
if (x == null)
return false;
return x.ID == ID;
}
public override int GetHashCode()
{
return ID.GetHashCode();
}
}
But all your items have different value of ID
. So all items still considered different. If you will provide several items with same ID
then you will see difference between Union
and Concat
:
var lstX1 = new List<X1> { new X1 { ID = 1, ID1 = 10 },
new X1 { ID = 10, ID1 = 100 } };
var lstX2 = new List<X2> { new X2 { ID = 1, ID2 = 20 }, // ID changed here
new X2 { ID = 20, ID2 = 200 } };
var a5 = lstX1.Cast<X>().Union(lstX2.Cast<X>()); // 3 distinct items
var a6 = lstX1.Cast<X>().Concat(lstX2.Cast<X>()); // 4
Your initial sample works, because integers are value types and they are compared by value.
Concat
literally returns the items from the first sequence followed by the items from the second sequence. If you use Concat
on two 2-item sequences, you will always get a 4-item sequence.
Union
is essentially Concat
followed by Distinct
.
In your first two cases, you end up with 2-item sequences because, between them, each pair of input squences has exactly two distinct items.
In your third case, you end up with a 4-item sequence because all four items in your two input sequences are distinct.
Union
and Concat
behave the same since Union
can not detect duplicates without a custom IEqualityComparer<X>
. It's just looking if both are the same reference.
public class XComparer: IEqualityComparer<X>
{
public bool Equals(X x1, X x2)
{
if (object.ReferenceEquals(x1, x2))
return true;
if (x1 == null || x2 == null)
return false;
return x1.ID.Equals(x2.ID);
}
public int GetHashCode(X x)
{
return x.ID.GetHashCode();
}
}
Now you can use it in the overload of Union
:
var comparer = new XComparer();
a5 = lstX1.Cast<X>().Union(lstX2.Cast<X>(), new XComparer());