How do you perform a left outer join using linq extension methods
For a (left outer) join of a table Bar
with a table Foo
on Foo.Foo_Id = Bar.Foo_Id
in lambda notation:
var qry = Foo.GroupJoin(
Bar,
foo => foo.Foo_Id,
bar => bar.Foo_Id,
(x,y) => new { Foo = x, Bars = y })
.SelectMany(
x => x.Bars.DefaultIfEmpty(),
(x,y) => new { Foo=x.Foo, Bar=y});
Since this seems to be the de facto SO question for left outer joins using the method (extension) syntax, I thought I would add an alternative to the currently selected answer that (in my experience at least) has been more commonly what I'm after
// Option 1: Expecting either 0 or 1 matches from the "Right"
// table (Bars in this case):
var qry = Foos.GroupJoin(
Bars,
foo => foo.Foo_Id,
bar => bar.Foo_Id,
(f,bs) => new { Foo = f, Bar = bs.SingleOrDefault() });
// Option 2: Expecting either 0 or more matches from the "Right" table
// (courtesy of currently selected answer):
var qry = Foos.GroupJoin(
Bars,
foo => foo.Foo_Id,
bar => bar.Foo_Id,
(f,bs) => new { Foo = f, Bars = bs })
.SelectMany(
fooBars => fooBars.Bars.DefaultIfEmpty(),
(x,y) => new { Foo = x.Foo, Bar = y });
To display the difference using a simple data set (assuming we're joining on the values themselves):
List<int> tableA = new List<int> { 1, 2, 3 };
List<int?> tableB = new List<int?> { 3, 4, 5 };
// Result using both Option 1 and 2. Option 1 would be a better choice
// if we didn't expect multiple matches in tableB.
{ A = 1, B = null }
{ A = 2, B = null }
{ A = 3, B = 3 }
List<int> tableA = new List<int> { 1, 2, 3 };
List<int?> tableB = new List<int?> { 3, 3, 4 };
// Result using Option 1 would be that an exception gets thrown on
// SingleOrDefault(), but if we use FirstOrDefault() instead to illustrate:
{ A = 1, B = null }
{ A = 2, B = null }
{ A = 3, B = 3 } // Misleading, we had multiple matches.
// Which 3 should get selected (not arbitrarily the first)?.
// Result using Option 2:
{ A = 1, B = null }
{ A = 2, B = null }
{ A = 3, B = 3 }
{ A = 3, B = 3 }
Option 2 is true to the typical left outer join definition, but as I mentioned earlier is often unnecessarily complex depending on the data set.
Group Join method is unnecessary to achieve joining of two data sets.
Inner Join:
var qry = Foos.SelectMany
(
foo => Bars.Where (bar => foo.Foo_id == bar.Foo_id),
(foo, bar) => new
{
Foo = foo,
Bar = bar
}
);
For Left Join just add DefaultIfEmpty()
var qry = Foos.SelectMany
(
foo => Bars.Where (bar => foo.Foo_id == bar.Foo_id).DefaultIfEmpty(),
(foo, bar) => new
{
Foo = foo,
Bar = bar
}
);
EF and LINQ to SQL correctly transform to SQL. For LINQ to Objects it is beter to join using GroupJoin as it internally uses Lookup. But if you are querying DB then skipping of GroupJoin is AFAIK as performant.
Personlay for me this way is more readable compared to GroupJoin().SelectMany()
You can create extension method like:
public static IEnumerable<TResult> LeftOuterJoin<TSource, TInner, TKey, TResult>(this IEnumerable<TSource> source, IEnumerable<TInner> other, Func<TSource, TKey> func, Func<TInner, TKey> innerkey, Func<TSource, TInner, TResult> res)
{
return from f in source
join b in other on func.Invoke(f) equals innerkey.Invoke(b) into g
from result in g.DefaultIfEmpty()
select res.Invoke(f, result);
}
Improving on Ocelot20's answer, if you have a table you're left outer joining with where you just want 0 or 1 rows out of it, but it could have multiple, you need to Order your joined table:
var qry = Foos.GroupJoin(
Bars.OrderByDescending(b => b.Id),
foo => foo.Foo_Id,
bar => bar.Foo_Id,
(f, bs) => new { Foo = f, Bar = bs.FirstOrDefault() });
Otherwise which row you get in the join is going to be random (or more specifically, whichever the db happens to find first).