Linq - left join on multiple (OR) conditions

I need to do a left join on multiple conditions where the conditions are ORs rather than ANDs. I've found lots of samples of the latter but am struggling to get the right answer for my scenario.

from a in tablea
join b in tableb on new { a.col1, a.col2 } equals new { b.col1, b.col2 }
group a by a into g
select new () { col1 = a.col1, col2 = a.col2, count = g.Count() }

works great for joins where all conditions must match. I need to get the join to match on a.col1 = b.col1 OR a.col2 = b.col2.

I know it must be easy but I've coming up blank on this!

Edit:

To give a little more info, the purpose of the query is to get a projection containing all of the fields from 'a' plus a count of the matching records in 'b'. I've amended the sample above to try and illustrate what I'm after. When I run with the above using the approach Jon Skeet has noted I'm getting a count of all records from a, not the count of the related records in b.

The basic left join works fine:

from a in tablea
from b in tableb
.Where( b => ( a.col1 == b.col1 || a.col2 == b.col2))
.DefaultIfEmpty()
select new { col1 = a.col1, col2 = a.col2 }

If I revise it to add the grouping as below

from a in tablea
from b in tableb
.Where( b => ( a.col1 == b.col1 || a.col2 == b.col2))
.DefaultIfEmpty()
group a by a.col1 into g
select new { col1 = g.Key, count = g.Count() }

I'm getting the count of the records returned from a - not the count of records in matching in b.

Edit:

I'll give the answer to Jon - I've solved my count issue - I hadn't realized I could use a lamda to filter the count (g.Count(x => x != null)). Plus I need to group b by a rather than a by a as I had above. This gives the correct result but the SQL is not as efficient as I'd write it by hand as it adds a correlated sub query - if anyone can advise a better way of writing it to simulate the following SQL I'd appreciate it!

select a.col1, count(b.col1)
from tablea a
left join tableb b
on a.col1 = b.col1
or a.col2 = b.col2
group by a.col1

Solution 1:

LINQ only directly supports equijoins. If you want to do any other kind of join, you basically need a cross-join and where:

from a in tablea
from b in tableb
where a.col1 == b.col1 || a.col2 == b.col2
select ...

It's probably worth checking what the generated SQL looks like and what the query plan is like. There may be more efficient ways of doing it, but this is probably the simplest approach.

Solution 2:

Depending on the query provider, you could just choose to use two from clauses:

from a in tablea
from b in tableb 
where a.col1 == b.col1 || a.col2 == b.col2

Which, if you execute on a DB, will be just as efficient. If you execute in-memory (Linq to Objects), this will enumerate all possible combinations, which may be inefficient.

Arg, Skeeted ;-).

More efficient Linq to Objects alternatives are possible. The join operator enumerates each source only once, and then does a hash-join, so you could split the or-clause into two seperate joins, and then take their union. A union in linq is just a concatenation without duplicates, so that would look as follows:

(from a in tablea
join b in tableb on a.Col1 equals b.Col1
select new {a, b})
.Concat(
from a in tablea
join b in tableb on a.Col2 equals b.Col2
select new {a, b}
).Distinct()

This approach works, and it's just one query, but it's somewhat non-obvious in the sense that the performance characteristics of the code depend on understanding in detail how linq works. Personally, if you want to do a hash-join with potentially multiple matches, a more obvious tool is ToLookup. An alternative using that might look as follows:

var bBy1 = tableb.ToLookup(b=>b.Col1);
var bBy2 = tableb.ToLookup(b=>b.Col2);
var q3 = 
    from a in tablea
    from b in bBy1[a.Col1].Concat(bBy2[a.Col2]).Distinct()
    ...

This solution is actually shorter, and the reason it works is more obvious, so it's the one I'd prefer. Just remember that if you split the || operator into two separate queries like in the above two scenarios that you need to manually avoid double-counting the results (i.e. use Distinct).