SQL Server multiple joins on same table

You are joining an association's users and companies. With 2 companies and 3 users you get six rows. Then you count count(comp.id) and count(u.id). Neither ID is null in the joined rows, so their results equal count(*) which is 6.

When dealing with multiple aggregations it is advisable to aggregate your data before joining. After all you want to join the number of companies and the number of users to your associations. These numbers are aggregation results.

As to employees this even gets a little more complicated, because employees can be related to an association either directly or by company. So with 5 employees linked by company and five employees linked by association, you may get, say, 8 total employees, in case two work for the company and are directly linked to the association, too. This may best be solved in a lateral outer join aka OUTER APPLY in SQL Server.

Well, coming from some few particular associations, you can just as well make all outer joins lateral.

select 
  row_number() over(order by (select 1)) as itemno,
  a*, c.*, e.*, u.*
from associations a
outer apply
(
  select count(*) as total_companies
  from companies
  where companies.assoc_id = a.id
) c
outer apply
(
  select
    count(*) as total_employees,
    count(case when employees.assoc_id = a.id then 1 end) as association_employees,
    count(case when employees.comp_id = c.id then 1 end) as company_employees
  from employees
  where employees.assoc_id = a.id
     or employees.comp_id = c.id
) e
outer apply
(
  select count(*) as total_users
  from users
  where users.assoc_id = a.id
) u
where a.title like ('%' + @search + '%') and a.status = @status
order by a.title;

This is much better than joining all single rows and then counting distinct IDs, because if you do that, you create a possibly huge intermediate result the DBMS must sort and look through to get the distinct counts. That can be very costly.

When you start joining to tables down multiple paths of the relationship graph (such as associations -> companies -> employees, associations -> employees, and associations -> users) you start getting a lot of duplicate records that are next to impossible to sort out.

What I believe you need are some correlated subqueries that can independently inspect records down each path to calculate the results.

The following is one approach.

SELECT 
    ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS itemNo,
    ac.title,
    ac.id,
    ac.status,
    (
        SELECT count(comp.id)
        FROM companies comp
        WHERE comp.assoc_id = ac.id
    ) as total_comps,
    (
        SELECT count(comp_emps.id)
        FROM companies comp
        JOIN employees comp_emps
            ON comp_emps.comp_id  = comp.id
        WHERE comp.assoc_id = ac.id
    ) as total_comp_emps,
    (
        SELECT count(assoc_emps.id)
        JOIN employees assoc_emps
        WHERE assoc_emps.assoc_id = ac.id 
    )   as total_assoc_emps,
    --SUM(total_comp_emps,total_assoc_emps) total_comp_assoc_emps,
    --SUM((COALESCE(comp_emps.id,0))+(COALESCE(assoc_emps.id,0))) total_comp_assoc_emps,
    --(total_comp_emps + total_assoc_emps) as total_comp_assoc_emps,
    (
        SELECT count(u.id)
        FROM users u
        WHERE u.assoc_id = ac.id
    ) as total_users

FROM associations ac
where ac.title like ('%' + @search + '%') and ac.status = @status
order by ac.title asc

You can also move intermediate calculations into a CROSS APPLY and then reference those intermediate results in your final select. This can be useful if you need to reuse certain results.

SELECT 
    ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS itemNo,
    ac.title,
    ac.id,
    ac.status,
    counts.total_comps,
    counts.total_comp_emps,
    counts.total_assoc_emps,
    --SUM(total_comp_emps,total_assoc_emps) total_comp_assoc_emps,
    --SUM((COALESCE(comp_emps.id,0))+(COALESCE(assoc_emps.id,0))) total_comp_assoc_emps,
    (counts.total_comp_emps + counts.total_assoc_emps) as total_comp_assoc_emps,
    counts.total_users

FROM associations ac
CROSS APPLY (
    SELECT
        (
            SELECT count(comp.id)
            FROM companies comp
            WHERE comp.assoc_id = ac.id
        ) as total_comps,
        (
            SELECT count(comp_emps.id)
            FROM companies comp
            JOIN employees comp_emps
                ON comp_emps.comp_id  = comp.id
            WHERE comp.assoc_id = ac.id
        ) as total_comp_emps,
        (
            SELECT count(assoc_emps.id)
            FROM employees assoc_emps
            WHERE assoc_emps.assoc_id = ac.id 
        )   as total_assoc_emps,
        (
            SELECT count(u.id)
            FROM users u
            WHERE u.assoc_id = ac.id
        ) as total_users
) counts
where ac.title like ('%' + @search + '%') and ac.status = @status
order by ac.title asc

I'm not clear on your intent for total_comp_assoc_emps, but the above is just add two intermediate values. If employees can belong to associations both directly and via companies, and you do not want to double count them, the total_comp_assoc_emps calculation is a bit more involved.

The following gathers employ IDs from both sources and counts each distinct ID once.

(
    SELECT count(distinct id)
    FROM (
            SELECT comp_emps.id
            FROM companies comp
            JOIN employees comp_emps
                ON comp_emps.comp_id  = comp.id
            WHERE comp.assoc_id = ac.id
        UNION
            SELECT assoc_emps.id
            FROM employees assoc_emps
            WHERE assoc_emps.assoc_id = ac.id 
    ) emp_ids
) as total_assoc_emps,

SQL Server multiple joins on same table

Related

Recent Posts