Join tables with SUM issue in MYSQL
When you use multiple joins in the main query, you end up with a cross product of all the tables, so the sums get multiplied by the number of rows matching in another table. You need to move the sums into subqueries.
SELECT last_name, first_name, DATE_FORMAT(LEAST(mil_date, tm_date), '%m/%d/%y' ) AS dates,
total, minutes
FROM bhds_teachers AS i
LEFT JOIN (
SELECT ds_id, YEARWEEK(mil_date) AS week, MIN(mil_date) AS mil_date, SUM(drive_time) AS minutes
FROM bhds_mileage
WHERE mil_date BETWEEN '2016-04-11' AND '2016-04-30'
AND bhds_mileage.ds_id = 5
GROUP BY ds_id, week) AS m
ON m.ds_id = i.ds_id
LEFT JOIN (
SELECT ds_id, YEARWEEK(tm_date) AS week, MIN(tm_date) AS tm_date, SUM(tm_hours) AS total
WHERE tm_date BETWEEN '2016-04-11' AND '2016-04-30' AND bhds_timecard.ds_id = 5
GROUP BY ds_id, week) AS t
ON t.ds_id = i.ds_id AND t.week = m.week
There's a couple of issues... partial Cartesian product (cross product) between bhds_mileage
and bhds_timecard
, because each detail row (within a group) from one table will be "cross joined" with the detail rows from the other table. And that happens before the GROUP BY operation collapses the rows, and computes the SUM. That explains why you are seeing "inflated" values.
The workaround for that is to compute at least one of the SUM() aggregates in an inline view... get the SUM() / GROUP BY() done like one of your first queries does. For clarity, you could do the same thing for both of the original queries, and then join the results from the inline views.
MySQL doesn't natively support FULL outer join. One of the tables will need to be the driving table. For example, we could use _timecard
as the driving table, but that would mean that we have to return a row for a given week from _timecard
in order to return a corresponding row from _mileage. That is, without a row in _timecard
, we couldn't get a row from _mileage
.
We notice that the join to bhds_teacher
is an outer join. If we have a foreign key constraint between ds_id
in both _mileage
and _timecard
, referencing _teacher
, then that wouldn't necessarily need to be an outer join, we could use an inner join, and use _teacher
as the driving table for two outer joins.
Another issue is the non-aggregates in the SELECT list... e.g. DATE_FORMAT((tm_date), '%m/%d/%y')
The GROUP BY is on year and week, so the value from the DATE_FORMAT is indeterminate... it could be from any tm_date
within the group. There's no guarantee that you'll get the first day of the week, the earliest date within the week or whatever.
Also, the second parameter for WEEK
function is omitted, so that will default to the default_week_format
system variable. Personally, I'd avoid the YEAR
, WEEK
and CONCAT
functions, and go with a simpler DATE_FORMAT
, using an date format string that explicitly includes the mode parameter for the week.
If you want to join on "week", then the join predicate should be on the "week" value, not one indeterminate date within the week.
(There may be some specific constraints on the data which we aren't aware of... if there are rows in _mileage for a given week, on a monday, then we are guaranteed to have a _timecard for that same monday. In the more general case, we wouldn't have that guarantee.)
Even if we do have that guarantee, we aren't guaranteed that the non-aggregate in the SELECT list won't return the date from a Tuesday _timecard, and a Thursday _mileage... (unless there's some sort of guarantee that the data will include only rows with "Monday" dates on _timecard and _mileage). Withouat that, the non-aggregate expression is not a reliable expression for a join predicate.
Assuming ds_id
is unique on _teacher
, and is referenced by foreign keys ds_id
from both _mileage
and _timecard
, then something like this:
SELECT i.last_name
, i.first_name
, tm.dates
, tm.total_hours
, mm.total_minutes
FROM bhds_teacher i
LEFT
JOIN ( SELECT t.ds_id
, DATE_FORMAT( t.tm_date,'%Y/%U') AS week_
, DATE_FORMAT( MIN(t.tm_date) ,'%m/%d/%y') AS dates
, SUM(t.tm_hours) AS total_hours
FROM bhds_timecard t
WHERE t.tm_date BETWEEN '2016-04-11' AND '2016-04-30' -- <
AND t.ds_id = 5 -- <
GROUP
BY t.ds_id
, DATE_FORMAT( t.tm_date,'%Y/%U') -- week
) tm
ON tm.ds_id = i.ds_id
LEFT
JOIN ( SELECT m.ds_id
, DATE_FORMAT( m.mil_date,'%Y/%U') AS week_
, DATE_FORMAT( MIN(m.mil_date), '%m/%d/%y' ) AS dates
, SUM( m.drive_time ) AS total_minutes
FROM bhds_mileage m
WHERE m.mil_date BETWEEN '2016-04-11' AND '2016-04-30' -- <
AND m.ds_id = 5 -- <
GROUP
BY m.ds_id
, DATE_FORMAT( m.mil_date,'%Y/%U') -- week
) mm
ON mm.ds_id = i.ds_id
AND mm.week_ = tm.week_
WHERE i.ds_id = 5 -- <
ORDER
BY i.last_name ASC, tm.dates ASC