Using IS NULL or IS NOT NULL on join conditions - Theory question
Example with tables A and B:
A (parent) B (child)
============ =============
id | name pid | name
------------ -------------
1 | Alex 1 | Kate
2 | Bill 1 | Lia
3 | Cath 3 | Mary
4 | Dale NULL | Pan
5 | Evan
If you want to find parents and their kids, you do an INNER JOIN
:
SELECT id, parent.name AS parent
, pid, child.name AS child
FROM
parent INNER JOIN child
ON parent.id = child.pid
Result is that every match of a parent
's id
from the left table and a child
's pid
from the second table will show as a row in the result:
+----+--------+------+-------+
| id | parent | pid | child |
+----+--------+------+-------+
| 1 | Alex | 1 | Kate |
| 1 | Alex | 1 | Lia |
| 3 | Cath | 3 | Mary |
+----+--------+------+-------+
Now, the above does not show parents without kids (because their ids do not have a match in child's ids, so what do you do? You do an outer join instead. There are three types of outer joins, the left, the right and the full outer join. We need the left one as we want the "extra" rows from the left table (parent):
SELECT id, parent.name AS parent
, pid, child.name AS child
FROM
parent LEFT JOIN child
ON parent.id = child.pid
Result is that besides previous matches, all parents that do not have a match (read: do not have a kid) are shown too:
+----+--------+------+-------+
| id | parent | pid | child |
+----+--------+------+-------+
| 1 | Alex | 1 | Kate |
| 1 | Alex | 1 | Lia |
| 3 | Cath | 3 | Mary |
| 2 | Bill | NULL | NULL |
| 4 | Dale | NULL | NULL |
| 5 | Evan | NULL | NULL |
+----+--------+------+-------+
Where did all those NULL
come from? Well, MySQL (or any other RDBMS you may use) will not know what to put there as these parents have no match (kid), so there is no pid
nor child.name
to match with those parents. So, it puts this special non-value called NULL
.
My point is that these NULLs
are created (in the result set) during the LEFT OUTER JOIN
.
So, if we want to show only the parents that do NOT have a kid, we can add a WHERE child.pid IS NULL
to the LEFT JOIN
above. The WHERE
clause is evaluated (checked) after the JOIN
is done. So, it's clear from the above result that only the last three rows where the pid
is NULL will be shown:
SELECT id, parent.name AS parent
, pid, child.name AS child
FROM
parent LEFT JOIN child
ON parent.id = child.pid
WHERE child.pid IS NULL
Result:
+----+--------+------+-------+
| id | parent | pid | child |
+----+--------+------+-------+
| 2 | Bill | NULL | NULL |
| 4 | Dale | NULL | NULL |
| 5 | Evan | NULL | NULL |
+----+--------+------+-------+
Now, what happens if we move that IS NULL
check from the WHERE
to the joining ON
clause?
SELECT id, parent.name AS parent
, pid, child.name AS child
FROM
parent LEFT JOIN child
ON parent.id = child.pid
AND child.pid IS NULL
In this case the database tries to find rows from the two tables that match these conditions. That is, rows where parent.id = child.pid
AND child.pid IN NULL
. But it can find no such match because no child.pid
can be equal to something (1, 2, 3, 4 or 5) and be NULL at the same time!
So, the condition:
ON parent.id = child.pid
AND child.pid IS NULL
is equivalent to:
ON 1 = 0
which is always False
.
So, why does it return ALL rows from the left table? Because it's a LEFT JOIN! And left joins return rows that match (none in this case) and also rows from the left table that do not match the check (all in this case):
+----+--------+------+-------+
| id | parent | pid | child |
+----+--------+------+-------+
| 1 | Alex | NULL | NULL |
| 2 | Bill | NULL | NULL |
| 3 | Cath | NULL | NULL |
| 4 | Dale | NULL | NULL |
| 5 | Evan | NULL | NULL |
+----+--------+------+-------+
I hope the above explanation is clear.
Sidenote (not directly related to your question): Why on earth doesn't Pan
show up in none of our JOINs? Because his pid
is NULL
and NULL in the (not common) logic of SQL is not equal to anything so it can't match with any of the parent ids (which are 1,2,3,4 and 5). Even if there was a NULL there, it still wouldn't match because NULL
does not equal anything, not even NULL
itself (it's a very strange logic, indeed!). That's why we use the special check IS NULL
and not a = NULL
check.
So, will Pan
show up if we do a RIGHT JOIN
? Yes, it will! Because a RIGHT JOIN will show all results that match (the first INNER JOIN we did) plus all rows from the RIGHT table that don't match (which in our case is one, the (NULL, 'Pan')
row.
SELECT id, parent.name AS parent
, pid, child.name AS child
FROM
parent RIGHT JOIN child
ON parent.id = child.pid
Result:
+------+--------+------+-------+
| id | parent | pid | child |
+---------------+------+-------+
| 1 | Alex | 1 | Kate |
| 1 | Alex | 1 | Lia |
| 3 | Cath | 3 | Mary |
| NULL | NULL | NULL | Pan |
+------+--------+------+-------+
Unfortunately, MySQL does not have FULL JOIN
. You can try it in other RDBMSs, and it will show:
+------+--------+------+-------+
| id | parent | pid | child |
+------+--------+------+-------+
| 1 | Alex | 1 | Kate |
| 1 | Alex | 1 | Lia |
| 3 | Cath | 3 | Mary |
| 2 | Bill | NULL | NULL |
| 4 | Dale | NULL | NULL |
| 5 | Evan | NULL | NULL |
| NULL | NULL | NULL | Pan |
+------+--------+------+-------+
The NULL
part is calculated AFTER the actual join, so that is why it needs to be in the where clause.