SQL "select where not in subquery" returns no results
Update:
These articles in my blog describe the differences between the methods in more detail:
NOT IN
vs.NOT EXISTS
vs.LEFT JOIN / IS NULL
:SQL Server
NOT IN
vs.NOT EXISTS
vs.LEFT JOIN / IS NULL
:PostgreSQL
NOT IN
vs.NOT EXISTS
vs.LEFT JOIN / IS NULL
:Oracle
NOT IN
vs.NOT EXISTS
vs.LEFT JOIN / IS NULL
:MySQL
There are three ways to do such a query:
-
LEFT JOIN / IS NULL
:SELECT * FROM common LEFT JOIN table1 t1 ON t1.common_id = common.common_id WHERE t1.common_id IS NULL
-
NOT EXISTS
:SELECT * FROM common WHERE NOT EXISTS ( SELECT NULL FROM table1 t1 WHERE t1.common_id = common.common_id )
-
NOT IN
:SELECT * FROM common WHERE common_id NOT IN ( SELECT common_id FROM table1 t1 )
When table1.common_id
is not nullable, all these queries are semantically the same.
When it is nullable, NOT IN
is different, since IN
(and, therefore, NOT IN
) return NULL
when a value does not match anything in a list containing a NULL
.
This may be confusing but may become more obvious if we recall the alternate syntax for this:
common_id = ANY
(
SELECT common_id
FROM table1 t1
)
The result of this condition is a boolean product of all comparisons within the list. Of course, a single NULL
value yields the NULL
result which renders the whole result NULL
too.
We never cannot say definitely that common_id
is not equal to anything from this list, since at least one of the values is NULL
.
Suppose we have these data:
common
--
1
3
table1
--
NULL
1
2
LEFT JOIN / IS NULL
and NOT EXISTS
will return 3
, NOT IN
will return nothing (since it will always evaluate to either FALSE
or NULL
).
In MySQL
, in case on non-nullable column, LEFT JOIN / IS NULL
and NOT IN
are a little bit (several percent) more efficient than NOT EXISTS
. If the column is nullable, NOT EXISTS
is the most efficient (again, not much).
In Oracle
, all three queries yield same plans (an ANTI JOIN
).
In SQL Server
, NOT IN
/ NOT EXISTS
are more efficient, since LEFT JOIN / IS NULL
cannot be optimized to an ANTI JOIN
by its optimizer.
In PostgreSQL
, LEFT JOIN / IS NULL
and NOT EXISTS
are more efficient than NOT IN
, sine they are optimized to an Anti Join
, while NOT IN
uses hashed subplan
(or even a plain subplan
if the subquery is too large to hash)
If you want the world to be a two-valued boolean place, you must prevent the null (third value) case yourself.
Don't write IN clauses that allow nulls in the list side. Filter them out!
common_id not in
(
select common_id from Table1
where common_id is not null
)
Table1 or Table2 has some null values for common_id. Use this query instead:
select *
from Common
where common_id not in (select common_id from Table1 where common_id is not null)
and common_id not in (select common_id from Table2 where common_id is not null)
select *
from Common c
where not exists (select t1.commonid from table1 t1 where t1.commonid = c.commonid)
and not exists (select t2.commonid from table2 t2 where t2.commonid = c.commonid)