MySQL "NOT IN" query
I wanted to run a simple query to throw up all the rows of Table1
where a principal column value is not present in a column in another table (Table2
).
I tried using:
SELECT * FROM Table1 WHERE Table1.principal NOT IN Table2.principal
This is instead throwing a syntax error. Google search led me to forums where people were saying that MySQL does not support NOT IN
and something extremely complex needs to be used. Is this true? Or am I making a horrendous mistake?
To use IN, you must have a set, use this syntax instead:
SELECT * FROM Table1 WHERE Table1.principal NOT IN (SELECT principal FROM table2)
The subquery option has already been answered, but note that in many cases a LEFT JOIN
can be a faster way to do this:
SELECT table1.*
FROM table1 LEFT JOIN table2 ON table2.principal=table1.principal
WHERE table2.principal IS NULL
If you want to check multiple tables to make sure it's not present in any of the tables (like in SRKR's comment), you can use this:
SELECT table1.*
FROM table1
LEFT JOIN table2 ON table2.name=table1.name
LEFT JOIN table3 ON table3.name=table1.name
WHERE table2.name IS NULL AND table3.name IS NULL
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL in MySQL
MySQL, as well as all other systems except SQL Server, is able to optimize
LEFT JOIN
/IS NULL
to returnFALSE
as soon the matching value is found, and it is the only system that cared to document this behavior. […] Since MySQL is not capable of usingHASH
andMERGE
join algorithms, the onlyANTI JOIN
it is capable of is theNESTED LOOPS ANTI JOIN
[…]
Essentially, [
NOT IN
] is exactly the same plan thatLEFT JOIN
/IS NULL
uses, despite the fact these plans are executed by the different branches of code and they look different in the results ofEXPLAIN
. The algorithms are in fact the same in fact and the queries complete in same time.
[…]
It’s hard to tell exact reason for [performance drop when using
NOT EXISTS
], since this drop is linear and does not seem to depend on data distribution, number of values in both tables etc., as long as both fields are indexed. Since there are three pieces of code in MySQL that essentialy do one job, it is possible that the code responsible forEXISTS
makes some kind of an extra check which takes extra time.
[…]
MySQL can optimize all three methods to do a sort of
NESTED LOOPS ANTI JOIN
. […] However, these three methods generate three different plans which are executed by three different pieces of code. The code that executesEXISTS
predicate is about 30% less efficient […]That’s why the best way to search for missing values in MySQL is using a
LEFT JOIN
/IS NULL
orNOT IN
rather thanNOT EXISTS
.
(emphases added)
Be carefull NOT IN
is not an alias for <> ANY
, but for <> ALL
!
http://dev.mysql.com/doc/refman/5.0/en/any-in-some-subqueries.html
SELECT c FROM t1 LEFT JOIN t2 USING (c) WHERE t2.c IS NULL
cant' be replaced by
SELECT c FROM t1 WHERE c NOT IN (SELECT c FROM t2)
You must use
SELECT c FROM t1 WHERE c <> ANY (SELECT c FROM t2)