MySQL "NOT IN" query

I wanted to run a simple query to throw up all the rows of Table1 where a principal column value is not present in a column in another table (Table2).

I tried using:

SELECT * FROM Table1 WHERE Table1.principal NOT IN Table2.principal

This is instead throwing a syntax error. Google search led me to forums where people were saying that MySQL does not support NOT IN and something extremely complex needs to be used. Is this true? Or am I making a horrendous mistake?


To use IN, you must have a set, use this syntax instead:

SELECT * FROM Table1 WHERE Table1.principal NOT IN (SELECT principal FROM table2)

The subquery option has already been answered, but note that in many cases a LEFT JOIN can be a faster way to do this:

SELECT table1.*
FROM table1 LEFT JOIN table2 ON table2.principal=table1.principal
WHERE table2.principal IS NULL

If you want to check multiple tables to make sure it's not present in any of the tables (like in SRKR's comment), you can use this:

SELECT table1.*
FROM table1
LEFT JOIN table2 ON table2.name=table1.name
LEFT JOIN table3 ON table3.name=table1.name
WHERE table2.name IS NULL AND table3.name IS NULL

NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL in MySQL

MySQL, as well as all other systems except SQL Server, is able to optimize LEFT JOIN / IS NULL to return FALSE as soon the matching value is found, and it is the only system that cared to document this behavior. […] Since MySQL is not capable of using HASH and MERGE join algorithms, the only ANTI JOIN it is capable of is the NESTED LOOPS ANTI JOIN

[…]

Essentially, [NOT IN] is exactly the same plan that LEFT JOIN / IS NULL uses, despite the fact these plans are executed by the different branches of code and they look different in the results of EXPLAIN. The algorithms are in fact the same in fact and the queries complete in same time.

[…]

It’s hard to tell exact reason for [performance drop when using NOT EXISTS], since this drop is linear and does not seem to depend on data distribution, number of values in both tables etc., as long as both fields are indexed. Since there are three pieces of code in MySQL that essentialy do one job, it is possible that the code responsible for EXISTS makes some kind of an extra check which takes extra time.

[…]

MySQL can optimize all three methods to do a sort of NESTED LOOPS ANTI JOIN. […] However, these three methods generate three different plans which are executed by three different pieces of code. The code that executes EXISTS predicate is about 30% less efficient […]

That’s why the best way to search for missing values in MySQL is using a LEFT JOIN / IS NULL or NOT IN rather than NOT EXISTS.

(emphases added)


Be carefull NOT IN is not an alias for <> ANY, but for <> ALL!

http://dev.mysql.com/doc/refman/5.0/en/any-in-some-subqueries.html

SELECT c FROM t1 LEFT JOIN t2 USING (c) WHERE t2.c IS NULL

cant' be replaced by

SELECT c FROM t1 WHERE c NOT IN (SELECT c FROM t2)

You must use

SELECT c FROM t1 WHERE c <> ANY (SELECT c FROM t2)