SQL NOT IN not working

I have two databases, one which holds the inventory, and another which contains a subset of the records of the primary database.

The following SQL statement is not working:

SELECT  stock.IdStock
        ,stock.Descr       
FROM    [Inventory].[dbo].[Stock] stock
WHERE   stock.IdStock NOT IN
        (SELECT foreignStockId FROM
         [Subset].[dbo].[Products])

The not in does not work. Removing the NOT gives the correct results, i.e. products that are in both databases. However, using the NOT IN is not returning ANY results at all.

What am I doing wrong, any ideas?


Solution 1:

SELECT foreignStockId
FROM   [Subset].[dbo].[Products]  

Probably returns a NULL.

A NOT IN query will not return any rows if any NULLs exists in the list of NOT IN values. You can explicitly exclude them using IS NOT NULL as below.

SELECT stock.IdStock,
       stock.Descr
FROM   [Inventory].[dbo].[Stock] stock
WHERE  stock.IdStock NOT IN (SELECT foreignStockId
                             FROM   [Subset].[dbo].[Products]
                             WHERE  foreignStockId IS NOT NULL) 

Or rewrite using NOT EXISTS instead.

SELECT stock.idstock,
       stock.descr
FROM   [Inventory].[dbo].[Stock] stock
WHERE  NOT EXISTS (SELECT *
                   FROM   [Subset].[dbo].[Products] p
                   WHERE  p.foreignstockid = stock.idstock) 

As well as having the semantics that you want the execution plan for NOT EXISTS is often simpler as looked at here.

The reason for the difference in behaviour is down to the three valued logic used in SQL. Predicates can evaluate to True, False, or Unknown.

A WHERE clause must evaluate to True in order for the row to be returned but this is not possible with NOT IN when NULL is present as explained below.

'A' NOT IN ('X','Y',NULL) is equivalent to 'A' <> 'X' AND 'A' <> 'Y' AND 'A' <> NULL)

  • 'A' <> 'X' = True
  • 'A' <> 'Y' = True
  • 'A' <> NULL = Unknown

True AND True AND Unknown evaluates to Unknown per the truth tables for three valued logic.

The following links have some additional discussion about performance of the various options.

  • Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?
  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server
  • Left outer join vs NOT EXISTS
  • NOT EXISTS vs NOT IN

Solution 2:

If NOT IN does not work, you may always try to do LEFT JOIN. Then filter by WHERE using one of the values from the joined table, which are NULL. Provided, the value you were joining by does not contain any NULL value.

Solution 3:

Adding my 2 cents:

I've seen SQL Server returning wrong results even when switching to not exists and left join - in corrupt databases. Run DBCC CHECKTABLE on the tables involved, also look at the NOT IN query execution plan and rebuild the indexes involved, this should help.