Difference between EXISTS and IN in SQL?

Solution 1:

The exists keyword can be used in that way, but really it's intended as a way to avoid counting:

--this statement needs to check the entire table
select count(*) from [table] where ...

--this statement is true as soon as one match is found
exists ( select * from [table] where ... )

This is most useful where you have if conditional statements, as exists can be a lot quicker than count.

The in is best used where you have a static list to pass:

 select * from [table]
 where [field] in (1, 2, 3)

When you have a table in an in statement it makes more sense to use a join, but mostly it shouldn't matter. The query optimiser should return the same plan either way. In some implementations (mostly older, such as Microsoft SQL Server 2000) in queries will always get a nested join plan, while join queries will use nested, merge or hash as appropriate. More modern implementations are smarter and can adjust the plan even when in is used.

Solution 2:

EXISTS will tell you whether a query returned any results. e.g.:

SELECT * 
FROM Orders o 
WHERE EXISTS (
    SELECT * 
    FROM Products p 
    WHERE p.ProductNumber = o.ProductNumber)

IN is used to compare one value to several, and can use literal values, like this:

SELECT * 
FROM Orders 
WHERE ProductNumber IN (1, 10, 100)

You can also use query results with the IN clause, like this:

SELECT * 
FROM Orders 
WHERE ProductNumber IN (
    SELECT ProductNumber 
    FROM Products 
    WHERE ProductInventoryQuantity > 0)

Solution 3:

Based on rule optimizer:

  • EXISTS is much faster than IN, when the sub-query results is very large.
  • IN is faster than EXISTS, when the sub-query results is very small.

Based on cost optimizer:

  • There is no difference.

Solution 4:

I'm assuming you know what they do, and thus are used differently, so I'm going to understand your question as: When would it be a good idea to rewrite the SQL to use IN instead of EXISTS, or vice versa.

Is that a fair assumption?


Edit: The reason I'm asking is that in many cases you can rewrite an SQL based on IN to use an EXISTS instead, and vice versa, and for some database engines, the query optimizer will treat the two differently.

For instance:

SELECT *
FROM Customers
WHERE EXISTS (
    SELECT *
    FROM Orders
    WHERE Orders.CustomerID = Customers.ID
)

can be rewritten to:

SELECT *
FROM Customers
WHERE ID IN (
    SELECT CustomerID
    FROM Orders
)

or with a join:

SELECT Customers.*
FROM Customers
    INNER JOIN Orders ON Customers.ID = Orders.CustomerID

So my question still stands, is the original poster wondering about what IN and EXISTS does, and thus how to use it, or does he ask wether rewriting an SQL using IN to use EXISTS instead, or vice versa, will be a good idea?

Solution 5:

  1. EXISTS is much faster than IN when the subquery results is very large.
    IN is faster than EXISTS when the subquery results is very small.

    CREATE TABLE t1 (id INT, title VARCHAR(20), someIntCol INT)
    GO
    CREATE TABLE t2 (id INT, t1Id INT, someData VARCHAR(20))
    GO
    
    INSERT INTO t1
    SELECT 1, 'title 1', 5 UNION ALL
    SELECT 2, 'title 2', 5 UNION ALL
    SELECT 3, 'title 3', 5 UNION ALL
    SELECT 4, 'title 4', 5 UNION ALL
    SELECT null, 'title 5', 5 UNION ALL
    SELECT null, 'title 6', 5
    
    INSERT INTO t2
    SELECT 1, 1, 'data 1' UNION ALL
    SELECT 2, 1, 'data 2' UNION ALL
    SELECT 3, 2, 'data 3' UNION ALL
    SELECT 4, 3, 'data 4' UNION ALL
    SELECT 5, 3, 'data 5' UNION ALL
    SELECT 6, 3, 'data 6' UNION ALL
    SELECT 7, 4, 'data 7' UNION ALL
    SELECT 8, null, 'data 8' UNION ALL
    SELECT 9, 6, 'data 9' UNION ALL
    SELECT 10, 6, 'data 10' UNION ALL
    SELECT 11, 8, 'data 11'
    
  2. Query 1

    SELECT
    FROM    t1 
    WHERE   not  EXISTS (SELECT * FROM t2 WHERE t1.id = t2.t1id)
    

    Query 2

    SELECT t1.* 
    FROM   t1 
    WHERE  t1.id not in (SELECT  t2.t1id FROM t2 )
    

    If in t1 your id has null value then Query 1 will find them, but Query 2 cant find null parameters.

    I mean IN can't compare anything with null, so it has no result for null, but EXISTS can compare everything with null.