I'm curious which of the following would be more efficient.

I've always been a bit cautious about using IN because I believe SQL Server turns the result set into a big IF statement. For a large result set, this could result in poor performance. For small result sets, I'm not sure either is preferable. For large result sets, wouldn't EXISTS be more efficient?

WHERE EXISTS (SELECT * FROM Base WHERE bx.BoxID = Base.BoxID AND [Rank] = 2)

vs.

WHERE bx.BoxID IN (SELECT BoxID FROM Base WHERE [Rank] = 2)

Solution 1:

EXISTS will be faster because once the engine has found a hit, it will quit looking as the condition has proved true.

With IN, it will collect all the results from the sub-query before further processing.

Solution 2:

The accepted answer is shortsighted and the question is a bit loose, in that:

1) Neither explicitly mentions whether a covering index is present on the left side, the right side, or both.

2) Neither takes into account the size of the left-side input set and the right-side input set.
(The question just mentions an overall large result set.)

I believe the optimizer is smart enough to convert between "in" and "exists" when there is a significant cost difference due to (1) and (2); otherwise the choice may just act as a hint (e.g. EXISTS to encourage use of a seekable index on the right side).

Both forms can be converted to join forms internally, have the join order reversed, and run as a loop, hash, or merge join, based on the estimated row counts (left and right) and on whether indexes exist on the left side, the right side, or both.
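Rather than guessing, one way to check this on your own data is to run both forms back to back and compare the actual execution plans and I/O figures. The following is only a sketch: dbo.Box is a hypothetical outer table standing in for whatever the question aliases as bx (the question does not show it), and Base is assumed to be a table or view that can be queried directly.

```sql
-- Assumption: dbo.Box is a hypothetical outer table aliased as bx;
-- Base, BoxID and [Rank] are taken from the question.
SET STATISTICS IO ON;    -- report logical/physical reads per table
SET STATISTICS TIME ON;  -- report CPU and elapsed time per statement

-- Form 1: EXISTS with a correlated subquery
SELECT bx.BoxID
FROM dbo.Box AS bx
WHERE EXISTS (SELECT * FROM Base WHERE bx.BoxID = Base.BoxID AND [Rank] = 2);

-- Form 2: IN with an uncorrelated subquery
SELECT bx.BoxID
FROM dbo.Box AS bx
WHERE bx.BoxID IN (SELECT BoxID FROM Base WHERE [Rank] = 2);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

With "Include Actual Execution Plan" enabled in Management Studio, the two statements can be compared side by side. If the plans and read counts match, the choice between the two forms is a matter of readability for that query.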

Solution 3:

I've done some testing on SQL Server 2005 and 2008, and on both, EXISTS and IN come back with the exact same actual execution plan, as others have stated. The Optimizer is optimal. :)

Something to be aware of though, EXISTS, IN, and JOIN can sometimes return different results if you don't phrase your query just right: http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx
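One well-known case where the results differ is the negated form combined with NULLs. The sketch below uses hypothetical table variables (@Box and @Base are illustrative names, not from the question) to show NOT IN and NOT EXISTS disagreeing on the same data.

```sql
-- Hypothetical sample data; names and values are for illustration only.
DECLARE @Box  TABLE (BoxID int NOT NULL);
DECLARE @Base TABLE (BoxID int NULL);

INSERT INTO @Box (BoxID) VALUES (1);
INSERT INTO @Box (BoxID) VALUES (2);
INSERT INTO @Box (BoxID) VALUES (3);

INSERT INTO @Base (BoxID) VALUES (1);
INSERT INTO @Base (BoxID) VALUES (NULL);

-- NOT IN: returns no rows. For BoxID 2 and 3 the comparison against NULL
-- evaluates to UNKNOWN, so the whole predicate is never TRUE.
SELECT bx.BoxID
FROM @Box AS bx
WHERE bx.BoxID NOT IN (SELECT b.BoxID FROM @Base AS b);

-- NOT EXISTS: returns 2 and 3. The correlated equality simply finds no
-- matching row, and the NULL row never matches anything.
SELECT bx.BoxID
FROM @Box AS bx
WHERE NOT EXISTS (SELECT * FROM @Base AS b WHERE b.BoxID = bx.BoxID);
```

If the subquery column is nullable, NOT EXISTS (or NOT IN with an explicit IS NOT NULL filter in the subquery) is usually the form that matches what people intend.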

Solution 4:

I'd go with EXISTS over IN; see the link below:

SQL Server: JOIN vs IN vs EXISTS - the logical difference

There is a common misconception that IN behaves equally to EXISTS or JOIN in terms of returned results. This is simply not true.

IN: Returns true if a specified value matches any value in a subquery or a list.

EXISTS: Returns true if a subquery contains any rows.

JOIN: Joins two result sets on the joining column.

Blog credit: https://stackoverflow.com/users/31345/mladen-prajdic
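To make the definitions above concrete, here is a small sketch of how a JOIN can return more rows than IN or EXISTS when the right side contains duplicates. The table variables @Box and @Base are hypothetical and exist only for this illustration.

```sql
-- Hypothetical sample data; names and values are for illustration only.
DECLARE @Box  TABLE (BoxID int NOT NULL);
DECLARE @Base TABLE (BoxID int NOT NULL);

INSERT INTO @Box (BoxID) VALUES (1);
INSERT INTO @Box (BoxID) VALUES (2);

-- BoxID 1 appears twice on the right side.
INSERT INTO @Base (BoxID) VALUES (1);
INSERT INTO @Base (BoxID) VALUES (1);

-- IN: one row (BoxID 1). The predicate only tests membership.
SELECT bx.BoxID
FROM @Box AS bx
WHERE bx.BoxID IN (SELECT b.BoxID FROM @Base AS b);

-- EXISTS: one row (BoxID 1). The predicate only tests for any matching row.
SELECT bx.BoxID
FROM @Box AS bx
WHERE EXISTS (SELECT * FROM @Base AS b WHERE b.BoxID = bx.BoxID);

-- JOIN: two rows (BoxID 1 twice), one per matching row in @Base.
SELECT bx.BoxID
FROM @Box AS bx
INNER JOIN @Base AS b ON b.BoxID = bx.BoxID;
```

So IN and EXISTS act as filters on the left side, while JOIN multiplies rows; a JOIN needs DISTINCT or a unique right-side key to give the same result.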