EXISTS vs JOIN and use of EXISTS clause
Solution 1:
EXISTS is used to return a boolean value; JOIN returns a whole other table.
EXISTS is only used to test whether a subquery returns any rows, and it short-circuits as soon as one is found. JOIN is used to extend a result set by combining it with additional columns from another table to which there is a relation.
In your example, the queries are semantically equivalent.
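To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. It runs on SQLite, not SQL Server, so plans and timings may differ. The titles/royalties tables and their columns are hypothetical, chosen so that each title matches at most one related row. Both queries return the same titles, but only the JOIN can add columns from the related table to the output.

```python
import sqlite3

# Hypothetical 1:1 schema: each title has at most one royalties row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE titles (title_id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE royalties (title_id TEXT PRIMARY KEY, rate INTEGER);
    INSERT INTO titles VALUES ('T1', 'Alpha'), ('T2', 'Beta'), ('T3', 'Gamma');
    INSERT INTO royalties VALUES ('T1', 10), ('T3', 12);
""")

# EXISTS only filters titles; the select list can use columns from titles only.
exists_rows = conn.execute("""
    SELECT t.title_id, t.title
    FROM titles AS t
    WHERE EXISTS (SELECT 1 FROM royalties AS r WHERE r.title_id = t.title_id)
    ORDER BY t.title_id
""").fetchall()

# JOIN filters the same way here (1:1) and can also return r.rate.
join_rows = conn.execute("""
    SELECT t.title_id, t.title, r.rate
    FROM titles AS t
    JOIN royalties AS r ON r.title_id = t.title_id
    ORDER BY t.title_id
""").fetchall()

print(exists_rows)  # [('T1', 'Alpha'), ('T3', 'Gamma')]
print(join_rows)    # [('T1', 'Alpha', 10), ('T3', 'Gamma', 12)]
```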
In general, use EXISTS when:
- You don't need to return data from the related table
- You have dupes in the related table (JOIN can cause duplicate rows if values are repeated)
- You want to check existence or non-existence (NOT EXISTS can be used instead of a LEFT OUTER JOIN ... IS NULL condition)
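The anti-join case from the list can be sketched the same way. This assumes hypothetical customers and orders tables in an in-memory SQLite database. Both NOT EXISTS and LEFT OUTER JOIN ... IS NULL should return only the customers with no orders, regardless of how many orders the other customers have.

```python
import sqlite3

# Hypothetical 1:n schema: a customer may have zero or more orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (100, 1), (101, 1), (102, 3);
""")

# Anti-join written with NOT EXISTS: customers that have no orders.
not_exists = conn.execute("""
    SELECT c.name FROM customers AS c
    WHERE NOT EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.id)
    ORDER BY c.name
""").fetchall()

# The same anti-join as LEFT OUTER JOIN ... IS NULL: unmatched customers
# get NULLs for every orders column, including the primary key o.id.
left_join_null = conn.execute("""
    SELECT c.name FROM customers AS c
    LEFT OUTER JOIN orders AS o ON o.customer_id = c.id
    WHERE o.id IS NULL
    ORDER BY c.name
""").fetchall()

print(not_exists)      # [('Bob',)]
print(left_join_null)  # [('Bob',)]
```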
If you have proper indexes, most of the time EXISTS will perform identically to the JOIN. The exception is very complicated subqueries, where it is normally quicker to use EXISTS.
If your JOIN key is not indexed, it may be quicker to use EXISTS, but you will need to test for your specific circumstances.
JOIN syntax is also normally easier to read and clearer.
Solution 2:
- EXISTS is a semi-join
- JOIN is a join
So if 3 rows each have 5 matching rows in the other table:
- JOIN gives 15 rows
- EXISTS gives 3 rows
The result is the "short circuit" effect mentioned by others, and no need for the DISTINCT you would otherwise add to a JOIN. EXISTS is almost always quicker when looking for the existence of rows on the n side of a 1:n relationship.
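A quick way to see the semi-join versus join difference, again as a sketch in SQLite through Python's sqlite3 module with hypothetical parent/child tables: three parent rows, each with five matching child rows.

```python
import sqlite3

# Hypothetical parent/child tables for a 1:n relationship.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER);
""")
conn.executemany("INSERT INTO parent (id) VALUES (?)", [(p,) for p in (1, 2, 3)])
# Each of the 3 parent rows gets 5 matching child rows (child.id is auto-assigned).
conn.executemany(
    "INSERT INTO child (parent_id) VALUES (?)",
    [(p,) for p in (1, 2, 3) for _ in range(5)],
)

# Join: one output row per matching (parent, child) pair.
join_count = conn.execute(
    "SELECT COUNT(*) FROM parent AS p JOIN child AS c ON c.parent_id = p.id"
).fetchone()[0]

# Semi-join: each parent row appears at most once.
exists_count = conn.execute("""
    SELECT COUNT(*) FROM parent AS p
    WHERE EXISTS (SELECT 1 FROM child AS c WHERE c.parent_id = p.id)
""").fetchone()[0]

# Join plus DISTINCT: de-duplicates back to one row per parent.
distinct_join_count = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT DISTINCT p.id FROM parent AS p JOIN child AS c ON c.parent_id = p.id
    ) AS d
""").fetchone()[0]

print(join_count, exists_count, distinct_join_count)  # 15 3 3
```

The DISTINCT variant gets back to 3 rows, but logically it needs an extra de-duplication step over the joined rows, which the EXISTS form does not.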
Solution 3:
EXISTS is primarily used to short-circuit. Essentially, the optimizer will bail out as soon as the condition is true, so it may not need to scan the entire table (in modern versions of SQL Server this optimization can occur for IN() as well, though this was not always true). This behavior can vary from query to query, and in some cases the join may actually give the optimizer more opportunity to do its job. So I think it's hard to say "this is when you should use EXISTS, and this is when you shouldn't" because, like a lot of things, "it depends."
That said, in this case, since you have essentially a 1:1 match between the tables, you are unlikely to see any performance difference and the optimizer will likely produce a similar or even identical plan. You may see something different if you compare join/exists on the sales table when you add 50,000 rows for each title (never mind that you will need to change your join query to remove duplicates, aggregate, what have you).
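The 1:n case can be sketched with a tiny, hypothetical stand-in for the titles and sales tables (a few rows per title rather than 50,000, and SQLite rather than SQL Server; the column names are assumptions). EXISTS and IN return each title once, while the join has to be de-duplicated, here with GROUP BY, to match. The EXPLAIN QUERY PLAN loop at the end only shows what SQLite's planner does with this data; it says nothing definitive about SQL Server.

```python
import sqlite3

# Hypothetical titles/sales stand-in: several sales rows per title (1:n).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE titles (title_id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE sales (ord_num INTEGER PRIMARY KEY, title_id TEXT, qty INTEGER);
    CREATE INDEX ix_sales_title ON sales (title_id);
    INSERT INTO titles VALUES ('T1', 'Alpha'), ('T2', 'Beta'), ('T3', 'Gamma');
    INSERT INTO sales (title_id, qty) VALUES
        ('T1', 5), ('T1', 7), ('T1', 2), ('T3', 4), ('T3', 1);
""")

queries = {
    "exists": """
        SELECT t.title_id FROM titles AS t
        WHERE EXISTS (SELECT 1 FROM sales AS s WHERE s.title_id = t.title_id)
        ORDER BY t.title_id""",
    "in": """
        SELECT t.title_id FROM titles AS t
        WHERE t.title_id IN (SELECT s.title_id FROM sales AS s)
        ORDER BY t.title_id""",
    # The join must be de-duplicated (here via GROUP BY) to match EXISTS/IN.
    "join_group_by": """
        SELECT t.title_id FROM titles AS t
        JOIN sales AS s ON s.title_id = t.title_id
        GROUP BY t.title_id
        ORDER BY t.title_id""",
    # Without de-duplication the join returns one row per sale.
    "join_raw": """
        SELECT t.title_id FROM titles AS t
        JOIN sales AS s ON s.title_id = t.title_id
        ORDER BY t.title_id""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)

# Optional: inspect SQLite's plans. The output format varies by SQLite version,
# and SQL Server's optimizer will make its own choices.
for name in ("exists", "join_group_by"):
    print(name)
    for plan_row in conn.execute("EXPLAIN QUERY PLAN " + queries[name]):
        print("   ", plan_row)
```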