Condition within JOIN or WHERE
The relational algebra allows interchangeability of the predicates in the WHERE
clause and the INNER JOIN
, so even INNER JOIN
queries with WHERE
clauses can have the predicates rearrranged by the optimizer so that they may already be excluded during the JOIN
process.
I recommend you write the queries in the most readable way possible.
Sometimes this includes making the INNER JOIN
relatively "incomplete" and putting some of the criteria in the WHERE
simply to make the lists of filtering criteria more easily maintainable.
For example, instead of:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
AND c.State = 'NY'
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
AND a.Status = 1
Write:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
WHERE c.State = 'NY'
AND a.Status = 1
But it depends, of course.
For inner joins I have not really noticed a difference (but as with all performance tuning, you need to check against your database under your conditions).
However where you put the condition makes a huge difference if you are using left or right joins. For instance consider these two queries:
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
WHERE ORD.OrderDate >'20090515'
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
AND ORD.OrderDate >'20090515'
The first will give you only those records that have an order dated later than May 15, 2009 thus converting the left join to an inner join.
The second will give those records plus any customers with no orders. The results set is very different depending on where you put the condition. (Select * is for example purposes only, of course you should not use this in production code.)
The exception to this is when you want to see only the records in one table but not the other. Then you use the where clause for the condition not the join.
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
WHERE ORD.OrderID is null
Most RDBMS products will optimize both queries identically. In "SQL Performance Tuning" by Peter Gulutzan and Trudy Pelzer, they tested multiple brands of RDBMS and found no performance difference.
I prefer to keep join conditions separate from query restriction conditions.
If you're using OUTER JOIN
sometimes it's necessary to put conditions in the join clause.
WHERE will filter after the JOIN has occurred.
Filter on the JOIN to prevent rows from being added during the JOIN process.
I prefer the JOIN to join full tables/Views and then use the WHERE To introduce the predicate of the resulting set.
It feels syntactically cleaner.