SQL JOIN - WHERE clause vs. ON clause
After reading it, this is not a duplicate of Explicit vs Implicit SQL Joins. The answer may be related (or even the same) but the question is different.
What is the difference and what should go in each?
If I understand the theory correctly, the query optimizer should be able to use both interchangeably.
They are not the same thing.
Consider these queries:
SELECT *
FROM Orders
LEFT JOIN OrderLines ON OrderLines.OrderID=Orders.ID
WHERE Orders.ID = 12345
and
SELECT *
FROM Orders
LEFT JOIN OrderLines ON OrderLines.OrderID=Orders.ID
AND Orders.ID = 12345
The first will return an order and its lines, if any, for order number 12345
. The second will return all orders, but only order 12345
will have any lines associated with it.
With an INNER JOIN
, the clauses are effectively equivalent. However, just because they are functionally the same, in that they produce the same results, does not mean the two kinds of clauses have the same semantic meaning.
-
Does not matter for inner joins
-
Matters for outer joins
a.
WHERE
clause: After joining. Records will be filtered after join has taken place.b.
ON
clause - Before joining. Records (from right table) will be filtered before joining. This may end up as null in the result (since OUTER join).
Example: Consider the below tables:
-
documents:
id name 1 Document1 2 Document2 3 Document3 4 Document4 5 Document5 -
downloads:
id document_id username 1 1 sandeep 2 1 simi 3 2 sandeep 4 2 reya 5 3 simi
a) Inside WHERE
clause:
SELECT documents.name, downloads.id
FROM documents
LEFT OUTER JOIN downloads
ON documents.id = downloads.document_id
WHERE username = 'sandeep'
For above query the intermediate join table will look like this.
id(from documents) | name | id (from downloads) | document_id | username |
---|---|---|---|---|
1 | Document1 | 1 | 1 | sandeep |
1 | Document1 | 2 | 1 | simi |
2 | Document2 | 3 | 2 | sandeep |
2 | Document2 | 4 | 2 | reya |
3 | Document3 | 5 | 3 | simi |
4 | Document4 | NULL | NULL | NULL |
5 | Document5 | NULL | NULL | NULL |
After applying the WHERE
clause and selecting the listed attributes, the result will be:
name | id |
---|---|
Document1 | 1 |
Document2 | 3 |
b) Inside JOIN
clause
SELECT documents.name, downloads.id
FROM documents
LEFT OUTER JOIN downloads
ON documents.id = downloads.document_id
AND username = 'sandeep'
For above query the intermediate join table will look like this.
id(from documents) | name | id (from downloads) | document_id | username |
---|---|---|---|---|
1 | Document1 | 1 | 1 | sandeep |
2 | Document2 | 3 | 2 | sandeep |
3 | Document3 | NULL | NULL | NULL |
4 | Document4 | NULL | NULL | NULL |
5 | Document5 | NULL | NULL | NULL |
Notice how the rows in documents
that did not match both the conditions are populated with NULL
values.
After Selecting the listed attributes, the result will be:
name | id |
---|---|
Document1 | 1 |
Document2 | 3 |
Document3 | NULL |
Document4 | NULL |
Document5 | NULL |
On INNER JOIN
s they are interchangeable, and the optimizer will rearrange them at will.
On OUTER JOIN
s, they are not necessarily interchangeable, depending on which side of the join they depend on.
I put them in either place depending on the readability.
The way I do it is:
Always put the join conditions in the
ON
clause if you are doing anINNER JOIN
. So, do not add any WHERE conditions to the ON clause, put them in theWHERE
clause.-
If you are doing a
LEFT JOIN
, add any WHERE conditions to theON
clause for the table in the right side of the join. This is a must, because adding a WHERE clause that references the right side of the join will convert the join to an INNER JOIN.The exception is when you are looking for the records that are not in a particular table. You would add the reference to a unique identifier (that is not ever NULL) in the RIGHT JOIN table to the WHERE clause this way:
WHERE t2.idfield IS NULL
. So, the only time you should reference a table on the right side of the join is to find those records which are not in the table.