When to use STRAIGHT_JOIN with MySQL
I just had a fairly complex query I was working with and it was taking 8 seconds to run. EXPLAIN was showing a weird table order and my indexes were not all being used even with the FORCE INDEX hint. I came across the STRAIGHT_JOIN join keyword and started replacing some of my INNER JOIN keywords with it. I noticed considerable speed improvement. Eventually I just replaced all my INNER JOIN keywords with STRAIGHT_JOIN for this query and it now runs in .01 seconds.
My question is when do you use STRAIGHT_JOIN and when do you use INNER JOIN? Is there any reason to not use STRAIGHT_JOIN if you are writing good queries?
I wouldn't recommend using STRAIGHT_JOIN without a good reason. My own experience is that the MySQL query optimizer chooses a poor query plan more often than I'd like, but not often enough that you should just bypass it in general, which is what you would be doing if you always used STRAIGHT_JOIN.
My recommendation is to leave all queries as regular JOINs. If you discover that one query is using a sub-optimal query plan, I would suggest first trying to rewrite or re-structure the query a bit to see if the optimizer will then pick a better query plan. Also, for innodb at least, make sure it's not just that your index statistics are out-of-date (ANALYZE TABLE). That can cause the optimizer to choose a poor query plan. Optimizer hints should generally be your last resort.
Another reason not to use query hints is that your data distribution may change over time, or your index selectivity may change, etc. as your table grows. Your query hints that are optimal now, may become sub-optimal over time. But the optimizer will be unable to adapt the query plan because of your now outdated hints. You stay more flexible if you allow the optimizer to make the decisions.
Here's a scenario that came up just recently at work.
Consider three tables, A, B, C.
A has 3,000 rows; B has 300,000,000 rows; and C has 2,000 rows.
Foreign keys are defined: B(a_id), B(c_id).
Suppose you had a query that looks like this:
select a.id, c.id
from a
join b on b.a_id = a.id
join c on c.id = b.c_id
In my experience, MySQL may choose to go C -> B -> A in this case. C is smaller than A and B is enormous, and they're all equijoins.
The trouble is MySQL doesn't necessarily take into account the size of the intersection between (C.id and B.c_id) vs (A.id and B.a_id). If the join between B and C returns just as many rows as B, then it's a very poor choice; if starting with A would have filtered down B to as many rows as A, then it would have been a much better choice. straight_join
could be used to force this order like this:
select a.id, c.id
from a
straight_join b on b.a_id = a.id
join c on c.id = b.c_id
Now a
must be joined on before b
.
Generally you want to do your joins in an order that minimizes the number of rows in the resulting set. So starting with a small table and joining such that the resulting join will also be small, is ideal. Things go pear-shaped if starting with a small table and joining it to a bigger table ends up just as large as the big table.
It's stats dependent though. If the data distribution changes, the calculation may change. It's also dependent on the implementation details of the join mechanism.
The worst cases that I've seen for MySQL that all but required straight_join
or aggressive index hinting are queries that paginate over a lot of data in a strict sort order with light filtering. MySQL strongly prefers to use indexes for any filters and joins over sorts; this makes sense because most people aren't trying to sort the whole database but rather have a limited subset of rows that are responsive to the query, and sorting a limited subset is much faster than filtering the whole table, no matter whether it's sorted or not. In this case, putting straight join immediately after the table that had the indexed column I wanted to sort on fixed things.
From MySQL JOIN reference:
"STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer puts the tables in the wrong order."
MySQL isn’t necessarilly good at choosing the join order in complex queries. By specifying a complex query as a straight_join the query executes the joins in the order they’re specified. By placing the table to be the least common denominator first and specifying straight_join you are able to improve the query performance.
STRAIGHT_JOIN
, using this clause, you can control the JOIN
order: which table is scanned in the outer loop and which one is in the inner loop.