Is there any significant benefit in adding multi-column indices in MySQL over having single column indices?
Being pretty new to database optimization in systems with high load I have the following question - lets assume we have the following query (query is with sample data):
SELECT *
FROM ticket
WHERE ticket_status='draft'
AND user_id='789437879'
ORDER BY ticket_id DESC LIMIT 0, 15
We already have the following indizes:
CREATE INDEX ticket_status on ticket(ticket_status);
CREATE INDEX user_id on ticket(user_id);
CREATE INDEX ticket_id on ticket(ticket_id);
Would there be a significant performance benefit for optimizing this query if we do the following:
CREATE INDEX make_that_query_more_efficient on ticket(user_id,ticket_status);
Or does it make barely any difference as all the columns are indexed anyways?
Solution 1:
It depends on the query. But definitely in your example.
In many queries, it makes a huge difference.
Here is one discussion on such: http://mysql.rjweb.org/doc.php/index1 . It discusses various indexing strategies on a simple query. The conclusion is that a composite index is clearly optimal for the query.
Meanwhile, there are dozens, maybe hundreds, of examples in this forum where my solution to a performance problem (high CPU, slow query, etc) is primarily to replace a single-column index with a multi-column index. The performance gain is sometimes spectacular when WHERE
, ORDER BY
, and LIMIT
can all be handled by the INDEX
.
A many-to-many mapping table ('junction') is a very common schema pattern that novices fail to index properly. More on it: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
For your query, this would make a world of difference:
INDEX(user_id, ticket_status, -- these two can be in either order
ticket_id) -- this needs to be last
The execution will quickly drill into the Index's BTree to the row(s) with ticket_status='draft' AND user_id='789437879'
. It will start at the end of such items and scan backward (DESC
), picking up 15 (or fewer) items. Then it will look up the other columns (*
) and deliver them.
Almost any other index would require scanning more than just 15 items.
As for your indexes.
-
If
ticket_id
is already thePRIMARY KEY
, do not addINDEX(ticket_id)
; it will be useless. -
If any of your indexes is a prefix of the index I recommend, DROP it; it will be redundant and in the way. (Use
EXPLAIN SELECT
to see this.) -
If the Optimizer picked your
(ticket_status)
, it would look at all the entries with desired status, filter based onuser_id
, sort the results, then peel off 15 rows. -
Similarly for
(user_id)
-
If the Optimizer were to use
INDEX(ticket_id)
, it would start at the end of the ids and work backward. If there were not 15 relevant rows, it would not stop until it scanned the entire table. -
Note that my composite index even avoids the sort.
-
The rest of the indexes may or may not be useful; it depends on whether other queries can use them.
-
A suitable index is likely to be much more beneficial to a
SELECT
than a burden onINSERTs
; so don't worry about this tradeoff. -
Starting with
user_id
may be useful to other queries; starting withstatus
seems less likely. The single-columnINDEX(ticket_status)
is unlikely to be used by any query due to "cardinality". -
My 3-column index is likely to be much better than your similar 2-column index. Mine takes care of the
ORDER BY
and theLIMIT
; yours needs to gather lots of rows and sort them. -
If there are big
TEXT
columns in the*
, the performance difference may be more dramatic.