Is there any significant benefit in adding multi-column indices in MySQL over having single column indices?

Being pretty new to database optimization in systems with high load I have the following question - lets assume we have the following query (query is with sample data):

SELECT * 
FROM ticket 
WHERE ticket_status='draft' 
AND user_id='789437879' 
ORDER BY ticket_id DESC  LIMIT 0, 15

We already have the following indizes:

CREATE INDEX ticket_status on ticket(ticket_status);
CREATE INDEX user_id on ticket(user_id);
CREATE INDEX ticket_id on ticket(ticket_id);

Would there be a significant performance benefit for optimizing this query if we do the following:

CREATE INDEX make_that_query_more_efficient on ticket(user_id,ticket_status);

Or does it make barely any difference as all the columns are indexed anyways?


Solution 1:

It depends on the query. But definitely in your example.

In many queries, it makes a huge difference.

Here is one discussion on such: http://mysql.rjweb.org/doc.php/index1 . It discusses various indexing strategies on a simple query. The conclusion is that a composite index is clearly optimal for the query.

Meanwhile, there are dozens, maybe hundreds, of examples in this forum where my solution to a performance problem (high CPU, slow query, etc) is primarily to replace a single-column index with a multi-column index. The performance gain is sometimes spectacular when WHERE, ORDER BY, and LIMIT can all be handled by the INDEX.

A many-to-many mapping table ('junction') is a very common schema pattern that novices fail to index properly. More on it: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table

For your query, this would make a world of difference:

INDEX(user_id, ticket_status,  -- these two can be in either order
      ticket_id)               -- this needs to be last

The execution will quickly drill into the Index's BTree to the row(s) with ticket_status='draft' AND user_id='789437879'. It will start at the end of such items and scan backward (DESC), picking up 15 (or fewer) items. Then it will look up the other columns (*) and deliver them.

Almost any other index would require scanning more than just 15 items.

As for your indexes.

  • If ticket_id is already the PRIMARY KEY, do not add INDEX(ticket_id); it will be useless.

  • If any of your indexes is a prefix of the index I recommend, DROP it; it will be redundant and in the way. (Use EXPLAIN SELECT to see this.)

  • If the Optimizer picked your (ticket_status), it would look at all the entries with desired status, filter based on user_id, sort the results, then peel off 15 rows.

  • Similarly for (user_id)

  • If the Optimizer were to use INDEX(ticket_id), it would start at the end of the ids and work backward. If there were not 15 relevant rows, it would not stop until it scanned the entire table.

  • Note that my composite index even avoids the sort.

  • The rest of the indexes may or may not be useful; it depends on whether other queries can use them.

  • A suitable index is likely to be much more beneficial to a SELECT than a burden on INSERTs; so don't worry about this tradeoff.

  • Starting with user_id may be useful to other queries; starting with status seems less likely. The single-column INDEX(ticket_status) is unlikely to be used by any query due to "cardinality".

  • My 3-column index is likely to be much better than your similar 2-column index. Mine takes care of the ORDER BY and the LIMIT; yours needs to gather lots of rows and sort them.

  • If there are big TEXT columns in the *, the performance difference may be more dramatic.