MySQL indexes - what are the best practices?

You should definitely spend some time reading up on indexing, there's a lot written about it, and it's important to understand what's going on.

Broadly speaking, an index imposes an ordering on the rows of a table.

For simplicity's sake, imagine a table is just a big CSV file. Whenever a row is inserted, it's inserted at the end. So the "natural" ordering of the table is just the order in which rows were inserted.

Imagine you've got that CSV file loaded up in a very rudimentary spreadsheet application. All this spreadsheet does is display the data, and numbers the rows in sequential order.

Now imagine that you need to find all the rows that have some value "M" in the third column. Given what you have available, you have only one option. You scan the table checking the value of the third column for each row. If you've got a lot of rows, this method (a "table scan") can take a long time!

Now imagine that in addition to this table, you've got an index. This particular index is the index of values in the third column. The index lists all of the values from the third column, in some meaningful order (say, alphabetically) and for each of them, provides a list of row numbers where that value appears.

Now you have a good strategy for finding all the rows where the value of the third column is "M". For instance, you can perform a binary search! Whereas the table scan requires you to look N rows (where N is the number of rows), the binary search only requires that you look at log-n index entries, in the very worst case. Wow, that's sure a lot easier!

Of course, if you have this index, and you're adding rows to the table (at the end, since that's how our conceptual table works), you need to update the index each and every time. So you do a little more work while you're writing new rows, but you save a ton of time when you're searching for something.

So, in general, indexing creates a tradeoff between read efficiency and write efficiency. With no indexes, inserts can be very fast -- the database engine just adds a row to the table. As you add indexes, the engine must update each index while performing the insert.

On the other hand, reads become a lot faster.

Hopefully that covers your first two questions (as others have answered -- you need to find the right balance).

Your third scenario is a little more complicated. If you're using LIKE, indexing engines will typically help with your read speed up to the first "%". In other words, if you're SELECTing WHERE column LIKE 'foo%bar%', the database will use the index to find all the rows where column starts with "foo", and then need to scan that intermediate rowset to find the subset that contains "bar". SELECT ... WHERE column LIKE '%bar%' can't use the index. I hope you can see why.

Finally, you need to start thinking about indexes on more than one column. The concept is the same, and behaves similarly to the LIKE stuff -- essentially, if you have an index on (a,b,c), the engine will continue using the index from left to right as best it can. So a search on column a might use the (a,b,c) index, as would one on (a,b). However, the engine would need to do a full table scan if you were searching WHERE b=5 AND c=1)

Hopefully this helps shed a little light, but I must reiterate that you're best off spending a few hours digging around for good articles that explain these things in depth. It's also a good idea to read your particular database server's documentation. The way indices are implemented and used by query planners can vary pretty widely.


Check out presentations like More Mastering the Art of Indexing.

Update 12/2012: I have posted a new presentation of mine: How to Design Indexes, Really. I presented this in October 2012 at ZendCon in Santa Clara, and in December 2012 at Percona Live London.

Designing the best indexes is a process that has to match the queries you run in your app.

It's hard to recommend any general-purpose rules about which columns are best to index, or whether you should index all columns, no columns, which indexes should span multiple columns, etc. It depends on the queries you need to run.

Yes, there is some overhead so you shouldn't create indexes needlessly. But you should create the indexes that give benefit to the queries you need to run quickly. The overhead of an index is usually far outweighed by its benefit.

For a column that is VARCHAR(2500), you probably want to use a FULLTEXT index or a prefix index:

CREATE INDEX i ON SomeTable(longVarchar(100));

Note that a conventional index can't help if you're searching for words that may be in the middle of that long varchar. For that, use a fulltext index.