How to find and fix fragmented MySQL tables

the short answer:

select  ENGINE, TABLE_NAME,Round( DATA_LENGTH/1024/1024) as data_length , round(INDEX_LENGTH/1024/1024) as index_length, round(DATA_FREE/ 1024/1024) as data_free from information_schema.tables  where  DATA_FREE > 0;

The "You must know" answer

first at all you must understand that Mysql tables get fragmented when a row is updated, so it's a normal situation. When a table is created, lets say imported using a dump with data, all rows are stored with no fragmentation in many fixed size pages. When you update a variable length row, the page containing this row is divided in two or more pages to store the changes, and these new two (or more) pages contains blank spaces filling the unused space.

This does not impact performance, unless of course the fragmentation grows too much. What is too much fragmentation, well let's see the query you're looking for:

  select  ENGINE, TABLE_NAME,Round( DATA_LENGTH/1024/1024) as data_length , round(INDEX_LENGTH/1024/1024) as index_length, round(DATA_FREE/ 1024/1024) as data_free from information_schema.tables  where  DATA_FREE > 0;

The DATA_LENGTH and INDEX_LENGTH are the space your data and indexes are using, and DATA_FREE is the total amount of bytes unused in all the table pages (fragmentation).

Here's an example of a real production table

| ENGINE | TABLE_NAME               | data_length | index_length | data_free |
| InnoDB | comments                 |         896 |          316 |         5 |

In this case we have a Table using (896 + 316) = 1212 MB, and have data a free space of 5 MB. This means a "ratio of fragmentation" of:

5/1212 = 0.0041

...Which is a really low "fragmentation ratio".

I've been working with tables with a ratio near 0.2 (meaning 20% of blank spaces) and never notice a slow down on queries, even if I optimize the table, the performance is the same. But apply a optimize table on a 800MB table takes a lot of time and blocks the table for several minutes, which is impracticable on production.

So, if you consider what you win in performance and the time wasted in optimize a table, I prefer NOT OPTIMIZE.

If you think it's better for storage, see your ratio and see how much space can you save when optimize. It's usually not too much, so I prefer NOT OPTIMIZE.

And if you optimize, the next update will create blank spaces by splitting a page in two or more. But it's faster to update a fragmented table than a not fragmented one, because if the table is fragmented an update on a row not necessarily will split a page.

I hope this helps you.


Just to add to the answer from Felipe-Rojas you can calculate the fragment ratio as part of the query:

select ENGINE,
  concat(TABLE_SCHEMA, '.', TABLE_NAME) as table_name,
  round(DATA_LENGTH/1024/1024, 2) as data_length,
  round(INDEX_LENGTH/1024/1024, 2) as index_length,
  round(DATA_FREE/1024/1024, 2) as data_free,
  (data_free/(index_length+data_length)) as frag_ratio
FROM information_schema.tables
WHERE DATA_FREE > 0
ORDER BY frag_ratio DESC;

If a table is fragmented a small percentage (less than 5%?) then you can probably leave it alone.

Anything larger and you will need to assess based on your db usage, locking tables etc as to how important it is to defragment the table.