Modifying columns of very large MySQL tables with little or no downtime

Solution 1:

I periodically need to make changes to tables in MySQL 5.1, mostly adding columns.

Don't. No really. Just don't. It should be a very rare occasion when this is ever necessary.

Assuming your data really is normalized to start with, the right way to solve the problem is to add a new table with a 1:1 relationship to the base table (participation being optional on the new table's side).
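
As a minimal sketch of that approach, using hypothetical names (an `orders` base table keyed by `id`, with the new attribute stored in a side table rather than added as a column):

    -- New attributes live in a side table with an optional 1:1 relationship,
    -- so the large base table never needs to be altered.
    CREATE TABLE order_notes (
        order_id INT NOT NULL PRIMARY KEY,
        notes    TEXT,
        CONSTRAINT fk_order_notes_order
            FOREIGN KEY (order_id) REFERENCES orders (id)
    ) ENGINE=InnoDB;

    -- Read it back with an outer join, since not every order has a row here.
    SELECT o.*, n.notes
    FROM orders AS o
    LEFT JOIN order_notes AS n ON n.order_id = o.id;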

Having to add columns regularly is usually an indicator of a database which is not normalized - if your schema is not normalized then that's the problem you need to fix.

Finally, if your schema really, really is normalized and you really, really must keep adding columns, then (a rough SQL sketch of these steps follows the list):

  1. Ensure the table has a timestamp column, or that the server is writing replication (binary) logs
  2. Create a copy (B) of the table (A)
  3. Add the new columns to B (this will still block writes with MyISAM)
  4. Disable transactions
  5. Rename the original table (A) to something else, as a backup
  6. Rename the new table (B) to the name of the original table (A)
  7. Replay the changes made since the start of the operation, from the replication log or from the backup table
  8. Re-enable transactions.
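
Below is a minimal, hedged SQL sketch of steps 2 through 7. The names are hypothetical (an `orders` base table with columns `id`, `customer_id` and `updated_at`, and a new `notes` column), and the replay step shown here uses the timestamp column and the backup table rather than the replication log:

    -- Steps 2-3: copy the table structure and add the new column to the copy.
    CREATE TABLE orders_new LIKE orders;
    ALTER TABLE orders_new ADD COLUMN notes TEXT NULL;

    -- Note when the copy starts, then bulk-copy the existing rows.
    SET @copy_started = NOW();
    INSERT INTO orders_new (id, customer_id, updated_at)
    SELECT id, customer_id, updated_at FROM orders;

    -- Steps 5-6: swap the tables atomically; the original becomes the backup.
    RENAME TABLE orders TO orders_backup, orders_new TO orders;

    -- Step 7: replay rows inserted or updated since the copy started, taken
    -- from the backup table via the timestamp column (assumes `id` is the
    -- primary key; the new `notes` column defaults to NULL).
    REPLACE INTO orders (id, customer_id, updated_at)
    SELECT id, customer_id, updated_at
    FROM orders_backup
    WHERE updated_at >= @copy_started;

Note that replaying from the backup table this way catches inserts and updates but not deletes; if deletes matter, replay from the replication log instead.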

Solution 2:

I just had to do this recently. What Amazon recommended was using the Percona Toolkit. I downloaded it and was able to run something like:

    ./pt-online-schema-change h=hostName,D=databaseName,t=tableName --recursion-method=none --execute --user userName --password password --alter "MODIFY someColumn newDataType"

and it works great. It tells you how much time remains in the process.

It actually creates a new table with the new column and then copies the existing data over. Further, it creates triggers so that data changed during the copy is also pushed over to the new table. It then renames the tables automagically and drops the old table, so you're up and running with the new column and no downtime while the change is applied.
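
For illustration only, here is a rough sketch of the kind of trigger-based mechanism the tool relies on, with hypothetical names and a tiny column list. The real tool creates INSERT, UPDATE and DELETE triggers, copies rows in chunks, and throttles itself, so use pt-online-schema-change rather than hand-rolling this:

    -- Shadow table with the altered column definition.
    CREATE TABLE _orders_new LIKE orders;
    ALTER TABLE _orders_new MODIFY someColumn BIGINT;

    -- Keep the shadow table in sync while existing rows are copied over
    -- (matching UPDATE and DELETE triggers are needed as well).
    CREATE TRIGGER orders_after_insert AFTER INSERT ON orders
    FOR EACH ROW
        REPLACE INTO _orders_new (id, someColumn)
        VALUES (NEW.id, NEW.someColumn);

    -- ... chunked copy of the existing rows happens here ...

    -- Atomic swap, then the old table is dropped.
    RENAME TABLE orders TO _orders_old, _orders_new TO orders;
    DROP TABLE _orders_old;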

Solution 3:

symcbean provides some solid recommendations.

To answer your question, the easiest and best way to mitigate impact is to have multiple replicating database servers: a dual-master setup with an appropriate failover procedure. Stopping replication on the active master allows you to run the alteration on the inactive one without impacting the active.
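
One common way to carry out the alteration on the inactive master without it leaking to the active one (my assumption here, not spelled out above) is to disable binary logging for the session that runs the ALTER; the table and column names below are hypothetical:

    -- On the inactive master: keep this session's statements out of the
    -- binary log so the ALTER does not replicate to the active master.
    SET SESSION sql_log_bin = 0;
    ALTER TABLE orders ADD COLUMN notes TEXT NULL;
    SET SESSION sql_log_bin = 1;

    -- Once the inactive master has caught up, fail application traffic over
    -- to it and repeat the same ALTER on the other master.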

You could potentially do this on a single live database and minimize impact by using a procedure similar to the one I detailed in this answer. Admittedly, this is similar to what symcbean described, but it includes the technical details. You could use an auto_increment field as well, not just a timestamp.
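
As a hedged illustration of the auto_increment variant (hypothetical `orders` / `orders_new` names, with `id` assumed to be AUTO_INCREMENT): record the highest id before the bulk copy, then copy any rows inserted after it. Note this only catches inserts, not updates or deletes:

    -- Record the high-water mark before the bulk copy.
    SELECT MAX(id) INTO @max_copied FROM orders;

    -- ... bulk-copy rows with id <= @max_copied into orders_new ...

    -- Catch up on rows inserted after the copy started.
    INSERT INTO orders_new (id, customer_id, created_at)
    SELECT id, customer_id, created_at
    FROM orders
    WHERE id > @max_copied;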

Ultimately, if your data set is growing that large, you also need to consider archiving data from your OLTP database into an OLAP database. Your transactional dataset should not need to be that large if you design it appropriately.
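
A minimal sketch of one simple archival approach (hypothetical names and cutoff date, not taken from the answer above); for very large tables the copy and delete should be run in batches rather than as single statements:

    -- One-time setup for the archive table.
    CREATE TABLE orders_archive LIKE orders;

    -- Move rows older than the cutoff out of the OLTP table.
    INSERT INTO orders_archive
    SELECT * FROM orders WHERE created_at < '2012-01-01';

    DELETE FROM orders WHERE created_at < '2012-01-01';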