Testing MySQL Database Replication / Synchronization

Solution 1:

The big problem with replication is checking

  1. that the nodes are all up,
  2. all nodes are communicating (not split brain)
  3. and processing replication logs
  4. and the replication lag

1, 3 and 4 can be captured usingf SHOW MASTER STATUS / SHOW SLAVE STATUS on the relevant nodes, although the replication delay only has 1 second accuracy and only across each hop. The Percona toolkit has scripts for getting more accurate replication lags.

Using multi-master replication (e.g. tungsten, Percona) saves a lot of pain but needs additional effort / software to set up.

If the network between ndoes fails, then the processes can all be running be fine - but will be unable to transfer data - you need monitoring on each node to check it can contact the upstream node.

a MySQL Master database which replicates to a few Slave databases

Best practice would be to designate one of the slaves as a master too - bi directional replication. That way you can easily switch over in the event of an outage, or for maintenance tasks such as rebuilding indexes, backups, schema changes.

Depending on the number of slave nodes, you might also want to designate a fanout node to propogate the changes.

In terms of managing escalations, scheduling scripts to collect data etc, there are lots of tools available which do this - I use nagios, so do lots of other people.

Solution 2:

My question is what are the best practices to monitor and / or test that the slave databases are up-to-date,

For a simple testing, insert/update data on the master and make sure that it is replicated to the slaves.

But for the consistency check, pt-table-checksum is what you're looking for.

For e.g:

pt-table-checksum localhost --empty-replicate-table --databases db --nocheck-replication-filters --replicate percona.checksums > /var/log/pt-table-checksum.log 2>&1

and that would alert an admin when there is an error causing replication to stop.

If you're using Nagios, check_mysql_health plugin can help to monitor the slave status (running or not). But to monitor the consistency, take a look at pmp-check-pt-table-checksum plugin.

Dont' miss the pt-table-sync if you have any inconsistent:

pt-table-sync -v --print --sync-to-master h=localhost,D=db,t=table
pt-table-sync -v --execute --sync-to-master h=localhost,D=db,t=table

Keep in mind that you probably should use the --print option first.

Solution 3:

on slave do

SHOW SLAVE STATUS\G;

If you're getting these:

   Slave_IO_Running: Yes
  Slave_SQL_Running: Yes

that means you almost there, to test it try to do any write transactions on MASTER and make sure same they replicate automatically on slave