MySQL replication issues after a power outage

I went for the first option.

That worked up to the point where the slave started trying to do inserts that conflicted with the primary keys. The slave had done more work than the master bin-log had persisted, as previously mentioned. One aspect I didn't anticipate was that the slave contained data that wasn't in the master; i.e. the slave persisted some transactions before the power outage that the master HAD NOT persisted.

Since for my case, these transactions weren't payment-related or similar, I chose to delete the data from the slave (thereby losing some data that had happened, but which didn't exist in the master) and then let replication run again. This brought the slaves completely up to date. If the data had been more important, we have auto-increment offsets sufficient to give us some wriggle room for manually wrangling the data and ensuring referential integrity was not compromised. Thankfully we didn't need to do that in this case.

For a machine in (passive) master-master configuration which was in this predicament, I chose a similar approach. By passive master-master, I mean that we have an active master (serverA) which is where all the writes go, and a passive master (serverB) that is in place to allow for schema updates to take place with zero downtime. The data in active master (serverA) was chosen as the true values, despite knowing that this meant we lost a couple of persisted transactions which weren't deemed important.

  • Changed the log file and position on the slave.

    CHANGE MASTER MASTER_LOG_FILE='bin-log.004713', MASTER_LOG_POS=0; -- on serverB
    
  • Restarted slave replication on the passive master (serverB) until it failed with primary key constraint violations, as with the other slaves.

     START SLAVE; -- on serverB
    
  • Stopped slave replication from the passive master (serverB) to the active master (serverA).

    STOP SLAVE; -- on serverA
    
  • DELETE the rows on the slave (serverB) which didn't exist in the master on serverA.

    DELETE FROM SOME_TABLE WHERE ID IN (???,????); -- on serverB
    SHOW MASTER STATUS\G; -- get the new master log position on serverB
    
  • Move the active master (serverA) slave exec position to skip those deletes from the passive master (serverB).

    CHANGE MASTER TO MASTER_LOG_POS=???; --on serverA; use the value just obtained from serverB
    
  • Restart replication on both the active master (serverA) and passive master.

    START SLAVE; -- on both machines. serverA does nothing and serverB starts catching up.