MySQL slave replication reset with no Master Downtime (using MyISAM)
Root of the issue:
All the instructions for creating a slave from a running master require FLUSH TABLES WITH READ LOCK. We use MyISAM, so we can't just use --single-transaction to get consistent table data.
A slave fails for all sorts of "normal" reasons - on average about once a week. So I bring down the site, run FLUSH TABLES WITH READ LOCK on the master DB, mysqldump (with --single-transaction and --master-data), push the dump to the slave, run CHANGE MASTER TO with the recorded log position, start the slave, and so on.
To do this without master downtime, I have tried basically the same steps, then used START SLAVE UNTIL, pausing the live DB for a few seconds while I run SELECT MASTER_POS_WAIT(); on the slave. But I can't seem to get the slave to catch back up.
What is missing from (or unnecessary in) the steps below to have the slave catch up to the master again? Would a simple START SLAVE work?
#!/bin/bash
##
mysqldump --allow-keywords --add-drop-table --comments --extended-insert --master-data \
--debug-info --single-transaction -u $LOCALDB_USER_NAME -p$LOCALDB_PASS $LOCALDB_NAME > $DBFILENAME
## get master position from file for use later
echo
echo "############# MASTER STATUS #############"
grep "CHANGE MASTER" $DBFILENAME
echo
echo "compressing"
gzip $DBFILENAME
echo "sending to $REMOTE_SERVER"
[...]
echo "uncompressing remote db"
sudo ssh $REMOTE_SERVER_USERNAME@$REMOTE_SERVER "cd /tmp && gunzip /tmp/$COMPRESSED_DBFILENAME "
echo "loading external db"
sudo ssh $REMOTE_SERVER_USERNAME@$REMOTE_SERVER "mysql -u $REMOTEDB_USER_NAME -p$REMOTEDB_PASS $REMOTEDB_NAME -e \"STOP SLAVE;\" "
sudo ssh $REMOTE_SERVER_USERNAME@$REMOTE_SERVER "mysql -u $REMOTEDB_USER_NAME -p$REMOTEDB_PASS $REMOTEDB_NAME -e \"RESET SLAVE;\" "
sudo ssh $REMOTE_SERVER_USERNAME@$REMOTE_SERVER "mysql -u $REMOTEDB_USER_NAME -p$REMOTEDB_PASS $REMOTEDB_NAME -e \"FLUSH LOGS;\" "
sudo ssh $REMOTE_SERVER_USERNAME@$REMOTE_SERVER "mysql -u $REMOTEDB_USER_NAME -p$REMOTEDB_PASS $REMOTEDB_NAME < /tmp/$DBFILENAME"
echo "remote import completed"
# CHANGE MASTER TO MASTER_HOST=' ', MASTER_USER='', MASTER_PASSWORD='', MASTER_LOG_FILE='mysql-bin.042025', MASTER_LOG_POS=73160995;
# START SLAVE UNTIL MASTER_LOG_FILE='mysql-bin.042025', MASTER_LOG_POS=73160995;
## on master
## FLUSH TABLES WITH READ LOCK;
## SHOW MASTER STATUS;
## use the log file and position from above
## on slave:
## SELECT MASTER_POS_WAIT('mysql-bin.042136', 165900463);
## on master
## UNLOCK TABLES;
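For reference, the manual sequence in the comments above, sketched as a script. Host names, credentials, and the log coordinates are placeholders, and the mysql invocations are prefixed with echo as a dry run so you can inspect the statements before executing anything:

```shell
#!/bin/bash
## sketch of the coordinated catch-up sequence; placeholders throughout
MASTER_LOG_FILE='mysql-bin.000001'   # placeholder coordinates
MASTER_LOG_POS=4
MYSQL_MASTER=${MYSQL_MASTER:-"echo mysql -h master -u root -p"}
MYSQL_SLAVE=${MYSQL_SLAVE:-"echo mysql -h slave -u root -p"}

## 1. freeze writes on the master and note its current coordinates
$MYSQL_MASTER -e "FLUSH TABLES WITH READ LOCK; SHOW MASTER STATUS;"
## 2. wait on the slave until it has applied everything up to that point
$MYSQL_SLAVE -e "SELECT MASTER_POS_WAIT('${MASTER_LOG_FILE}', ${MASTER_LOG_POS});"
## 3. resume normal replication (no UNTIL clause this time)
$MYSQL_SLAVE -e "START SLAVE;"
## 4. release the master
$MYSQL_MASTER -e "UNLOCK TABLES;"
```

Remove the echo prefixes to run it for real.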
Solution 1:
There are two alternatives:
ALTERNATIVE #1 : Use XtraBackup
It can copy MyISAM as well as InnoDB tables from a running master.
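As a hedged sketch of that workflow (the innobackupex wrapper that ships with Percona XtraBackup; the backup path, credentials, and the timestamped directory name are placeholders, and RUN=echo keeps this a dry run):

```shell
## dry run: clear RUN to actually execute
RUN=${RUN:-echo}
BACKUP_ROOT=/backups
## take the backup from the running master
$RUN innobackupex --user=backup --password=secret ${BACKUP_ROOT}
## prepare it (replays the InnoDB log captured during the copy);
## TIMESTAMP_DIR is the directory innobackupex created above
$RUN innobackupex --apply-log ${BACKUP_ROOT}/TIMESTAMP_DIR
## the master's binlog file/position for CHANGE MASTER TO are recorded in
## ${BACKUP_ROOT}/TIMESTAMP_DIR/xtrabackup_binlog_info
```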
ALTERNATIVE #2 : Run rsync multiple times
You could run rsync against /var/lib/mysql on the master and copy it to /var/lib/mysql on the slave. Of course, I would run rsync several times; before the final rsync you should run FLUSH TABLES WITH READ LOCK. Before copying, make sure you wipe all binary logs and start from scratch (the RESET MASTER in the script below does this).
Before running anything, please make sure binary logs are written in /var/lib/mysql on both master and slave by having something like the following in /etc/my.cnf:
[mysqld]
log-bin=mysql-bin
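A quick sanity check that binary logging is actually on before copying anything (credentials are placeholders; RUN=echo keeps it a dry run):

```shell
RUN=${RUN:-echo}
## confirm the server has binary logging enabled and note its current position
$RUN mysql -u root -p -e "SHOW VARIABLES LIKE 'log_bin%'; SHOW MASTER STATUS;"
```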
Please try running this script in the event you do not want to shutdown MySQL on the master:
mysql -u... -p... -e"SET GLOBAL innodb_max_dirty_pages_pct = 0; RESET MASTER;"
RSYNCSTOTRY=7
cd /var/lib/mysql
X=0
while [ ${X} -lt ${RSYNCSTOTRY} ]
do
X=`echo ${X}+1|bc`
rsync -r * slaveserver:/var/lib/mysql/.
sleep 60
done
## hold the read lock in a background session while the final rsync runs
mysql -u... -p... -e"FLUSH TABLES WITH READ LOCK; SELECT SLEEP(86400);" &
sleep 60
SLEEPID=`mysql -u... -p... -e"SHOW PROCESSLIST;" | grep "SELECT SLEEP(86400)" | awk '{print $1}'`
rsync -r * slaveserver:/var/lib/mysql/.
mysql -u... -p... -e"KILL ${SLEEPID};"
I am a little more conservative in terms of data and index pages being cached while doing this. Personally, I prefer to shutdown mysql after several rsyncs instead of the FLUSH TABLES WITH READ LOCK. Another alternative to this script would be the following script which shuts down mysql for the final rsync:
mysql -u... -p... -e"RESET MASTER;"
RSYNCSTOTRY=7
cd /var/lib/mysql
X=0
while [ ${X} -lt ${RSYNCSTOTRY} ]
do
X=`echo ${X}+1|bc`
rsync -r * slaveserver:/var/lib/mysql/.
sleep 60
done
service mysql stop
rsync -r * slaveserver:/var/lib/mysql/.
service mysql start
That's all for the rsync portion from the master. What about the Slave ???
Before you start mysql on the slave, you need the log file and log position from the master. The binary logs you copied have what you need, in particular the last binary log. Here is how you get it on the slave:
cd /var/lib/mysql
for X in `ls -l mysql-bin.0* | awk '{print $9}'`
do
LOGFIL=${X}
done
LOGPOS=`ls -l ${LOGFIL} | awk '{print $5}'`
echo "Master Log File ${LOGFIL} Position ${LOGPOS}"
You can trust these numbers because you copied them personally from the master. Now that you have the master log file and position, you can start mysql on the slave and set up replication using the log file and log position that were just reported.
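The final step could be sketched like this; the host name, replication user, and password are placeholders you must fill in, and RUN=echo keeps it a dry run:

```shell
RUN=${RUN:-echo}
LOGFIL=${LOGFIL:-mysql-bin.000003}   # value from the loop above
LOGPOS=${LOGPOS:-107}                # value from the ls above
## point the slave at the master's coordinates and start replicating
$RUN mysql -u root -p -e "CHANGE MASTER TO MASTER_HOST='masterserver', MASTER_USER='repl', MASTER_PASSWORD='replpass', MASTER_LOG_FILE='${LOGFIL}', MASTER_LOG_POS=${LOGPOS}; START SLAVE;"
```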
Give it a try !!!
CAVEAT
If you have any InnoDB data, you should set this about 1 hour before attempting the rsync:
SET GLOBAL innodb_max_dirty_pages_pct = 0;
This will cause InnoDB to flush dirty pages (modified pages not yet written to disk) out of the InnoDB Buffer Pool faster.
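To watch the dirty pages drain before starting the rsync (placeholder credentials; RUN=echo for a dry run):

```shell
RUN=${RUN:-echo}
## the count should fall toward zero once innodb_max_dirty_pages_pct = 0
$RUN mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';"
```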
Solution 2:
If you have your datadir on a filesystem that supports snapshots (such as ZFS or LVM), then you can create a snapshot while the master is locked with FLUSH TABLES WITH READ LOCK, grab the master info, and then unlock the tables. The snapshot usually takes only a few seconds.
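A rough sketch of the LVM variant; the volume group, volume names, mount point, and credentials are all placeholders, and RUN=echo keeps it a dry run so nothing is executed:

```shell
RUN=${RUN:-echo}
## lock, record coordinates, snapshot, unlock - writes are blocked only
## for the few seconds the snapshot takes
$RUN mysql -u root -p -e "FLUSH TABLES WITH READ LOCK; SHOW MASTER STATUS;"
$RUN lvcreate --snapshot --size 5G --name mysql-snap /dev/vg0/mysql
$RUN mysql -u root -p -e "UNLOCK TABLES;"
## copy from the frozen snapshot at leisure while the master serves traffic
$RUN mount /dev/vg0/mysql-snap /mnt/mysql-snap
$RUN rsync -a /mnt/mysql-snap/ slaveserver:/var/lib/mysql/
```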