How do I make an AWS RDS MySQL Read "replica" have a different schema?

Solution 1:

The problem is that the AWS RDS MySQL read-replica mechanism requires that the two schemas be exact.

Replication, in general, requires this.

But, if your definition of "close enough" is indeed close enough, this can be done in RDS for MySQL the same way you'd always do it.

create a DB instance ... and SET it to be the replica.

Well, that won't be the way you do it.

That isn't how you'd do it outside RDS, either. You always have to start with an identical replica, then change it, because replication by definition relies on a known, specific point in time where the data sets as well as schemata are identical, after which changes become possible to the extent that MySQL indeed considers them "close enough."

Examples of "close enough" might include a replica with more indexes or fewer ... but obviously not things like a new unique key or foreign key constraint on the replica where the data on the master violates the constraint ... and a collation change could cause an existing constraint to become violated without any actual data changing. (Or not, if you switched from case-insensitive to binary).

Adding tables to the replica is fine, dropping tables is not, and it is even possible to add columns or drop columns from tables, if and only if the remaining columns by ordinal position beginning at the first column are identical. That is, you can either add columns to or remove columns from the right-edge of the table; you can't change column order, but you can change data types if coercion is possible, such as increasing the length of a VARCHAR. You can rename columns as long as BINLOG_FORMAT on the master is set to ROW, which will often be the better choice when attempting such changes. In the case of RDS, the only alternative is MIXED. They wisely prevent strictly STATEMENT-based replication. Note that BINLOG_FORMAT on the replica is irrelevant to slave conversions.

MySQL generally tries to implicitly do type conversions during replication. See also Replication with Differing Table Definitions on Master and Slave for the general MySQL perspective on this.

You may indeed get away with what you are planning. If not, you should find out rather quickly.

But at least with RDS, it's easy enough to try again if you break it... and, the master's integrity and performance are unaffected by a broken replica.

Here's your fix:

Read Replicas are designed to support read queries, but you might need occasional updates, such as adding an index to speed the specific types of queries accessing the replica. You can enable updates by setting the read_only parameter to 0 in the DB parameter group for the Read Replica.

http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html#USER_ReadRepl.MySQL

Unless this is your only replica, be sure to copy the parameter group to a new one, apply that one to the replica, then change the new group's setting. Leaving other replicas writable as a side effect is a recipe for trouble.

Once this is done and takes effect, the replica is writable. If your proposed changes are indeed "close enough," then after you log in to the replica directly, using the same credentials you'd use for the master, and make your changes, it will keep replicating.

Or, if not, replication will break.

There is no need to pause replication when making schema changes to a replica, because MySQL automatically suspends the execution of replication events using normal locking mechanisms (like table metadata locks) while DDL operations are in progress on the replica involving objects to which the next replication event requires access.

The SHOW SLAVE STATUS statement works identically to standard MySQL, in RDS. If Slave_IO_Running and Slave_SQL_Running both show Yes and Seconds_Behind_Master is not NULL then the replica is not broken; if Seconds_Behind_Master = 0 then the replica is in real-time sync with the master (> 0 means it is lagging, trying to catch up).


Question: would a loooonngggg change (recollating the column will take on the order of an hour) cause problems on the master? I am thinking of massively backed up replication journals, and hour of traffic blocked on a lock in the read-only replica.

This will not be a problem with RDS, for two reasons.

The most significant reason is that MySQL replication uses two threads -- one to receive the logs from the master (the I/O thread) and one to execute them (the SQL thread) -- these are the threads whose "running" status I mentioned, above. When the replica is blocking the execution of events, due to local changes, it continues to receive them. As long as the replica doesn't run out of storage, everything continues as expected when the locks are released.

Additionally, though not terribly meaningful in this context, since the I/O thread will remain running, RDS doesn't use the standard expire_logs_days system variable to purge old binlogs from the master. Instead, it purges them itself -- as soon as none of your replicas will need them, but not before -- helpful, since they count against your storage allocation. (You can also configure RDS to leave them there longer, if desired). I've stopped RDS replication entirely for over 24 hours and started it back up with no issues.