How can I centralise MySQL data between 3 or more geographically separate servers?

To explain the background to the question:

We have a home-grown PHP application (for running online language-learning courses) running on a Linux server and using MySQL on localhost for saving user data (e.g. results of tests taken, marks of submitted work, time spent on different pages in the courses, etc).

As we have students from different geographic locations we currently have 3 virtual servers hosted close to those locations (Spain, UK and Hong Kong) and users are added to the server closest to them (they access via different URLs, e.g. europe.domain.com, uk.domain.com and asia.domain.com). This works but is an administrative nightmare as we have to remember which server a particular user is on, and users can only connect to one server. We would like to somehow centralise the information so that all users are visible on any of the servers and users could connect to any of the 3 servers.

The question is, what method should we use to implement this. It must be an issue that that lots of people have encountered but I haven't found anything conclusive after a fair bit of Googling around. The closest I have seen to solutions are:

  • something like master-master replication, but I have read so many posts suggesting that this is not a good idea as things like auto_increment fields can break.

  • circular replication, this sounded perfect but to quote from O'Reilly's High Performance MySQL, "In general, rings are brittle and best avoided"

We're not against rewriting code in the application to make it work with whatever solution is required but I am not sure if replication is the correct thing to use.

Thanks,

Andy

P.S. I should add that we experimented with writes to a central database and then using reads from a local database but the response time between the different servers for writing was pretty bad and it's also important that written data is available immediately for reading so if replication is too slow this could cause out-of-date data to be returned.

Edit: I have been thinking about writing my own rudimentary replication script which would involve something like having each user given a server ID to say which is his "home server", e.g. users in asia would be marked as having the Hong Kong server as their own server. Then the replication scripts (which would be a PHP script set to run as a cron job reasonably frequently, e.g. every 15 minutes or so) would run independently on each of the servers in the system. They would go through the database and distribute any information about users with the "home server" set to the server that the script is running on to all of the other databases in the system. They would also need to suck new information which has been added to any of the other databases on the system where the "home server" flag is the server where the script is running. I would need to work out the details and build in the logic to deal with conflicts but I think it would be possible, however I wanted to make sure that there is not a correct solution for this already out there as it seems like it must be a problem that many people have already come across.


Solution 1:

Circular replication, which is simply master/master/master in your case will work. Although some claim this can at times be slightly "brittle", so is any multi-server setup. A decent monitoring system, which should be part of your overall package anyway, will help to detect any problems early, so they can be resolved before becoming a serious problem.

The auto-increment "problem" is so easily circumvented that I can't believe people are still raising it as an issue. Simple start the counter on each server at a different number and use a step at least as large as the number of servers, leaving enough space for any possibly future additions.

Solution 2:

For your application, it sounds like circular replication (of which multi-master is a special case) shouldn't be too big of an issue.

The auto_increment issue is easily addressed via auto_increment_increment and auto_increment_offset.

Monitor replication on all isntances with a relatively high frequency, and fix the sources of anything that causes replication to break or your data to drift. Maatkit's mk-table-checksum and mk-table-sync are good for identifying the drifting data. Gotta hit the binary logs and code to identify the sources... :)