How does a geographically distributed web app handle stored data?

Hypothetically... I log in to a web app from Australia and change some data. In the US my colleague is using the same system and wants to view the data I've changed. How could the web app be deployed locally for Australian users, and locally for US users (for performance) but share data?

How do Google, Facebook, or any other global system improve performance for users in different countries but still keep data in sync in case a user travels to a different location or the data is used globally? Or do they in reality have their database servers in one location?

As for Facebook, Google, etc: The database servers are not all in one location and certainly not all in total sync all the time. They all employ a distributed system over several clusters of servers for different geographical areas.

Clusters are distributed across many countries. How often the clusters exchange updates depends on how fresh the data has to be for the system to work acceptably.

Take Facebook, for example. Most of the time you communicate with friends in your own country, so keeping servers in your country has an immediate effect: your friends see your messages almost at once.
Friends in other countries might see a delay, depending on how often the clustered server nodes are updated. IIRC Facebook clusters interact by requesting information from other clusters when needed. Many times I have gotten a message saying something like "This user updated status to blah blah", and when I clicked the link to the whole message I got an error message. This is a synchronization problem between the clusters: some information has been synchronized while the rest has not.
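
To make that failure concrete, here is a minimal sketch of two regional clusters that replicate asynchronously. Everything in it (the `Region` class, the key names, the `max_items` knob that simulates an interrupted sync) is made up for illustration and is not how Facebook actually does it.

```python
from collections import deque


class Region:
    """A hypothetical regional cluster holding its own local copy of the data."""

    def __init__(self, name):
        self.name = name
        self.store = {}        # local key -> value
        self.outbox = deque()  # local writes not yet shipped to peers
        self.peers = []

    def write(self, key, value):
        # A write is visible locally at once and queued for the other regions.
        self.store[key] = value
        self.outbox.append((key, value))

    def read(self, key):
        # Reads only look at the local copy, so they can be stale.
        return self.store.get(key)

    def replicate(self, max_items=None):
        """Ship queued writes to peers in order; max_items simulates a partial sync."""
        sent = 0
        while self.outbox and (max_items is None or sent < max_items):
            key, value = self.outbox.popleft()
            for peer in self.peers:
                peer.store[key] = value
            sent += 1
        return sent


australia = Region("au")
us = Region("us")
australia.peers.append(us)

# The notification and the post it points to are two separate writes.
australia.write("feed:alice", "alice updated status -> post:1")
australia.write("post:1", "Hello from Sydney")

# Only the first write reaches the US before the sync is interrupted.
australia.replicate(max_items=1)
partial = (us.read("feed:alice"), us.read("post:1"))

# The next sync round delivers the rest.
australia.replicate()
full = us.read("post:1")
```

After the interrupted round, the US cluster has the notification but not the post, which is the "click the link, get an error" situation. Once the second round runs, both regions agree again.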

How you build the infrastructure depends on how many users, how often the data needs to be synchronized, etc.

Another example, email: The email system is a distributed system across the whole planet. A server with a single user is not that busy compared to a server with 1 million users. How would you solve the delivery issues for a busy server? More distributed local servers? More powerful servers? A more powerful internet connection? All of the above? Since the underlying concept of email (to deliver messages from one node to another) doesn't change regardless of the number of email users, you'll need to design your particular system to accommodate all your users. Regardless of how you design your system, there are times when emails are delayed in delivery because there simply is too much traffic on the other nodes in the chain.

The same concept applies to Facebook. They design and build their farms for a specific region but the whole system relies on "geographical differences". That is, you are more likely to interact with users in your own region than other regions.

As for your particular problem: It all depends on how many users there are.
A single database server (or clustered server) might work for you. If there is a need for distributed clustered server farms, then you might have to write your own system for synchronization like Facebook and Google did. This solution depends on what your users need and how the system is intended to work. I don't know of any standardized system that is a "works for all" solution.
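
If you do end up writing your own synchronization, one of the simplest strategies is "last write wins": every record carries a timestamp, and when two regions exchange data the newer version of each record is kept. The sketch below is only an illustration under assumptions of mine: the `merge_lww` function and the record layout are hypothetical, the timestamps are plain integers, and it assumes the regions' clocks are comparable and that two writes never share a timestamp. Real systems need an answer for clock skew and ties.

```python
def merge_lww(local, remote):
    """Merge two replicas that map key -> (timestamp, value); newest timestamp wins."""
    merged = dict(local)
    for key, (ts, value) in remote.items():
        # Take the remote record if we lack the key or the remote copy is newer.
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


# Hypothetical state of the two regions before a sync round.
au = {"doc:1": (105, "edited in Sydney"), "doc:2": (90, "draft")}
us = {"doc:1": (100, "edited in New York"), "doc:3": (95, "notes")}

# Each side merges the other's data; with distinct timestamps both end up identical.
au_after = merge_lww(au, us)
us_after = merge_lww(us, au)
```

Note that the New York edit to `doc:1` is silently dropped because the Sydney edit is newer. Whether that is acceptable depends on what your users need, which is why there is no "works for all" answer.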

I've been ranting a lot here and it's quite late and I might be totally off target but hey, it's my 2 cents.


Not sure how constructive this is, but Google claims to have near-real-time synchronization. They even have their own atomic clocks at their data centers for proper synchronization. Wired has an article on it:

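This is a well-known CS problem, summarized as the CAP theorem by Eric Brewer.

However, it seems that Google has come close to working around it with Google Spanner, which is now publicly available. Strictly speaking, Spanner still gives up availability rather than consistency when the network partitions, but partitions on Google's network are rare enough that it is highly available in practice.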

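If you aren't ready to use Spanner, then you should consider the guiding principles of your data requirements: Consistency, Availability, and Partition tolerance (CAP). When the network between regions is partitioned, you have to choose between staying consistent and staying available.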
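
As a rough illustration of that choice, here is a toy replica that is cut off from its peers and either refuses to answer (favoring consistency) or answers with whatever it last saw (favoring availability). The `Replica` class and its `prefer_consistency` flag are invented for this sketch and do not correspond to any real database API.

```python
class Replica:
    """A hypothetical replica that may lose contact with the other regions."""

    def __init__(self, value, prefer_consistency):
        self.value = value                  # last value this replica has seen
        self.prefer_consistency = prefer_consistency
        self.partitioned = False            # True while cut off from its peers

    def read(self):
        if self.partitioned and self.prefer_consistency:
            # Cannot confirm this is the latest value, so refuse to answer.
            raise RuntimeError("unavailable: cannot confirm latest value")
        # Otherwise answer from the local copy, which may be stale.
        return self.value


ap = Replica("v1", prefer_consistency=False)
cp = Replica("v1", prefer_consistency=True)
ap.partitioned = cp.partitioned = True

stale = ap.read()  # answers, but "v1" might be out of date

try:
    cp.read()
    cp_answered = True
except RuntimeError:
    cp_answered = False  # stays correct by refusing to answer
```

Neither behavior is wrong. A social feed can usually live with a stale answer, while an account balance usually cannot.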

There are lots of articles and design patterns for this, so I won't recap them here.