How is server replication implemented when setting up a reverse proxy environment?

I've been reading up on and having a go at setting up a few different server cluster environments. I have them working correctly on various Linux distributions.

The basic set-up goes as follows:

                -----  App Server 1
                |                        Database Server 1
Load Balancer --|                        Database Server 2
                |
                -----  App Server 2

As you can see, it's a basic master-slave set-up. My problem is that I'm not sure of the best way to replicate the app server part. If new script files are added or changes are made on app server 1, how are these changes cascaded down to the second app server?

The database servers run MySQL, so any updates to one are replicated to the other automatically.


Solution 1:

The TL;DR version of this is: it depends on the app

As a good reference on how modern software should be built, I highly recommend reading The Twelve-Factor App.

Scripts and code are not replicated as such, but "released" to app servers (you may also prime the servers by releasing to them all and switching from old code to new with another mechanism, such as a symlink).
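
As a rough sketch of the symlink trick (the paths and release names here are placeholders I've made up, not anything from your set-up):

    # Minimal sketch of an atomic "release" switch, assuming the new code has
    # already been copied to a versioned directory under /srv/app/releases.
    import os

    RELEASES_DIR = "/srv/app/releases"   # assumed layout
    CURRENT_LINK = "/srv/app/current"    # the web server serves from this path

    def activate_release(release_name: str) -> None:
        target = os.path.join(RELEASES_DIR, release_name)
        tmp_link = CURRENT_LINK + ".tmp"
        # Build the new symlink under a temporary name, then atomically rename
        # it over the old one so requests never see a half-switched state.
        if os.path.lexists(tmp_link):
            os.remove(tmp_link)
        os.symlink(target, tmp_link)
        os.replace(tmp_link, CURRENT_LINK)

    if __name__ == "__main__":
        activate_release("2024-01-01")

Running the same switch on every app server as the final step of a deploy keeps them all serving identical code.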

However, clients frequently have "state" which the app servers may need to remember if they are to serve requests correctly.

Generally speaking, for the app servers to be horizontally scalable, you want them to be as "stateless" as possible. This means there is nothing for the app server to remember as such; it is all stored externally, and any request can be fulfilled with nothing but the data received in the request. However, your clients (web browsers) often require state, such as being logged in, so you need to be able to deal with state data somehow.

One workaround is to use "sticky sessions" which are implemented on the load balancer. Usually, this is implemented by injecting a cookie which the load balancer uses to remember which server it should send the request to. This is easy to implement and requires no code changes, but has the rather large drawback that state will be lost whenever you restart an app server.
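
To make that concrete, here is a toy Python sketch of the routing decision a load balancer makes with sticky sessions. Real balancers (HAProxy, nginx, and so on) do this in configuration, not code, and the cookie name and backend list below are invented for illustration:

    # Sketch of sticky-session routing: the balancer injects a cookie naming the
    # chosen backend and honours it on later requests. Cookie name and backend
    # addresses are illustrative assumptions.
    import random

    BACKENDS = ["app1.internal:8080", "app2.internal:8080"]
    COOKIE = "LB_BACKEND"

    def pick_backend(request_cookies: dict) -> tuple[str, dict]:
        """Return (backend, cookies_to_set_on_response)."""
        stuck = request_cookies.get(COOKIE)
        if stuck in BACKENDS:
            return stuck, {}                   # existing session: same server again
        backend = random.choice(BACKENDS)      # new session: pick one and remember it
        return backend, {COOKIE: backend}

    # First request gets a Set-Cookie; later requests reuse it.
    backend, set_cookies = pick_backend({})
    backend2, _ = pick_backend({COOKIE: backend})
    assert backend == backend2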

Another way is to make the app stateless by storing everything in cookies. I don't recommend this at all, because the state data will be uploaded on every request. Clients could also modify this data and do unexpected things, and there could be huge privacy implications should anyone capture the cookie.
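
For illustration only, this is roughly what "state in a cookie" looks like. An HMAC signature stops tampering, but the payload is still visible to the client and uploaded with every request, which is the drawback described above. The secret and field names are placeholders:

    # Sketch of storing session state directly in a cookie, protected by an
    # HMAC signature. The client can still read the payload, and it travels
    # with every request.
    import base64, hashlib, hmac, json

    SECRET = b"replace-me"  # placeholder secret

    def encode_state(state: dict) -> str:
        payload = base64.urlsafe_b64encode(json.dumps(state).encode())
        sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
        return payload.decode() + "." + sig

    def decode_state(cookie: str) -> dict | None:
        payload, _, sig = cookie.rpartition(".")
        expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(sig, expected):
            return None                        # tampered or corrupted cookie
        return json.loads(base64.urlsafe_b64decode(payload))

    cookie = encode_state({"user_id": 42, "logged_in": True})
    assert decode_state(cookie) == {"user_id": 42, "logged_in": True}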

Neither of these solves the problem of data living on an individual server, because both rely on cookies: the user may lose the cookie, or simply log in from somewhere else and land on a different server.

The final, and recommended way, is to store client state in a database (or key:value store) such that any app instance can query it when a request is received. You still need the client to remember a "session id", but it is more secure as less data is stored by the client. Traditionally a cluster of memcached nodes with consistent hashing would perform this role, but these days there are more powerful solutions such as Redis.
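
A minimal sketch of the Redis flavour of this, assuming the redis-py client and a made-up "session:<id>" key convention:

    # Sketch of external session storage: any app server can serve any request
    # because the only thing the client keeps is an opaque session id.
    # Assumes the redis-py package and a "session:<id>" key convention.
    import json
    import uuid

    import redis

    store = redis.Redis(host="sessions.internal", port=6379)  # placeholder host
    SESSION_TTL = 3600  # seconds; expire idle sessions

    def create_session(data: dict) -> str:
        session_id = uuid.uuid4().hex
        store.setex("session:" + session_id, SESSION_TTL, json.dumps(data))
        return session_id  # sent back to the client as a cookie

    def load_session(session_id: str) -> dict | None:
        raw = store.get("session:" + session_id)
        return json.loads(raw) if raw else None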

With regard to files on disk: don't store anything on the app server's disks except application code and static assets, which can be "released" as part of a deployment. Logs are a possible exception, but shipping them to a remote syslog server or to something like Logstash is always preferable.
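
For example, Python's standard library can ship logs to a remote syslog host directly (the hostname below is a placeholder):

    # Sketch of shipping application logs off the app server to a central
    # syslog host instead of writing them to local disk.
    import logging
    from logging.handlers import SysLogHandler

    handler = SysLogHandler(address=("logs.internal", 514))  # UDP syslog by default
    logger = logging.getLogger("myapp")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)

    logger.info("user %s logged in", "alice")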

If your application must store files (such as those uploaded by clients), put them in a datastore of some kind (Cassandra isn't a bad choice) or an object-storage service such as OpenStack Swift, or use Amazon S3 if you aren't large enough to justify running storage yourself.
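
A sketch of the S3 route using boto3; the bucket name and key layout are assumptions on my part:

    # Sketch of pushing an uploaded file to Amazon S3 instead of the app
    # server's local disk, using the boto3 SDK. Bucket and key are assumptions.
    import boto3

    s3 = boto3.client("s3")

    def store_upload(local_path: str, user_id: int, filename: str) -> str:
        key = f"uploads/{user_id}/{filename}"
        s3.upload_file(local_path, "my-app-uploads", key)
        return key  # store this key in the database, not the file itself

    # Any app server can later fetch the file back (or generate a pre-signed URL):
    # s3.download_file("my-app-uploads", key, "/tmp/out")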

If the app is small-scale and you handle file locking properly, an NFS or Samba share can also work, but I would recommend Amazon S3 before going down this route.
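
If you do go the shared-filesystem route, you need locking along these lines. Note that whether flock behaves sanely over NFS depends on the NFS version and mount options, which is part of why I'd still reach for S3 first; the path below is a placeholder:

    # Sketch of writing to a shared NFS/Samba mount with an advisory lock so
    # two app servers don't clobber the same file. Lock semantics over the
    # network depend on the NFS/Samba configuration.
    import fcntl

    SHARED_DIR = "/mnt/shared/uploads"  # assumed mount point

    def append_record(name: str, line: str) -> None:
        with open(f"{SHARED_DIR}/{name}", "a") as f:
            fcntl.flock(f, fcntl.LOCK_EX)   # block until we hold the exclusive lock
            try:
                f.write(line + "\n")
                f.flush()
            finally:
                fcntl.flock(f, fcntl.LOCK_UN)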