Making a very reliable NFS server that runs nothing but NFS is paramount. Running a dual-head/single-shelf system gives you some redundancy there, but once you try to go dual-head/dual-shelf you run into quite a few problems. You can run master/slave with dual-head/dual-shelf, but dual-primary requires a clustered filesystem like OCFS2 or GFS. So, step one: make a very reliable, very simple NFS server.
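As a minimal sketch of what that dedicated server exports, an `/etc/exports` along these lines would do (the paths and subnet are placeholders for your own):

```
# /etc/exports -- web content plus a shared session directory,
# exported only to the webserver LAN
/export/web       192.168.1.0/24(rw,sync,no_subtree_check)
/export/sessions  192.168.1.0/24(rw,sync,no_subtree_check)
```

`sync` is the safer choice when the server can fail over, at some cost in write performance.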

You can use heartbeat and run your webservers on the edge: if one fails, the other takes over the shared IP (via gratuitous ARP) and starts answering requests. However, this becomes unwieldy, and eventually you will want to put a load balancer (or a pair of them) in front of your webservers and use LVS/IPVS to direct requests to your backends. With IPVS you will want to set up connection affinity so that a given user usually hits the same server, which keeps anything that uses PHP sessions happy. Alternatively, you can share the sessions directory on the NFS server, which allows both machines to access the same session files.
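Connection affinity in IPVS is just the persistence flag on the virtual service. A rough sketch with `ipvsadm` (the VIP and realserver addresses below are placeholders):

```shell
# virtual service on the VIP; -p 1800 gives 30 minutes of persistence
# so a client keeps hitting the same backend
ipvsadm -A -t 203.0.113.10:80 -s wlc -p 1800

# two realservers behind it, NAT (masquerading) mode
ipvsadm -a -t 203.0.113.10:80 -r 192.168.1.11:80 -m
ipvsadm -a -t 203.0.113.10:80 -r 192.168.1.12:80 -m
```

With persistence in place, sharing the session directory over NFS becomes a belt-and-suspenders measure rather than a hard requirement.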

MySQL becomes fun. MySQL Cluster requires three nodes, though with a little trickery you can do it with two. MySQL's data files don't play well with NFS, so you will want to run a master/slave setup with heartbeat handling failover, or set up a pair of MySQL servers in a clustered configuration.
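A master/slave pair needs very little configuration to get replication going. A minimal `my.cnf` sketch for each node (server IDs and log names are just examples):

```
# master my.cnf
[mysqld]
server-id = 1
log_bin   = mysql-bin

# slave my.cnf
[mysqld]
server-id = 2
relay_log = relay-bin
read_only = 1
```

`read_only` on the slave prevents stray writes from splitting the data sets; on failover, heartbeat flips it and moves the service IP.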

Your read/write load will be the real determining factor. You'll want to run a gigE LAN connection to the NFS servers, and a separate connection to the Internet. If you are running a heavy read load, you won't have too many issues; if all of your servers are running a heavy write load, you'll run into contention and locking issues on the shared storage. If possible, use jumbo frames on your LAN if you typically serve large files. One gotcha: since the files live on the NFS server, if Apache serves a file with sendfile and the file is later deleted, the next access to that file will give you a bus error.
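The sendfile problem can be sidestepped in the Apache configuration; `mmap` has a similar failure mode on NFS, so it's worth disabling both for NFS-backed content:

```
# httpd.conf on each webserver -- don't sendfile or mmap files
# that live on NFS; a file deleted server-side would otherwise
# cause a bus error on the next access
EnableSendfile Off
EnableMMAP Off
```

This costs a little throughput on static files, which the caching layer in front can absorb.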

If you wanted to get tricky, and you were doing a high read volume, you could consider AFS, which keeps small local caches of the shared storage. That way, if a webserver disconnects from the storage node, it can still serve most of the static content. Alternatively, Varnish, Squid, or some other caching layer will considerably cut down on LAN bandwidth. You then have one or two cache servers each duplicating content in their own memory, but that can be a worthwhile tradeoff for the reduced LAN traffic.
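If you go the Varnish route, a few lines of VCL cover the common case of making static files cacheable. A sketch (Varnish 2/3-era syntax; the backend address and extension list are placeholders):

```
# default.vcl -- cache static objects from one webserver backend
backend web1 {
    .host = "192.168.1.11";
    .port = "80";
}

sub vcl_recv {
    # strip cookies on static files so Varnish will cache them
    if (req.url ~ "\.(png|jpe?g|gif|css|js)$") {
        unset req.http.Cookie;
    }
}
```

Every static hit served from the cache is a request that never touches the LAN link to the NFS server.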

Once you get the framework set up, going from two webservers to three is just a matter of adding a new server; it is the step from one to two that is the real leap. Some software behaves differently when running on a cluster than on a single machine.


That sounds like a reasonable plan to me. Are you just looking to scale, or to eliminate single points of failure? If the latter, your shared storage presents a problem as well (and adds latency whenever reads are not cached locally by the Apache servers). If your content is mostly static, there are ways to keep an authoritative content store and replicate it out to your web nodes periodically so each has a local copy. This is more complex, but it allows your centralized storage to fail without taking all of the web nodes with it.
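The replication itself can be as simple as a cron-driven rsync push from the content master to each web node (hostnames and paths below are placeholders):

```shell
# push the authoritative tree to each web node; --delete keeps
# the copies exact, -az compresses and preserves permissions
rsync -az --delete /srv/content/ web1:/srv/content/
rsync -az --delete /srv/content/ web2:/srv/content/
```

Each node then serves from its local disk, so a storage outage only stops updates, not serving.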