Apache Failover and Load Balancing
I work for a small finance firm as a Web Application Developer. Our company has an internal website written in PHP and running on Apache. Recently our server went down and the website was unavailable for several days, causing severe problems.
I have been asked to set up two more servers for serving the website, so we want three Apache web/app servers running on three different machines. When a user logs into the website, they must be served by one of the three servers depending on the load. Also, if one or two of the servers go down, whichever server is still up must handle the website's requests.
I only know how to build a website in PHP and host it on an Apache server. I don't have any knowledge of networking. Please tell me what I need to learn to create the system described above.
I am not expecting to be spoon-fed. I just need a pointer to what I have to learn to achieve my goal. I'm Googling at the same time, but I have asked here because I am in a hurry to implement it.
A common approach is to develop web applications that are cluster-aware. You'll probably need to rework the parts of your site that deal with the database, sessions, and shared or dynamic data. My own question may interest you: Cloud/cluster solutions for scalable web-services. To make a website 'scalable' you'll need to create a scalable design. Alas, there's no button with "Make it faster" written below :)
A simple way is to replicate all data between these servers (you could use GlusterFS for files, and replicate your MySQL or other database between them) and make sure every session is available from every server. It's not the best suggestion, but you won't have to rewrite your code :)
Load-balancing can be easily implemented with Round-Robin DNS: just add several 'A' records that point to different servers and they'll be picked randomly by clients. For instance, Google has this feature:
$ host -t a google.com
google.com has address 74.125.87.147
google.com has address 74.125.87.103
google.com has address 74.125.87.104
google.com has address 74.125.87.99
google.com has address 74.125.87.105
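To get a feel for how that spreads traffic, here is a minimal Python sketch. It models each client as making a uniform random pick among the returned A records, which is a simplifying assumption; real resolvers and clients may rotate or reorder the list differently. The names pick_server and simulate are made up for illustration.

```python
import random
from collections import Counter

# The five A records from the `host` output above.
A_RECORDS = [
    "74.125.87.147",
    "74.125.87.103",
    "74.125.87.104",
    "74.125.87.99",
    "74.125.87.105",
]


def pick_server(records, rng):
    """Model one client choosing an address (assumed: uniform random pick)."""
    return rng.choice(records)


def simulate(records, n_clients, seed=0):
    """Count how many of n_clients simulated clients land on each address."""
    rng = random.Random(seed)  # seeded so repeated runs give the same counts
    return Counter(pick_server(records, rng) for _ in range(n_clients))


counts = simulate(A_RECORDS, 10_000)
for address, hits in sorted(counts.items()):
    print(f"{address}: {hits}")
```

Under this model each address ends up with roughly a fifth of the clients. Note that nothing here looks at actual server load; the split is purely by chance.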
While the easiest method of load balancing is something like round-robin DNS as indicated in o_O Tync's answer, you need to be aware that if one of those servers goes down and you remove its DNS record, a portion of your users will still be directed to the down server until the TTL on their cached DNS records expires or you manually reassign IPs. Depending on how important uptime is to you, this may not be acceptable. In addition, any users who were in the middle of a session with the server that went down would lose that session.
RRDNS is fine for load balancing, but isn't really the key to high availability.
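As a rough back-of-the-envelope model of that TTL window, the Python sketch below estimates what share of clients still hit the dead server some time after its record is removed. The function name stale_fraction and both assumptions are mine, not from any DNS specification: clients were spread evenly across the servers, and their cached entries were created at uniformly random moments within one TTL before the removal.

```python
def stale_fraction(n_servers, ttl_seconds, seconds_since_removal):
    """Estimate the share of all clients still sent to the removed server.

    Assumptions (illustrative only):
      * 1/n_servers of clients cached the dead server's address;
      * those cache entries were created uniformly at random during the
        TTL window before removal, so they expire at a steady rate.
    """
    if n_servers < 1 or ttl_seconds <= 0 or seconds_since_removal < 0:
        raise ValueError("need n_servers >= 1, ttl > 0, elapsed time >= 0")
    # Share of the affected clients whose cached record has not expired yet.
    still_cached = max(0.0, 1.0 - seconds_since_removal / ttl_seconds)
    return still_cached / n_servers


# Three servers, a hypothetical 300-second TTL.
for elapsed in (0, 60, 150, 300):
    share = stale_fraction(3, 300, elapsed)
    print(f"{elapsed:>3}s after removal: {share:.1%} of clients affected")
```

In practice the tail can be longer than this suggests, because some resolvers and browsers hold on to records past their TTL, so a short TTL reduces the problem but does not remove it.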
The easiest way (and by easiest I mean simplest, not necessarily cheapest) to implement true high-availability load balancing is to use a hardware load-balancing network appliance that sits between the Internet connection and your web servers. Such a device can split the load between your systems and automatically (or manually) remove a server from the rotation if there is a problem. In addition, it takes care of TCP connections, so that a user can be automatically hooked up to another server if their original server goes down. Another advantage of this solution is that it generally requires little or no application modification to implement. Note that a truly "high-availability" configuration will usually use two load balancers so that the balancer itself is not a single point of failure.
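The "remove a server from the rotation" idea can be sketched in a few lines of Python. This is only a toy model of the bookkeeping, not how any particular appliance works; the class name Rotation and the backend names web1 to web3 are hypothetical, and a real balancer would call mark_down and mark_up from periodic health checks.

```python
class Rotation:
    """Round-robin over backends, skipping any that are marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0  # index of the next backend to try

    def mark_down(self, backend):
        """Take a backend out of the rotation (e.g. after a failed check)."""
        self.healthy.discard(backend)

    def mark_up(self, backend):
        """Put a known backend back into the rotation."""
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        """Return the next healthy backend, or raise if none are left."""
        for _ in range(len(self.backends)):
            candidate = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")


rotation = Rotation(["web1", "web2", "web3"])
first_pass = [rotation.pick() for _ in range(3)]
rotation.mark_down("web2")  # simulate web2 failing its health check
after_failure = [rotation.pick() for _ in range(4)]
print(first_pass)
print(after_failure)
```

Because the decision is made per request at the balancer, a failed server stops receiving traffic as soon as it is marked down, with no DNS TTL to wait out.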
Another option is to use regular servers to achieve a high availability load balancing scenario. Here is some info on configuring a high-availability load-balanced Apache cluster. The Linux-HA site is a great source for Linux load-balancing information.
Yet another option is something like the Linux Virtual Server project. LVS uses Linux for all components, both servers and load balancers, and generally provides a seamless solution (once configured).
To conclude: my general recommendation is that in a situation like yours, where an inexperienced admin is asked to set up load balancing, a hardware load balancer appliance is the most painless way to do so. It obviously costs some money, but it can save a lot of time. Determining the trade-off point is an individual decision, of course.