What is the most effective load balancing solution for Windows?

For this I am specifically referring to running websites on Windows, but it's not specific to IIS. We sometimes run Tomcat as well. Obviously there are many hardware and software options here, and there are many points to consider in the Windows world, like sticky sessions and the state that is sometimes inherent in a web app.

So what I'm looking for is any good experience with effective load balancing strategies, be they hardware or software based, and reasons to justify the related business costs.

Thanks

Update:

In response to some deeper probing questions from Paul: I personally would be looking for least-loaded load balancing, and definitely session failover. That kind of thing could also be handled by something like Memcached. SSL termination is very much dependent on whether you're going the hardware or software route, but then I may as well answer my own question.

The volumes could be anything: think of a start-up that hopes to grow exponentially without presenting users with the infamous Fail Whale...

The question specifically asks what load balancing strategies have been effective for you in the past for running websites on Windows. If the strategy you propose is, for instance, hardware load balancing where SSL termination occurs at the load balancer, and you elaborate on why that turned out to be a good strategy for you, then that will have helped answer my question. Right now I just don't have enough experience with the different methods and alternatives, hence my question.


Solution 1:

I'm a big fan of Windows Load Balancing Service (WLBS). I realize that some might say it's not up to the task of handling a large site, but somehow Microsoft is able to make it work for Microsoft.com, so I tend to disagree with those statements.

Solution 2:

Like scevans mentioned, load balancing can certainly be done with Windows NLB. I'd guess most Windows web shops start out that way, and some continue to use it indefinitely.

It's a common misconception that NLB performs poorly; it's really not half bad from a performance perspective. NLB is very good for some things, like applications that are relatively stable and not under active development. Where NLB starts to let you down is its state checking. NLB is basically oblivious to the application layer, so at some point, when one of your JVMs or IIS worker processes blows up, NLB will still see the broken server listening and will happily keep routing requests to your down application on your up server. This is obviously bad.

This is where purpose-built web load balancers start to come into their own. Pretty much all of them support a variety of balancing methods, such as round robin, least connections, and weighted versions of both.
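To make those balancing methods concrete, here is a minimal Python sketch of how each one picks a server. It is not how any particular product implements them; the `Backend` class, its `weight` and `active` fields, and the function names are all made up for illustration, and the weighted round robin is the naive "repeat by weight" variant.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Backend:
    # Hypothetical model of one web server behind the balancer.
    name: str
    weight: int = 1   # relative capacity (assumed to be >= 1)
    active: int = 0   # connections currently open to this server


def round_robin(backends):
    """Hand out servers in a fixed repeating order, ignoring load."""
    return cycle(backends)


def weighted_round_robin(backends):
    """Naive weighting: a server with weight 4 appears 4 times per cycle."""
    return cycle([b for b in backends for _ in range(b.weight)])


def least_connections(backends):
    """Pick the server with the fewest open connections."""
    return min(backends, key=lambda b: b.active)


def weighted_least_connections(backends):
    """Pick the server with the fewest open connections per unit of weight."""
    return min(backends, key=lambda b: b.active / b.weight)
```

With a farm of `web1` (weight 1, 4 connections), `web2` (weight 1, 2 connections) and `web3` (weight 4, 6 connections), plain least connections picks `web2`, while the weighted version picks `web3` because it has the most spare capacity relative to its size.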

Being able to specify a health check page for each application on each server brings a new level of confidence and allows the load balancer to make better decisions. Depending on your application, a health check page should usually be purpose-built. I like the health check page to quickly poll the DB server to ensure DB connectivity is functional. This call should be super cheap; our DB wizard supplies it, and I think it's a 'SELECT 1', but I could be wrong. If the call succeeds, the test page displays a server identifier (we also use this page to see which server we get stuck to on sticky-enabled farms). The load balancer hits this page several times a minute and parses the response; as long as the check doesn't fail, the server stays in the rotation. If the page fails for any reason, the server is removed from the rotation. This is also handy when the people administering the web servers don't have the expertise or access to work on the load balancer: by simply renaming the health check file, they can pull a server out of the rotation for deployment or troubleshooting. Most load balancers can also be configured with a sorry server: if the health check fails on all nodes in the farm, users are sent to a different server containing a (static) maintenance page.
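As a rough illustration of the balancer's side of that health check, here is a small Python sketch. It only shows the logic (probe the page, keep passing servers in the rotation, fall back to the sorry server); the function names, the `expected_text` marker and the URLs are assumptions for the example, not any vendor's actual configuration.

```python
import http.client
import urllib.request


def http_probe(url, expected_text, timeout=2.0):
    """Return True if the health page answers 200 and contains expected_text."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200 and expected_text in body
    except (OSError, http.client.HTTPException):
        # Refused connection, timeout, or HTTP error status (for example a
        # 404 because someone renamed the health check file on purpose).
        return False


def refresh_rotation(health_urls, probe):
    """Return the servers whose health page currently passes.

    health_urls maps a server name to its health check URL; probe is any
    callable that takes a URL and returns True or False.
    """
    return [server for server, url in health_urls.items() if probe(url)]


def pick_targets(in_rotation, sorry_server):
    """Send traffic to healthy servers, or to the sorry server if none pass."""
    return in_rotation if in_rotation else [sorry_server]
```

A real balancer would run `refresh_rotation` on a timer, several times a minute, with something like `lambda url: http_probe(url, "OK")` as the probe. Passing the probe in as a parameter also makes the rotation logic easy to test without a network.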

SSL termination on the load balancer can also be handy. Some load balancers don't speak HTTPS for health checking, so in some situations offloading your SSL is the only way to take advantage of all the health check features described above. Offloading SSL also frees up web server CPU cycles to serve pages rather than encrypt and decrypt, but to me the bigger benefits are around troubleshooting and SSL management. Depending on your platform and tools, managing an SSL cert across a large farm of servers isn't very easy (msdeploy didn't support it until the latest release). Installing a cert on 2 load balancers is usually faster and easier than on 10+ servers. It's also pretty helpful in troubleshooting: every now and then you get that super bizarro issue where you need to dig down to the packet level and snoop what is being sent to the web server. With offloaded SSL, all of that traffic is in plain text and far easier to analyze. Note: this also means the network segment between your load balancers and your servers needs to be secure.

With regards to failover, I'm of the opinion that applications should be designed not to require sticky sessions unless there is an extremely compelling reason. In the .NET world this is usually as simple as configuring the application on all nodes to store its session state in a SQL Server database, and making sure all the session objects are serializable (you'll get a nice big error the first time you turn it on if they're not). When you're running in this configuration, you can work on web servers at any time without impacting users: remove the server from the rotation, make your change, and put it back into the rotation. Since the session data is all stored centrally in SQL, it doesn't matter which node answers the request.
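The idea is not specific to .NET, so here is a minimal Python sketch of it rather than the actual ASP.NET configuration. SQLite stands in for the shared SQL Server database, `pickle` stands in for .NET serialization, and `SessionStore` and `WebNode` are hypothetical names invented for the example.

```python
import pickle
import sqlite3


class SessionStore:
    """Central session store shared by every node (SQLite as a stand-in)."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, data BLOB)"
        )

    def save(self, session_id, session):
        # Raises if the session holds an object that cannot be serialized,
        # which is the "nice big error" you hit when first turning this on.
        blob = pickle.dumps(session)
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions (id, data) VALUES (?, ?)",
            (session_id, blob),
        )
        self.conn.commit()

    def load(self, session_id):
        row = self.conn.execute(
            "SELECT data FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        # Only unpickle data your own application wrote to the store.
        return pickle.loads(row[0]) if row else {}


class WebNode:
    """One web server; it keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        session = self.store.load(session_id)
        session.setdefault("cart", []).append(item)
        self.store.save(session_id, session)
        return session
```

If `web1` handles the first request for a session and `web2` handles the second, the cart built on `web1` is still there on `web2`, which is what lets you pull either node out of the rotation without users noticing.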

With regards to software vs. hardware and justifying the costs, that one is a little trickier. If you've got the budget, hardware balancers are nice. Doing stuff in hardware is obviously faster and, more importantly, the support you get from hardware vendors is generally a lot better. That said, software balancers are getting more and more feature complete these days.

Most of my experience is on the F5 BIG-IP, Cisco Arrowpoint/CSS and Cisco ACS lines, but I've recently started looking into IIS7 Application Request Routing (ARR) for a consulting client of mine on a budget. If you've got spare hardware and Windows licenses, it doesn't cost you anything, and it looks fairly feature complete. I'm interested to see how it stacks up. You can find more info about ARR here.

Let me know if you'd like any more specific info.