Advantage of Microsoft Cluster over Microsoft Network Load Balancer

Until recently, I assumed that Microsoft NLB worked at the OS/machine level rather than the application level, i.e. that NLB just monitors machine heartbeats to check whether a machine is alive and takes a node out of rotation if it has gone down.

However, I found a comment on a Server Fault question that claims otherwise. According to the comment:

NLB just routes connections to the TCP port that is open. If your application closes the port then NLB won't route connections to it any more until the port is open again.

  1. Is the above true? Does NLB monitor applications at a port level?
  2. If the answer to (1) is 'yes', will it switch over both when the service goes down and when the service hangs, or only in one of those cases?
  3. If NLB does indeed do all of the above, what is the case for using clustering at all? The only advantage I can see is that clustering does not need replicated data, but overall clustering would be the more expensive solution.
  4. Are the answers to the above questions different for a standard product like MS SQL Server than for my own service, or are they the same?
  5. If NLB does not do the above and only does OS/machine-level heartbeats, is there a way other than clustering to provide HA and switchover for my own service?

Solution 1:

That's not how NLB works. An NLB port rule determines which port or ports are load balanced among the NLB hosts; traffic that doesn't match a port rule is not load balanced. NLB does not monitor the ports in a port rule, and it does not stop sending cluster traffic to a host when those ports close or when the application serving them crashes on that host. Instead, NLB uses a layer 2 "heartbeat" to determine whether each host in the cluster is available. If a host fails the heartbeat, the remaining hosts "converge" (or re-converge) and remove the non-responding host from the cluster, so that no cluster traffic covered by the port rules is directed to it. NLB is strictly a layer 3 (network layer) load balancing mechanism, not a layer 7 (application layer) one.
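
To make the distinction concrete, here is a toy model in Python. It is not NLB's actual algorithm, only a sketch of the membership rule described above; the `Host` class, its fields, and the host names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical stand-in for one NLB host."""
    name: str
    heartbeat_ok: bool = True  # host still takes part in the cluster heartbeat
    app_ok: bool = True        # application behind the port rule is answering


def converge(hosts):
    """Return the hosts that remain in the cluster after convergence.

    Only the heartbeat is consulted; app_ok is deliberately ignored,
    which mirrors the behaviour described in this answer.
    """
    return [h for h in hosts if h.heartbeat_ok]


hosts = [
    Host("web1"),
    Host("web2", app_ok=False),        # application hung, machine still alive
    Host("web3", heartbeat_ok=False),  # machine down
]

members = converge(hosts)
print([h.name for h in members])  # ['web1', 'web2']
```

In this model `web2` stays in the cluster and keeps getting its share of traffic even though its application is not answering.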

It's perfectly normal for a host to keep receiving NLB traffic for an application defined in a port rule (such as HTTP or RDP) even though that application has hung and can't accept the traffic. This is because NLB isn't aware of anything above layer 3.
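
Since NLB won't do this for you, one option for your own service is a small watchdog on each host that probes the service and takes the host out of the NLB cluster when the probe keeps failing. The following Python sketch is only one way to structure that. The `Watchdog` class, the threshold, and the `on_unhealthy` callback are assumptions of mine, and what the callback actually does is left to you (for example, invoking whatever you normally use to stop or drain the local NLB node, such as the `Stop-NlbClusterNode` PowerShell cmdlet).

```python
import socket
from typing import Callable


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class Watchdog:
    """Hypothetical helper: fire a callback after N consecutive failed probes."""

    def __init__(self, probe: Callable[[], bool],
                 on_unhealthy: Callable[[], None],
                 threshold: int = 3) -> None:
        self.probe = probe
        self.on_unhealthy = on_unhealthy  # e.g. stop or drain the local NLB node
        self.threshold = threshold
        self.failures = 0

    def tick(self) -> bool:
        """Run one probe; return True if on_unhealthy fired on this tick."""
        if self.probe():
            self.failures = 0
            return False
        self.failures += 1
        if self.failures == self.threshold:
            # Fire once when the threshold is reached, not on every later failure.
            self.on_unhealthy()
            return True
        return False
```

You would call `tick()` on a timer, for example every few seconds, with a probe such as `lambda: port_is_open("127.0.0.1", 8080)` (the port is a placeholder). Note that a plain port check mostly catches the "service down" case: a hung service can still have its listening socket open, so for the "service hung" case the probe should send a real application-level request and check the response.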