Gracefully take a server out of Azure Load Balancer (drain stop)

We have an application deployed to Azure IaaS VMs, served by IIS. To install updates, we need to take each machine out of the load balancer, one by one. Before moving to Azure, we used Microsoft NLB, which can drain stop a node: it stops sending new connections to the node but keeps existing connections open until they complete. How can we achieve the same with Azure LB?


Solution 1:

The recommended way to do this is to have a custom health probe in your load balanced set. For example, you could put a simple healthcheck.html page on each of your VMs (in wwwroot, say) and point the probe from your load balanced set at this page. As long as the probe can retrieve that page (HTTP 200), the Azure load balancer will keep sending user requests to the VM.

When you need to update a VM, simply rename healthcheck.html to something else, such as _healthcheck.html. The probe will then start receiving HTTP 404 errors, and once it has failed the configured number of consecutive times, the machine is taken out of the load balanced rotation because it is no longer returning HTTP 200. Existing connections will continue to be serviced, but the Azure LB will stop sending new requests to the VM.

After your updates on the VM have been completed, rename _healthcheck.html back to healthcheck.html. The Azure LB probe will start getting HTTP 200 responses and as a result start sending requests to this VM again.

Repeat this for each VM in the load balanced set.
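
The rename steps above could be scripted so they are applied the same way on every VM. The following is only a minimal sketch in Python: the function names `drain` and `restore` are made up for illustration, and the script assumes it runs on the VM itself with write access to the site root (for example `C:\inetpub\wwwroot`, which is an assumption about your IIS layout).

```python
from pathlib import Path

# File names taken from the answer above.
HEALTHY_NAME = "healthcheck.html"
DRAINED_NAME = "_healthcheck.html"


def drain(wwwroot: Path) -> bool:
    """Rename healthcheck.html so the probe starts getting HTTP 404.

    Returns False if there was nothing to rename (already drained,
    or the page was never deployed).
    """
    healthy = wwwroot / HEALTHY_NAME
    if not healthy.exists():
        return False
    healthy.rename(wwwroot / DRAINED_NAME)
    return True


def restore(wwwroot: Path) -> bool:
    """Rename the page back so the probe gets HTTP 200 again.

    Returns False if the drained file was not found.
    """
    drained = wwwroot / DRAINED_NAME
    if not drained.exists():
        return False
    drained.rename(wwwroot / HEALTHY_NAME)
    return True
```

You would call `drain(Path(r"C:\inetpub\wwwroot"))`, wait for the probe to mark the VM as down and for existing connections to finish, apply the updates, and then call `restore(...)` with the same path.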

Solution 2:

In their documentation, Microsoft recommends using a Network Security Group (NSG) to explicitly block the health probe. All Azure Load Balancer health probes originate from the IP address 168.63.129.16.

For example, add an inbound NSG rule that denies traffic from 168.63.129.16 to the NIC of the VM you want to remove from the pool. Remove the rule once the update is finished so the probe succeeds again.
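
As a rough illustration of such a rule, the Python sketch below only assembles an Azure CLI command (`az network nsg rule create`) and prints it; it does not run anything. The resource group, NSG name, rule name, VM IP address, and priority are all placeholders invented for this example. The priority must be a lower number than any rule that allows the probe, so adjust it to fit your existing rules.

```python
import shlex

# Source address of Azure Load Balancer health probes (from the answer above).
AZURE_PROBE_SOURCE_IP = "168.63.129.16"


def build_deny_probe_rule(resource_group, nsg_name, vm_private_ip,
                          rule_name="DenyLbHealthProbe", priority=100):
    """Build the argument list for an inbound Deny rule against the probe.

    All default values here are illustrative assumptions, not Azure defaults.
    """
    return [
        "az", "network", "nsg", "rule", "create",
        "--resource-group", resource_group,
        "--nsg-name", nsg_name,
        "--name", rule_name,
        "--priority", str(priority),
        "--direction", "Inbound",
        "--access", "Deny",
        "--protocol", "*",
        "--source-address-prefixes", AZURE_PROBE_SOURCE_IP,
        "--source-port-ranges", "*",
        "--destination-address-prefixes", vm_private_ip,
        "--destination-port-ranges", "*",
    ]


if __name__ == "__main__":
    # Hypothetical names; replace with your own before running the command.
    cmd = build_deny_probe_rule("my-resource-group", "my-vm-nsg", "10.0.0.4")
    print(shlex.join(cmd))
```

After the update, deleting the rule (for instance with `az network nsg rule delete` and the same resource group, NSG name, and rule name) lets the probe reach the VM again.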