HAProxy URI balancing isn't very balanced

Solution 1:

I'm not sure how helpful this is, but I've struggled a bit with the same problem, and this is what I've concluded:

Hash-based load balancing will, as you've already established, never give you perfect load balancing. The behavior you see can be explained simply by a few of the most visited or largest pages landing on the same server: when a few pages get a lot of traffic and many pages get very little, that alone is enough to skew the statistics.

Your configuration uses consistent hashing. The server IDs and weights determine which server a hashed entry is finally directed to, which is why they affect your balancing. The documentation is pretty clear that, even though this is a good algorithm for balancing caches, you may need to change the IDs around and increase the total weight of the servers to get a more even distribution.

If you take a large sample of unique addresses (more than 1000) and visit each of them once, you should see that the session counter is a lot more even across the three backend servers than it is under 'ordinary' traffic, because ordinary traffic is also shaped by the traffic pattern of the site.

My advice would be to make sure that you hash the entire URL, not just the part to the left of the "?". This is controlled by using balance uri whole in the configuration; see the HAProxy documentation. If you have a lot of URLs that share the same base but vary in their GET parameters, this will definitely give you improved results.
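
As a rough sketch of that suggestion, the backend section might look something like the following. The backend name varnish is only a placeholder here, and your existing server lines would stay as they are. I'm also assuming that the HAProxy version you run supports the whole keyword, so check the documentation for your version before relying on it.

```haproxy
backend varnish
        # hash the full URL, including everything after the "?"
        balance uri whole
        hash-type consistent

        # ... your existing server lines go here, unchanged ...
```

With this in place, two URLs that differ only in their query string can hash to different servers, instead of always landing on the same one.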

I would also consider how the load balancing affects the capacity of your cache servers. If it doesn't actually affect redundancy in any way, I wouldn't worry too much about it, as perfect load balancing isn't something you are likely to achieve with URI hashing.

I hope this helps.

Solution 2:

I ended up changing the config like so:

backend varnish
        # hash balancing
        balance uri
        hash-type consistent

        server varnish1 64.106.164.122:80 check observe layer7 maxconn 5000 id 1 weight 75
        server varnish2 64.106.164.121:80 check observe layer7 maxconn 5000 id 715827882 weight 50
        server varnish3 64.106.164.117:80 check observe layer7 maxconn 5000 id 1431655764 weight 38

It turns out that the IDs seem to matter a lot. I now have them spaced out across the range, and this seems to help the balancing. As you can see, I tweaked the weights as well.

I'm now getting a result like this: (screenshot: new HAProxy stats)

The middle server is still underused, but that's as close to balanced as I could get it, and that's fine for my purpose. I'm using HAProxy to do URI hashing so that I could add this third Varnish server without increasing backend load, and it seems to be working well: I'm seeing a noticeable decrease in backend load with three URI-balanced Varnish servers compared with two randomly balanced ones.

The takeaway from this is that the IDs matter a lot and should be spaced out, which I haven't seen clearly stated anywhere else. Once the IDs are spread out, changing the weights helps, but it's still very unpredictable and requires a lot of tweaking and trial and error. Drastically raising a server's weight can cause its traffic to drop significantly, which is a weird result.