How to configure mod_proxy_balancer to gracefully fail under high load
We have a system with one Apache instance in front of multiple Tomcats. The Tomcats in turn connect to various databases. We balance the load across the Tomcats with mod_proxy_balancer.
Currently we are receiving 100 requests a second. The load on the Apache server is quite low, but because of database-heavy operations on the Tomcats, the load there is roughly 25% of what I estimate they can handle.
In a few weeks an event is happening, and we estimate that our request rate will jump significantly, maybe by a factor of 10.
I'm doing everything I can to reduce the load on our Tomcats, but I know we are going to run out of capacity, so I would like to fail gracefully. By this I mean that instead of trying to handle too many connections which all time out, I would like Apache to somehow monitor the average response time, and as soon as the response time from Tomcat rises above some threshold, display an error page.
That way, users who are lucky still get their page rendered quickly, and those who are unlucky get an error page quickly, instead of everyone waiting far too long for their page, eventually timing out, and the database being swamped with queries whose results are never used.
Hopefully this makes sense. I'm looking for suggestions on how I could achieve this.
thanks
Solution 1:
I refer to this as a "Sorry Server". If you're using Apache 2.2, you can add another host to your LB pool as a hot spare, and when the actual app servers reach capacity, the balancer will direct requests to the "Sorry Server" until the application servers become available again. Here's a rough idea:
<Proxy balancer://yourapp>
BalancerMember http://10.0.0.1:8080 retry=5 max=50
BalancerMember http://10.0.0.2:8080 retry=5 max=50
BalancerMember http://10.0.0.3:8080 retry=5 max=50
BalancerMember http://10.0.0.4:8080 retry=5 max=50
# the hot standby on server2
BalancerMember http://10.0.0.5:80 status=+H
</Proxy>
<Location /app>
ProxyPass balancer://yourapp
</Location>
Alternatively, you could set up an extra vhost on the load balancer machine and have it serve the "Sorry Server" page itself.
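For example, something along these lines (a rough sketch only; port 8081, the DocumentRoot and sorry.html are placeholders to adapt):
Listen 8081
<VirtualHost *:8081>
    DocumentRoot /var/www/sorry
    # map every request path on this vhost to the same static page
    AliasMatch ^/.* /var/www/sorry/sorry.html
</VirtualHost>
# the hot standby in the <Proxy> pool above would then point at this
# local vhost instead of a separate box:
#   BalancerMember http://127.0.0.1:8081 status=+H
Hope that helps :)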
Solution 2:
A couple of notes:
The "max" parameter sets the maximum connections per child process, which, depending on the MPM you're using, will not create a hard maximum of concurrent connections. For example, prefork MPM will be almost totally useless for that.
Instead, I'd set it up using the "timeout" parameter and a customized 503 error page. Set the timeout to some sane value, beyond which you don't want your users to wait, and put some meaningful message in the 503 error page.
So:
ErrorDocument 503 /sitebusy.html
<Proxy balancer://yourapp>
BalancerMember http://10.0.0.1:8080 timeout=15 retry=5
BalancerMember http://10.0.0.2:8080 timeout=15 retry=5
BalancerMember http://10.0.0.3:8080 timeout=15 retry=5
BalancerMember http://10.0.0.4:8080 timeout=15 retry=5
</Proxy>
ProxyPass /app balancer://yourapp timeout=5
With this setup, each worker is put into an error state if it takes longer than 15 seconds to respond, and is put back into the pool 5 seconds later. The balancer itself will wait at most 5 seconds for a free worker; if none becomes available in that time, the request fails with a 503 and your sitebusy.html page is shown.
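One thing to double-check (a sketch only; paths are placeholders): /sitebusy.html has to be served by Apache itself rather than proxied, otherwise the error page is just as unreachable as the backends. With the ProxyPass /app prefix above it already falls outside the proxied path; if you were proxying the whole site instead, you would need an explicit exclusion, for example:
# the exclusion must come before the broader ProxyPass
ProxyPass /sitebusy.html !
ProxyPass / balancer://yourapp timeout=5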