Apache2 PHP site hitting MaxClients limit - how do I diagnose it?
We have a moderate-traffic site (roughly 20,000 hits a day) running a PHP/MySQL app on Apache 2.2 and Ubuntu 9.10 Server, on an Amazon EC2 c1.small instance (1.7 GB of RAM).
We had issues with the website repeatedly becoming unresponsive. As a dirty hack, I set MaxClients/ServerLimit to 450:
<IfModule mpm_prefork_module>
KeepAlive On
KeepAliveTimeout 7
StartServers 5
MinSpareServers 5
MaxSpareServers 10
MaxClients 450
ServerLimit 450
MaxRequestsPerChild 0
</IfModule>
The site stays up longer than before, but still dies. Checking the list of processes, I have the following (third column is resident memory in KB, fourth column is virtual size in KB):
xxxxxxxxx@domU-XXXXXXXXX:/etc/apache2$ ps -eo pid,user,rss,vsz,args | grep apache
2333 root 11092 39084 /usr/sbin/apache2 -k start
3704 www-data 11060 41292 /usr/sbin/apache2 -k start
3826 www-data 10016 39844 /usr/sbin/apache2 -k start
3954 www-data 11976 41612 /usr/sbin/apache2 -k start
4061 www-data 11844 41668 /usr/sbin/apache2 -k start
4064 www-data 10988 40676 /usr/sbin/apache2 -k start
4084 www-data 11804 41428 /usr/sbin/apache2 -k start
4086 www-data 10192 39828 /usr/sbin/apache2 -k start
4099 www-data 11876 41748 /usr/sbin/apache2 -k start
4100 www-data 10980 40668 /usr/sbin/apache2 -k start
4102 www-data 8952 39724 /usr/sbin/apache2 -k start
4107 www-data 11856 41860 /usr/sbin/apache2 -k start
4108 www-data 9952 39604 /usr/sbin/apache2 -k start
4109 www-data 0 0 [apache2] <defunct>
4114 www-data 7172 39724 /usr/sbin/apache2 -k start
4115 www-data 10968 40668 /usr/sbin/apache2 -k start
4122 www-data 11888 41844 /usr/sbin/apache2 -k start
4123 www-data 11584 41444 /usr/sbin/apache2 -k start
4124 www-data 7036 39596 /usr/sbin/apache2 -k start
4125 www-data 6744 39084 /usr/sbin/apache2 -k start
4126 www-data 9532 39552 /usr/sbin/apache2 -k start
4127 www-data 10112 39812 /usr/sbin/apache2 -k start
4128 www-data 6600 39084 /usr/sbin/apache2 -k start
4129 www-data 6736 39084 /usr/sbin/apache2 -k start
4130 www-data 7004 39596 /usr/sbin/apache2 -k start
4131 www-data 6740 39084 /usr/sbin/apache2 -k start
4132 www-data 11616 41596 /usr/sbin/apache2 -k start
4134 www-data 7024 39588 /usr/sbin/apache2 -k start
4135 www-data 11808 41516 /usr/sbin/apache2 -k start
4136 www-data 7008 39460 /usr/sbin/apache2 -k start
4137 www-data 6988 39460 /usr/sbin/apache2 -k start
4139 1003 796 3040 grep --color=auto apache
xxxxxxxxx@domU-XXXXXXXXX:/etc/apache2$
Is there an easy way to find out what exactly is going on? My understanding of Apache's innards isn't that good, but I wouldn't have thought we'd need this many concurrent processes to serve this sort of traffic. We inherited the app, so we don't know much about its insides, but it's a fairly basic CMS-type site showing a few search results; I didn't think it would need this sort of grunt.
I did run ab against the site and was getting a fairly lousy request rate (well under 50 requests a second), but that may have been down to my poor choice of settings - a lot of those requests seemed to fail.
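(For the record, the kind of invocation I mean is along these lines - the URL and concurrency here are placeholders, not my exact run:)

# 1,000 requests, 10 concurrent, keep-alive enabled; point it at a real page
ab -n 1000 -c 10 -k http://www.example.com/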
Where should I be looking for information on what's happening, or any troubleshooting tips I could try?
Cheers, Victor
Solution 1:
450 children with an RSS of around 10 MB each is over 4 GB of potential memory usage - more than enough to push your 1.7 GB c1.small instance into swap. Swapping is almost always a downward spiral for Apache servers.
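As a quick sanity check, you can total the workers' RSS yourself - a minimal sketch assuming the apache2 binary name from your ps output (RSS double-counts pages shared between children, so the true footprint is somewhat lower):

# Count the apache2 workers and sum/average their RSS in KB
ps -o rss= -C apache2 | awk '{ n++; kb += $1 } END { printf "%d procs, %d KB total, %d KB avg\n", n, kb, kb/n }'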
The next few things I'd be looking to check are:
- does the Apache error log mention hitting MaxClients?
- does dmesg or /var/log/messages mention the OOM killer at all?
- is the server swapping?
- is the memory usage growth slow and steady, or spiky and rapid-onset?
The first two are just a matter of reading text files, and the third you can do from the CLI, though graphs will help; for the fourth you need graphs. Set up Apache's mod_status (it's probably already installed - just enable it) and point munin/collectd/cacti at it. Something like the commands below covers the quick checks.
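A minimal sketch of those checks - log paths assume a stock Ubuntu Apache install, so adjust to taste:

# 1 and 2: plain text files - MaxClients warnings and OOM-killer activity
grep -i maxclients /var/log/apache2/error.log
dmesg | grep -i -e 'out of memory' -e oom

# 3: is the box swapping right now? The si/so columns should sit at or near zero
free -m
vmstat 5 5

# mod_status ships with Ubuntu's Apache; enable it and reload
sudo a2enmod status
sudo /etc/init.d/apache2 reload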
If you confirm the cause is memory exhaustion and swapping, there's plenty you can do from there. First off, lower MaxClients to around 150. That'll leave some room for other processes and the filesystem cache (is MySQL on this box? If so, leave more). RSS is a rough metric to extrapolate from like this, but it's all we've got. Once you've tuned that, watch the graphs over time and see whether you have room to go up or down. From there you can focus on: 1) skinnier Apache children (fewer modules, a tighter PHP config); 2) having Apache do less (some mix of a CDN, alternative HTTP servers, and HTTP proxy options); 3) upgrayyed ($$$).
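For picking that number, here's a back-of-the-envelope sketch - the memory budget is a made-up placeholder, and average RSS is only a rough per-child cost for the reasons above:

# How much of the 1.7 GB you're willing to hand to Apache (placeholder figure)
BUDGET_KB=$((1100 * 1024))
AVG_RSS_KB=$(ps -o rss= -C apache2 | awk '{ t += $1; n++ } END { print int(t / n) }')
echo "rough MaxClients ceiling: $((BUDGET_KB / AVG_RSS_KB))"

With your ~10 MB children that comes out around 110; the shared-memory slack is why something like 150 can still be plausible in practice.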
Solution 2:
You may have persistent connections open to your server with a long timeout. As additional clients connect, they tie up more and more of the Apache processes. With persistent (keep-alive) connections on, each client can hold one (or more) connections to your server open.
Check this out for more info: http://httpd.apache.org/docs/1.3/misc/perf-tuning.html
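If that turns out to be the problem, the usual knobs live in the same prefork/keep-alive config shown in the question - the values below are illustrative only, to be tuned against real traffic:

# Shorter keep-alives release prefork children sooner between page loads
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 2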
Solution 3:
I consolidated a bunch of performance tuning tips into http://www.anchor.com.au/hosting/dedicated/improving-server-capacity for work recently, and it's worked pretty well on the machines I've applied it to. Beyond that, if it's a broader machine-performance problem that might not be specific to Apache, I've got a much more in-depth article at http://www.anchor.com.au/hosting/development/HuntingThePerformanceWumpus, which covers identifying which component of the system is causing problems.