How to troubleshoot slow performance on AWS EC2/RDS?

We recently moved our web servers from some 10-year-old boxes to AWS EC2.

Site usage is higher than usual right now (it's our busy season), and the site has become much slower. That's unexpected, because our instances are much larger than the hardware we had previously.

We run a pretty small site that only gets a few hundred visitors at a time. We're running a c3.large instance for our web server and a db.m1.large for our RDS MySQL database. We don't have any read replicas or multiple web servers (load balancing). According to Google Analytics, we only had 18,106 page views for the whole day.

Our users (external and internal) keep seeing the site time out in their browsers. It happens pretty much across the board rather than on any particular page. The MySQL process list (SHOW PROCESSLIST) is also nearly empty, with no table locks or anything like that.

If you look at our stats in CloudWatch, everything should be fine. We have very low CPU utilization, and what I think is pretty low network I/O. Likewise on the RDS side, nothing is screaming "bottleneck".

[Screenshot: EC2 usage (c3.large)]

[Screenshot: RDS usage (db.m1.large)]

Any ideas how I should go about troubleshooting this issue?


Solution 1:

Finally tracked down the cause of our problems. Apache was misconfigured with a KeepAliveTimeout of 30 seconds. Combined with an overly aggressive AJAX script, that kept connections tied up, so the site would hang while requests waited for a free one.
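For anyone chasing a similar problem, one way to confirm that keep-alive is what's holding connections is to look at the scoreboard from Apache's mod_status. Below is a minimal Python sketch, assuming mod_status is enabled and you have saved the text of the machine-readable status page (server-status?auto). The helper names and the SAMPLE payload are made up for illustration and are not output from the server in this post; the state letters follow the key that mod_status documents.

```python
from collections import Counter

# Scoreboard key as documented by mod_status.
STATES = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}


def scoreboard_counts(status_text: str) -> Counter:
    """Count worker states found on the 'Scoreboard:' line of the status text."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            board = line.split(":", 1)[1].strip()
            return Counter(board)
    raise ValueError("no Scoreboard line found")


def keepalive_share(status_text: str) -> float:
    """Fraction of existing worker slots that are sitting in keep-alive."""
    counts = scoreboard_counts(status_text)
    # Ignore "." entries: those are open slots with no process behind them.
    slots_in_use = sum(n for state, n in counts.items() if state != ".")
    return counts["K"] / slots_in_use if slots_in_use else 0.0


# Hypothetical payload, shortened for illustration.
SAMPLE = """BusyWorkers: 9
IdleWorkers: 1
Scoreboard: KKKKKKKWK_....
"""

if __name__ == "__main__":
    for state, n in scoreboard_counts(SAMPLE).most_common():
        print(f"{n:3d}  {state}  {STATES.get(state, 'unknown')}")
    print(f"keep-alive share: {keepalive_share(SAMPLE):.0%}")
```

If most of the slots show K while CPU and network stay low, that points at idle keep-alive connections rather than at the database or the instance size.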

Turning KeepAliveTimeout down to 7, as well as taming the AJAX script, brought everything back to normal.
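To give a rough feel for why the timeout matters so much, here is a back-of-the-envelope sketch in Python using Little's law (average items in a system = arrival rate x time each item stays). It assumes an MPM where an idle keep-alive connection occupies a whole worker (prefork or worker, not event), and that each connection goes idle and then waits out the full timeout. The peak connection rate is a hypothetical number, not something measured on our site.

```python
def workers_held_by_keepalive(new_connections_per_sec: float,
                              keepalive_timeout_sec: float) -> float:
    """Little's law estimate of workers sitting idle in keep-alive."""
    return new_connections_per_sec * keepalive_timeout_sec


# Hypothetical peak rate of new client connections; not a figure from this post.
PEAK_CONNECTIONS_PER_SEC = 5.0

if __name__ == "__main__":
    for timeout in (30, 7):
        held = workers_held_by_keepalive(PEAK_CONNECTIONS_PER_SEC, timeout)
        print(f"KeepAliveTimeout {timeout:>2}s -> ~{held:.0f} workers idle in keep-alive")
```

Under those assumptions the 30-second setting parks roughly four times as many workers as the 7-second one. A script that polls more often than the timeout keeps its connection from ever idling out, which is presumably why the AJAX script had to be tamed as well.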