I'm interested in monitoring our EC2 instances to ensure we scale up when necessary. Right now we are monitoring idle CPU time as our metric.

We aren't measuring disk I/O because ours is not a very disk-intensive application.

When running on our own hardware in a datacenter, I also usually monitor the "load" reported by the top command.

My question is:
Does it make sense to monitor "load" in a shared environment such as EC2? If so, how do you interpret the results?


Solution 1:

Load on EC2 is measured and interpreted the same way as on any other Linux system: the guest kernel computes the load average itself, so the virtual machine environment does not change how that metric is calculated.
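As on bare metal, the load average is usually read relative to the number of CPUs, which on EC2 means the instance's vCPU count. A minimal Python sketch, assuming a Linux instance; the helper names are hypothetical, and the threshold of 1.0 load per vCPU is a common rule of thumb rather than an EC2-specific figure.

```python
import os


def load_per_cpu(load: float, cpu_count: int) -> float:
    """Normalize a load average by the number of (v)CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load / cpu_count


def describe_load(load: float, cpu_count: int, threshold: float = 1.0) -> str:
    """Label a load average 'saturated' when it exceeds `threshold` per CPU.

    threshold=1.0 (one unit of load per vCPU) is a rule-of-thumb assumption.
    """
    return "saturated" if load_per_cpu(load, cpu_count) > threshold else "ok"


if __name__ == "__main__":
    # os.getloadavg() returns the 1-, 5- and 15-minute load averages, the same
    # figures top prints on its first line (Unix only).
    _, five, _ = os.getloadavg()
    cpus = os.cpu_count() or 1
    print(f"5-minute load per vCPU: {load_per_cpu(five, cpus):.2f} "
          f"({describe_load(five, cpus)})")
```

Dividing by the CPU count makes the number comparable across instance types, so the same threshold can be used whether an instance has 2 or 16 vCPUs.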

That said, CPU idle may be a better metric than load for deciding when a server is busy enough to warrant scaling.
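One way to measure CPU idle directly is to sample the aggregate cpu counters in /proc/stat twice and compare them. The sketch below assumes a Linux instance, counts iowait as idle, and uses hypothetical function names. The steal counter (time the hypervisor gave to other guests) is included in the total, so on a contended EC2 host it lowers the idle percentage.

```python
import time


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the aggregate 'cpu' line, which is the first line of /proc/stat."""
    with open(path) as f:
        return f.readline()


def parse_cpu_line(line: str) -> list[int]:
    """Parse the aggregate 'cpu' line into its cumulative tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(x) for x in fields[1:]]


def idle_percent(before: list[int], after: list[int]) -> float:
    """Percentage of CPU time spent idle (idle + iowait) between two samples."""
    # Counter order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already counted in user and nice, so only the
    # first eight counters (through steal) are summed to avoid double counting.
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("no CPU time elapsed between the two samples")
    return 100.0 * (deltas[3] + deltas[4]) / total


if __name__ == "__main__":
    first = parse_cpu_line(read_cpu_line())
    time.sleep(1.0)  # sampling interval; an assumption, tune as needed
    second = parse_cpu_line(read_cpu_line())
    print(f"CPU idle over the last second: {idle_percent(first, second):.1f}%")
```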

Solution 2:

You might also want to monitor your load balancer for the number of healthy instances and for 5xx status codes.

In our experience, whenever our servers have been overloaded, clients started getting "503 Service Unavailable" responses; we then started an additional server, and the overloaded one recovered.
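The trigger described above can be written as a simple rule over one monitoring period of load-balancer numbers. This is a sketch with hypothetical names and thresholds (at least 2 healthy instances, at most 1% of requests returning 5xx); fetching the numbers, for example from the load balancer's CloudWatch metrics such as HealthyHostCount and its 5XX counts, is left out.

```python
from dataclasses import dataclass


@dataclass
class LbSample:
    """One monitoring period of load-balancer numbers (hypothetical shape)."""
    healthy_hosts: int
    requests: int
    errors_5xx: int


def error_rate(sample: LbSample) -> float:
    """Fraction of requests in the period that returned a 5xx status."""
    if sample.requests == 0:
        return 0.0
    return sample.errors_5xx / sample.requests


def should_scale_up(sample: LbSample, min_healthy: int = 2,
                    max_error_rate: float = 0.01) -> bool:
    """Scale up when too few instances are healthy or too many requests fail."""
    # Simplification: a drop in healthy hosts could also come from a bad
    # deploy, which extra capacity would not fix.
    if sample.healthy_hosts < min_healthy:
        return True
    return error_rate(sample) > max_error_rate


if __name__ == "__main__":
    # 250 errors out of 10,000 requests is a 2.5% error rate, above the 1% limit.
    print(should_scale_up(LbSample(healthy_hosts=3, requests=10_000, errors_5xx=250)))
```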