Why does top report a different CPU usage than CloudWatch?
top shows an average CPU usage during peak times of about 20%, while CloudWatch monitoring shows an average CPU usage of 40%. What causes this discrepancy?
Solution 1:
A very good observation and we have run into this as well. Here's what I found:
Be careful measuring CPU usage from within an EC2 instance. It’s possible to see CPU usage well below 100%—and yet be completely maxed out. Trust me: been there, done that. (CloudWatch CPUUtilization, by the way, is measured from outside the instance and is always correct.)
There’s a very good description of the whole thing here: https://axibase.com/news/ec2-monitoring-the-case-of-stolen-cpu/
In the example above, the m1.small EC2 instance was allocated 0.4 processor units, so 40% CPU busy is the percentage usage of the underlying core. However, because 40% is the maximum CPU share that can be allocated to this VM, the effective CPU usage is 40%/40% = 100%, which is the number displayed by CloudWatch.
If you’re wondering where the 40% comes from, the math is pretty simple. The m1.small Linux system is entitled to 1 EC2 compute unit, which provides the equivalent CPU capacity of a 1.0–1.2 GHz 2007 Opteron or 2007 Xeon processor. Since the VM runs on a machine with a 2.6 GHz clock speed, it’s entitled to a 38.4% to 46.2% processor share on this particular Xen node. You can run the cat /proc/cpuinfo command to find out the CPU architecture behind your EC2 instances.
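To make the arithmetic in the quote concrete, here is a rough sketch in Python. The function names are mine, not from the article or any AWS tool, and the 1.0–1.2 GHz and 2.6 GHz figures are simply the ones quoted above; your host may differ.

```python
def cpu_share(ecu_ghz: float, host_ghz: float) -> float:
    """Fraction of one physical core the VM is entitled to (hypothetical helper)."""
    return ecu_ghz / host_ghz


def effective_usage(busy_fraction_of_core: float, share: float) -> float:
    """Usage relative to the VM's own entitlement, capped at 100%."""
    return min(busy_fraction_of_core / share, 1.0)


# Figures taken from the quoted article: 1 ECU ~ 1.0-1.2 GHz on a 2.6 GHz host.
low = cpu_share(1.0, 2.6)
high = cpu_share(1.2, 2.6)
print(f"entitled share: {low:.1%} to {high:.1%}")

# 40% busy on the underlying core with a 0.4 share means the VM is maxed out.
print(f"effective usage: {effective_usage(0.40, 0.40):.0%}")
```

The same division explains why a number read inside the instance can look comfortably low while the instance is already at its ceiling.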
Pay special attention to the hint about how to deal with tools that don’t know about the special math:
Another option that can be used to retrofit existing agent-based or SNMP-based monitoring tools that don’t integrate with CloudWatch is to use the CPU idle metric. All you need to do is rewrite your rules to measure CPU idle instead of CPU busy. E.g. if you have a >75% threshold defined for CPU busy, create a <25% rule for CPU idle. If CPU idle is 0, then your server is CPU bound.
Very simple. Very nice.
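If it helps, the rule rewrite from the quote can be sketched like this. Both helper names are made up for illustration, and how you actually collect the idle percentage depends on your monitoring agent, so that part is left out.

```python
def idle_rule_from_busy(busy_threshold_pct: float) -> float:
    """Turn a 'CPU busy > X%' threshold into the matching 'CPU idle < Y%' one."""
    return 100.0 - busy_threshold_pct


def is_cpu_bound(idle_pct: float) -> bool:
    """Per the quoted hint: an idle value of 0 means the server is CPU bound."""
    return idle_pct <= 0.0


# A >75% busy alert becomes a <25% idle alert.
idle_threshold = idle_rule_from_busy(75.0)
print(f"alert when CPU idle < {idle_threshold:.0f}%")

# Example readings (invented values, not real measurements).
for idle in (60.0, 20.0, 0.0):
    print(idle, "alert" if idle < idle_threshold else "ok",
          "CPU bound" if is_cpu_bound(idle) else "")
```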
When you run top inside the EC2 instance, it reports usage relative to the physical core that hosts your instance and others. That figure is misleading if you want the CPU usage of your instance alone (that is, relative to the EC2 compute unit assigned to it).
This is why the CloudWatch metric is the one to trust: it is measured from outside the instance, against only the EC2 compute unit(s) assigned to your instance.
See here -- https://forums.aws.amazon.com/thread.jspa?threadID=99993