Why does EC2 monitoring show 100% CPU and top only 20%?

I am running a Python script on an EC2 instance that inserts rows into a database on another instance. In EC2's monitoring I saw 100% CPU utilization, whereas top shows only 20% for the Python process. What is missing from top? Network overhead?


Solution 1:

The data exposed by top is often insufficient or misleading in virtualized environments like Amazon EC2. The reported percentage depends, among other things, on your instance type and on the utilization of the underlying processor cores, which usually don't match the virtualized hardware the hypervisor presents to you. What you are seeing is most likely caused by CPU steal time, which most Unix/Linux monitoring tools expose nowadays; see e.g. the %steal column in sar or the st column in top:

st -- Steal Time
The amount of CPU 'stolen' from this virtual machine by the hypervisor for other tasks (such as running another virtual machine).
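
To make the st figure more concrete, here is a minimal Python sketch of how a steal percentage can be derived by hand. It assumes a Linux guest where the first line of /proc/stat is the aggregate "cpu" line with the counters user, nice, system, idle, iowait, irq, softirq and steal, in that order. The function names (parse_cpu_line, steal_percent, sample_steal) are made up for this illustration and are not taken from top or sar.

```python
import time

# Counter order on the aggregate "cpu" line of /proc/stat (Linux).
# The guest and guest_nice columns are left out on purpose: they are
# already accounted for in user and nice.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu ...' line into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Very old kernels omit the trailing columns; treat missing ones as 0.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def steal_percent(before, after):
    """Share of elapsed CPU ticks between two samples spent in steal."""
    total = sum(after[name] - before[name] for name in FIELDS)
    if total <= 0:
        return 0.0
    return 100.0 * (after["steal"] - before["steal"]) / total


def sample_steal(interval=5.0, path="/proc/stat"):
    """Read the counters twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        before = parse_cpu_line(f.readline())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_line(f.readline())
    return steal_percent(before, after)
```

Calling sample_steal() on the instance while the script is running should give a number comparable to the st column in top over the same interval. A consistently high value would support the steal time explanation.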

The blog post EC2 monitoring: the case of stolen CPU provides a nice exploration and illustration of this topic:

When the top command displays 40% CPU busy but CloudWatch says the server is maxed out at 100% - which side do you take? The answer is simple (CloudWatch is correct, top is not) [...]

Please note that this hypervisor metric seems to be (easily) accessible on Unix/Linux systems only and doesn't seem to be observable on Windows (yet); see my question Is there a Windows equivalent of Unix 'CPU steal time'? for more on this problem.