top command - cpu from processes do not add up

Solution 1:

The two sets of figures you are comparing won't match, simply because they are collected from different files. Although top shows them in the same terminal, they do not come from the same source.

I ran strace on top (in batch mode). This is where it reads and prints the system-wide CPU information:

16:04:04.081092 open("/proc/stat", O_RDONLY) = 6 <0.000022>
16:04:04.081154 lstat("/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0 <0.000015>
16:04:04.081211 lstat("/proc/stat", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0     <0.000013>
16:04:04.081267 fstat(6, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 <0.000013>
16:04:04.081334 fstat(6, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 <0.000013>
16:04:04.081385 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f699ace2000 <0.000016>
16:04:04.081440 lseek(6, 0, SEEK_SET)   = 0 <0.000013>
16:04:04.081494 read(6, "cpu  302573 6910 83103 10092403 "..., 1024) = 1024 <0.000070>
16:04:04.081656 write(1, "%Cpu(s):  2.9 us,  0.8 sy,  0.1 "..., 80) = 80 <0.000034>
16:04:04.081763 write(1, "KiB Mem:   8048484 total,  41402"..., 73) = 73 <0.000035>
16:04:04.081858 write(1, "KiB Swap:  8060924 total,       "..., 72) = 72 <0.000034>
16:04:04.081940 write(1, "\n", 1)       = 1 <0.000026>

If you look at /proc/stat, it covers all the CPUs of the system. top knows that too, because before opening /proc/stat it reads the list of online CPUs from the sys filesystem:

16:04:03.367339 open("/sys/devices/system/cpu/online", O_RDONLY|O_CLOEXEC) = 3 <0.000027>
16:04:03.367408 read(3, "0-7\n", 8192)  = 4 <0.000019>
16:04:03.367464 close(3)                = 0 <0.000015>

When it comes to collecting individual process information, top gets that from the /proc/pid/statm and /proc/pid/stat files (replace pid with the actual process ID).

As you can see, /proc/stat holds system-wide information for ALL the CPUs, while the individual proc files for each pid hold information about that one process only.

So, they won't match.
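
To make the two sources concrete, here is a minimal Python sketch of reading each one. This is not top's actual code; the function names are my own, and the field positions are taken from the proc(5) layout (aggregate "cpu" line in /proc/stat; utime and stime as fields 14 and 15 of /proc/pid/stat).

```python
def parse_cpu_line(line):
    """Return (total, idle) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Order: user nice system idle iowait irq softirq steal.
    # Guest time is already included in user, so the guest fields are skipped.
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
    return sum(values), idle


def parse_pid_stat(text):
    """Return utime + stime (clock ticks) from the contents of /proc/<pid>/stat."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or ')', so split only after the last ')'.
    rest = text[text.rindex(")") + 1:].split()
    utime, stime = int(rest[11]), int(rest[12])  # fields 14 and 15
    return utime + stime


def read_first_line(path):
    """Read the first line of a file such as /proc/stat."""
    with open(path) as f:
        return f.readline()
```

On a Linux box you could call `parse_cpu_line(read_first_line("/proc/stat"))` for the system-wide counters and `parse_pid_stat(open("/proc/self/stat").read())` for one process. The two reads happen at different moments and count different things, which is the point of this answer.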

Solution 2:

Sampling, which is how top measures CPU use, is subject to error.

The best way to explain it is like this: Imagine a factory that produces exactly one car per hour, on the hour. Say you decide to sample the rate at which the factory produces cars. You start sampling at 5:59 and stop sampling at 7:01. You see two cars produced, one at 6:00 and one at 7:00. You sampled for 62 minutes and 2 cars were produced. Thus you calculate that the factory was producing cars at about 200% of its rated capacity.
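
The arithmetic of that example can be checked with a short Python sketch. The function name and the "minutes since midnight" encoding are my own assumptions for illustration; the factory is modeled as emitting one car at every whole hour.

```python
def observed_capacity_percent(start_min, end_min, period_min=60):
    """Apparent output, as a percentage of the rated one car per period,
    seen by an observer watching from start_min to end_min (inclusive).

    Times are whole minutes since midnight; a car appears at every
    multiple of period_min.
    """
    # Count the multiples of period_min that fall inside [start_min, end_min].
    cars = end_min // period_min - (start_min - 1) // period_min
    window = end_min - start_min  # length of the observation in minutes
    return 100.0 * cars * period_min / window


# 5:59 to 7:01: two cars seen in 62 minutes.
wide = observed_capacity_percent(5 * 60 + 59, 7 * 60 + 1)

# 6:01 to 6:59: no car seen in 58 minutes.
narrow = observed_capacity_percent(6 * 60 + 1, 6 * 60 + 59)
```

The first window gives roughly 194% and the second gives 0%, even though the factory never changed its rate. Only the placement of the sampling window differs.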

In addition, you cannot compare top values against each other, because top doesn't give you a set of measurements of a single system state. It gives you a set of independent measurements, each subject to its own conditions.

For example, the per-CPU values can be computed using a completely different mechanism from the per-process values. The per-CPU values can be exponentially decayed while the per-process values can be the difference between two totals. So they can reflect measurements of the same type of thing, but using completely different methodologies.
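
As a purely illustrative Python sketch (I am not claiming this is what top does internally), here are those two methodologies applied to the same activity. The function names and the smoothing factor `alpha` are hypothetical.

```python
def interval_usage(busy_before, busy_after, elapsed_ticks):
    """Usage as the difference between two running totals over a window."""
    return 100.0 * (busy_after - busy_before) / elapsed_ticks


def decayed_usage(samples, alpha=0.5):
    """Exponentially decayed average of per-interval usage percentages.

    Recent samples weigh more; older ones fade by (1 - alpha) each step.
    """
    avg = 0.0
    for s in samples:
        avg = alpha * s + (1 - alpha) * avg
    return avg


# Same workload: fully busy for two intervals of 100 ticks, then idle for two.
by_difference = interval_usage(0, 200, 400)     # 200 busy ticks out of 400
by_decay = decayed_usage([100, 100, 0, 0])      # weights the idle tail more
```

Here the difference of totals reports 50% while the decayed average reports 18.75%. Both describe the same activity, so there is no reason to expect numbers produced by different methods to add up.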