Why do user and system cpu in cpuacct.stat not add up to cpuacct.usage?
Solution 1:
I'm not a kernel developer, but from digging through the kernel source code, cpuacct.usage (updated via cgroup_account_cputime) and cpuacct.stat (updated via cgroup_account_cputime_field) seem to be calculated by different kernel components.
From what I understand, the output of cpuacct.stat seems to depend heavily on kernel configuration, in particular CONFIG_VIRT_CPU_ACCOUNTING_GEN, CONFIG_VIRT_CPU_ACCOUNTING_NATIVE and CONFIG_VIRT_CPU_ACCOUNTING. From their descriptions, these options seem to make the accounting more precise. A relevant file is kernel/sched/cputime.c, where timing updates seem to be triggered by certain kernel events (IRQs etc.).
The output of cpuacct.usage seems to be calculated by the scheduler when switching between tasks. For example, update_curr, which calls cgroup_account_cputime, is called from enqueue_entity and dequeue_entity, which seem to be part of scheduling tasks. This path does not seem to be as affected by configuration.
Solution 2:
cpuacct.stat contains CPU usage accumulated by process(es) in the cgroup, expressed in ticks of 1/100th of a second, also called "user jiffies" (USER_HZ). It may not be as precise as the CPU times accounted in nanoseconds.
You can obtain the USER_HZ value from the shell (typically 100):
$ getconf CLK_TCK
100
This should be mapped to a number of scheduler ticks per second, unless you are on a real-time or tickless kernel.
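To compare the two files on the same scale, the tick counts in cpuacct.stat have to be converted to nanoseconds first. The following is only a rough sketch of that conversion, not a definitive tool. The helper names (parse_cpuacct_stat, ticks_to_ns, stat_vs_usage) are made up for illustration, and the path /sys/fs/cgroup/cpuacct is an assumption about where a cgroup v1 hierarchy is mounted.

```python
import os

NSEC_PER_SEC = 1_000_000_000


def parse_cpuacct_stat(text):
    """Parse 'user N' / 'system M' lines into a dict of tick counts."""
    ticks = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            ticks[parts[0]] = int(parts[1])
    return ticks


def ticks_to_ns(ticks, user_hz):
    """Convert USER_HZ ticks to nanoseconds (integer arithmetic)."""
    return ticks * NSEC_PER_SEC // user_hz


def stat_vs_usage(stat_text, usage_text, user_hz):
    """Return (stat_ns, usage_ns, usage_ns - stat_ns)."""
    ticks = parse_cpuacct_stat(stat_text)
    total_ticks = ticks.get("user", 0) + ticks.get("system", 0)
    stat_ns = ticks_to_ns(total_ticks, user_hz)
    usage_ns = int(usage_text.strip())
    return stat_ns, usage_ns, usage_ns - stat_ns


if __name__ == "__main__":
    # Assumption: cgroup v1 cpuacct controller mounted at this path.
    base = "/sys/fs/cgroup/cpuacct"
    try:
        with open(os.path.join(base, "cpuacct.stat")) as f:
            stat_text = f.read()
        with open(os.path.join(base, "cpuacct.usage")) as f:
            usage_text = f.read()
    except OSError as exc:
        print(f"cannot read cpuacct files: {exc}")
    else:
        user_hz = os.sysconf("SC_CLK_TCK")  # same value as `getconf CLK_TCK`
        stat_ns, usage_ns, diff = stat_vs_usage(stat_text, usage_text, user_hz)
        print(f"user+system: {stat_ns} ns, usage: {usage_ns} ns, diff: {diff} ns")
```

With USER_HZ at 100, each tick stands for 10,000,000 ns, so a difference between the two figures is expected from the coarser unit alone, even before the different accounting paths come into play.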
cpuacct.usage gives the overall CPU time in nanoseconds, measured as precisely as the kernel can report usage times.
cpuacct.usage_all or cpuacct.usage_percpu will report usage per CPU core (thread), again measured in nanoseconds.
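As a small illustration of how the per-CPU files relate to the total, here is a sketch that parses them. It assumes cpuacct.usage_percpu is a single line of whitespace-separated nanosecond counters (one per CPU) and that cpuacct.usage_all starts with a header line such as "cpu user system" followed by one row per CPU. The function names are hypothetical, and the layout should be checked against your kernel version.

```python
def parse_usage_percpu(text):
    """Return a list of per-CPU nanosecond counters."""
    return [int(value) for value in text.split()]


def parse_usage_all(text):
    """Return {cpu_index: {column_name: nanoseconds}} from usage_all text.

    Assumes the first line is a header whose first field is 'cpu'.
    """
    lines = text.strip().splitlines()
    columns = lines[0].split()[1:]  # e.g. ['user', 'system']
    per_cpu = {}
    for line in lines[1:]:
        fields = line.split()
        values = [int(v) for v in fields[1:]]
        per_cpu[int(fields[0])] = dict(zip(columns, values))
    return per_cpu


if __name__ == "__main__":
    # Sample data only; real values come from the cgroup's files.
    percpu = parse_usage_percpu("100 200 300 \n")
    print(sum(percpu))  # should be close to the value in cpuacct.usage

    table = parse_usage_all("cpu user system\n0 10 5\n1 20 0\n")
    print(table[0]["user"] + table[0]["system"])
```

Summing the per-CPU counters should give roughly the value in cpuacct.usage. Small differences are possible if the files are read at different moments while the cgroup is running.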
Note that the cpuacct subsystem was originally written as a demonstration of cgroups capabilities. It wasn't meant for precise reporting.