Uneven CPU core utilization
tl;dr: The first CPU core is consistently saturated, while all the other cores are consistently under-loaded.
An Ubuntu-based VM running under Xen XCP:
$ uname -a
Linux MYHOST 2.6.38-15-virtual #59-Ubuntu SMP Fri Apr 27 16:40:18 UTC 2012 i686 i686 i386 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 11.04
Release:        11.04
Codename:       natty
This VM has 8 CPU cores.
There are 10 single-threaded worker processes running on this VM, connected to nginx via the FastCGI interface (over a local network port).
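For context, the topology can be confirmed from the listening sockets; the grep pattern below is an assumption, based on the workers showing up under the luajit2 name:

# -l listening, -n numeric, -t TCP, -p show owning process
$ sudo netstat -lntp | grep -E 'nginx|luajit2'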
Under synthetic load from ab (ApacheBench), only the first of the eight cores is ever loaded to 100% (as seen in htop). It stays under very high load more or less constantly, while the other cores bounce around anywhere between 0% and 100%, more or less at random.
Here is what I typically see under load in htop:
1  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||99.3%]   Tasks: 70, 35 thr; 11 running
2  [|||||||||||||||                                                                         15.0%]   Load average: 3.86 1.05 0.39
3  [|||||||||||||||||||||||||||||||||||                                                     36.7%]   Uptime: 22 days, 06:31:57
4  [||||||||||||||||                                                                        15.7%]
5  [|||||||||||||||||||||                                                                   22.4%]
6  [|||||||||||||||||||                                                                     19.9%]
7  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||                      71.2%]
8  [||||||||||||||||||||||||||||||                                                          31.3%]
Mem[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||394/4028MB]
Swp[                                                                                    0/5362MB]

  PID USER     PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
26213 www-data  20   0 49748 26952  2448 R 29.0  0.7 10:42.61 /usr/bin/luajit2
26227 www-data  20   0 50172 27412  2452 R 27.0  0.7 10:43.53 /usr/bin/luajit2
26221 www-data  20   0 50736 27948  2452 R 27.0  0.7 10:39.02 /usr/bin/luajit2
26234 www-data  20   0 50128 27232  2452 R 27.0  0.7 10:36.36 /usr/bin/luajit2
26218 www-data  20   0 50232 27376  2452 R 26.0  0.7 10:39.32 /usr/bin/luajit2
26214 www-data  20   0 51268 28496  2452 R 26.0  0.7 10:58.15 /usr/bin/luajit2
26232 www-data  20   0 50420 27588  2452 R 25.0  0.7 10:39.21 /usr/bin/luajit2
26217 www-data  20   0 50236 27348  2452 R 25.0  0.7 10:34.44 /usr/bin/luajit2
26219 www-data  20   0 50748 27960  2448 R 23.0  0.7 10:45.30 /usr/bin/luajit2
26239 www-data  20   0 49772 27188  2452 R 22.0  0.7 10:39.39 /usr/bin/luajit2
26368 www-data  20   0 10856  3796   968 S 15.0  0.1  1:12.62 nginx: worker process
26369 www-data  20   0 10652  3504   968 S  2.0  0.1  1:12.75 nginx: worker process
26372 www-data  20   0 10520  3504   968 S  0.0  0.1  1:18.64 nginx: worker process
...
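To see where the scheduler actually places the workers, the PSR column of ps shows the CPU each process last ran on; the -C filter assumes all workers run under the luajit2 name:

# PSR = the processor each worker is currently on
$ ps -o pid,psr,pcpu,comm -C luajit2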
During the load test all worker processes are in the R state. The test runs for about 10-15 minutes at roughly 700-900 hits/second; traffic is, of course, generated from external machines.
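For reference, the load is generated with something along these lines; the URL, concurrency, and request count here are placeholders rather than my exact invocation:

# -k keep-alive, -c concurrent clients, -n total requests
$ ab -k -c 64 -n 500000 http://MYHOST/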
It looks like this CPU core load imbalance is the main performance bottleneck: if all cores were loaded evenly, throughput could be higher.
Any clues on how to troubleshoot this issue?
Please tell me if I can provide more information.
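One measurement I can take myself: if CPU0 is burning its time in interrupt and softirq handling rather than in the workers, mpstat (from the sysstat package) should show it:

# watch the %irq and %soft columns for CPU 0 versus the other cores
$ mpstat -P ALL 1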
Update: it looks like CPU0 receives every eth1 interrupt, and there are a lot of them:
$ cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
283:  113720624          0          0          0          0          0          0          0  xen-dyn-event    eth1
284:          1          0          0          0          0          0          0          0  xen-dyn-event    eth0
285:       2254          0          0    3873799          0          0          0          0  xen-dyn-event    blkif
286:         23          0          0          0          0          0          0          0  xen-dyn-event    hvc_console
287:        492         42          0          0          0          0          0     295324  xen-dyn-event    xenbus
288:          0          0          0          0          0          0          0     222294  xen-percpu-ipi   callfuncsingle7
289:          0          0          0          0          0          0          0          0  xen-percpu-virq  debug7
290:          0          0          0          0          0          0          0     151302  xen-percpu-ipi   callfunc7
291:          0          0          0          0          0          0          0    3236015  xen-percpu-ipi   resched7
292:          0          0          0          0          0          0          0      60064  xen-percpu-ipi   spinlock7
293:          0          0          0          0          0          0          0   12355510  xen-percpu-virq  timer7
294:          0          0          0          0          0          0     803174          0  xen-percpu-ipi   callfuncsingle6
295:          0          0          0          0          0          0          0          0  xen-percpu-virq  debug6
296:          0          0          0          0          0          0      60027          0  xen-percpu-ipi   callfunc6
297:          0          0          0          0          0          0    5374762          0  xen-percpu-ipi   resched6
298:          0          0          0          0          0          0      64976          0  xen-percpu-ipi   spinlock6
299:          0          0          0          0          0          0   15294870          0  xen-percpu-virq  timer6
300:          0          0          0          0          0     264441          0          0  xen-percpu-ipi   callfuncsingle5
301:          0          0          0          0          0          0          0          0  xen-percpu-virq  debug5
302:          0          0          0          0          0      79324          0          0  xen-percpu-ipi   callfunc5
303:          0          0          0          0          0    3468144          0          0  xen-percpu-ipi   resched5
304:          0          0          0          0          0      66269          0          0  xen-percpu-ipi   spinlock5
305:          0          0          0          0          0   12778464          0          0  xen-percpu-virq  timer5
306:          0          0          0          0     844591          0          0          0  xen-percpu-ipi   callfuncsingle4
307:          0          0          0          0          0          0          0          0  xen-percpu-virq  debug4
308:          0          0          0          0      75293          0          0          0  xen-percpu-ipi   callfunc4
309:          0          0          0          0    3482146          0          0          0  xen-percpu-ipi   resched4
310:          0          0          0          0      79312          0          0          0  xen-percpu-ipi   spinlock4
311:          0          0          0          0   21642424          0          0          0  xen-percpu-virq  timer4
312:          0          0          0     449141          0          0          0          0  xen-percpu-ipi   callfuncsingle3
313:          0          0          0          0          0          0          0          0  xen-percpu-virq  debug3
314:          0          0          0      95405          0          0          0          0  xen-percpu-ipi   callfunc3
315:          0          0          0    3802992          0          0          0          0  xen-percpu-ipi   resched3
316:          0          0          0      76607          0          0          0          0  xen-percpu-ipi   spinlock3
317:          0          0          0   16439729          0          0          0          0  xen-percpu-virq  timer3
318:          0          0     876383          0          0          0          0          0  xen-percpu-ipi   callfuncsingle2
319:          0          0          0          0          0          0          0          0  xen-percpu-virq  debug2
320:          0          0      76416          0          0          0          0          0  xen-percpu-ipi   callfunc2
321:          0          0    3422476          0          0          0          0          0  xen-percpu-ipi   resched2
322:          0          0      69217          0          0          0          0          0  xen-percpu-ipi   spinlock2
323:          0          0   10247182          0          0          0          0          0  xen-percpu-virq  timer2
324:          0     393514          0          0          0          0          0          0  xen-percpu-ipi   callfuncsingle1
325:          0          0          0          0          0          0          0          0  xen-percpu-virq  debug1
326:          0      95773          0          0          0          0          0          0  xen-percpu-ipi   callfunc1
327:          0    3551629          0          0          0          0          0          0  xen-percpu-ipi   resched1
328:          0      77823          0          0          0          0          0          0  xen-percpu-ipi   spinlock1
329:          0   13784021          0          0          0          0          0          0  xen-percpu-virq  timer1
330:     730435          0          0          0          0          0          0          0  xen-percpu-ipi   callfuncsingle0
331:          0          0          0          0          0          0          0          0  xen-percpu-virq  debug0
332:      39649          0          0          0          0          0          0          0  xen-percpu-ipi   callfunc0
333:    3607120          0          0          0          0          0          0          0  xen-percpu-ipi   resched0
334:     348740          0          0          0          0          0          0          0  xen-percpu-ipi   spinlock0
335:   89912004          0          0          0          0          0          0          0  xen-percpu-virq  timer0
NMI:          0          0          0          0          0          0          0          0  Non-maskable interrupts
LOC:          0          0          0          0          0          0          0          0  Local timer interrupts
SPU:          0          0          0          0          0          0          0          0  Spurious interrupts
PMI:          0          0          0          0          0          0          0          0  Performance monitoring interrupts
IWI:          0          0          0          0          0          0          0          0  IRQ work interrupts
RES:    3607120    3551629    3422476    3802992    3482146    3468144    5374762    3236015  Rescheduling interrupts
CAL:     770084     489287     952799     544546     919884     343765     863201     373596  Function call interrupts
TLB:          0          0          0          0          0          0          0          0  TLB shootdowns
TRM:          0          0          0          0          0          0          0          0  Thermal event interrupts
THR:          0          0          0          0          0          0          0          0  Threshold APIC interrupts
MCE:          0          0          0          0          0          0          0          0  Machine check exceptions
MCP:          0          0          0          0          0          0          0          0  Machine check polls
ERR:          0
MIS:          0
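Given that, two things I could try, both sketches rather than verified fixes. First, moving the eth1 IRQ off CPU0 via its affinity mask, though I suspect this may be silently ignored for Xen event channels. Second, Receive Packet Steering (in mainline since 2.6.35, so present in this 2.6.38 kernel) to fan the receive processing out to other cores; the masks below are guesses for an 8-CPU guest:

# try moving IRQ 283 (eth1) to CPU1; may not stick on xen-dyn-event channels
$ echo 2 | sudo tee /proc/irq/283/smp_affinity

# RPS: let CPUs 1-7 (hex mask fe) do eth1's receive processing;
# assumes eth1 has a single receive queue named rx-0
$ echo fe | sudo tee /sys/class/net/eth1/queues/rx-0/rps_cpus

If RPS takes effect, the per-core load should even out, even though /proc/interrupts will keep charging the hardware interrupt itself to CPU0.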