CPU0 is swamped with eth1 interrupts
I've got an Ubuntu VM running inside Ubuntu-based Xen XCP. It hosts a custom FCGI-based HTTP service behind nginx.
Under load from ab, the first CPU core is saturated while the rest are under-loaded.
In /proc/interrupts I see that CPU0 serves an order of magnitude more interrupts than any other core. Most of them come from eth1.
Is there anything I can do to improve the performance of this VM? Is there a way to balance interrupts more evenly?
Gory details:
$ uname -a
Linux MYHOST 2.6.38-15-virtual #59-Ubuntu SMP Fri Apr 27 16:40:18 UTC 2012 i686 i686 i386 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 11.04
Release:        11.04
Codename:       natty

$ cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
283:  113720624          0          0          0          0          0          0          0  xen-dyn-event    eth1
284:          1          0          0          0          0          0          0          0  xen-dyn-event    eth0
285:       2254          0          0    3873799          0          0          0          0  xen-dyn-event    blkif
286:         23          0          0          0          0          0          0          0  xen-dyn-event    hvc_console
287:        492         42          0          0          0          0          0     295324  xen-dyn-event    xenbus
288:          0          0          0          0          0          0          0     222294  xen-percpu-ipi   callfuncsingle7
289:          0          0          0          0          0          0          0          0  xen-percpu-virq  debug7
290:          0          0          0          0          0          0          0     151302  xen-percpu-ipi   callfunc7
291:          0          0          0          0          0          0          0    3236015  xen-percpu-ipi   resched7
292:          0          0          0          0          0          0          0      60064  xen-percpu-ipi   spinlock7
293:          0          0          0          0          0          0          0   12355510  xen-percpu-virq  timer7
294:          0          0          0          0          0          0     803174          0  xen-percpu-ipi   callfuncsingle6
295:          0          0          0          0          0          0          0          0  xen-percpu-virq  debug6
296:          0          0          0          0          0          0      60027          0  xen-percpu-ipi   callfunc6
297:          0          0          0          0          0          0    5374762          0  xen-percpu-ipi   resched6
298:          0          0          0          0          0          0      64976          0  xen-percpu-ipi   spinlock6
299:          0          0          0          0          0          0   15294870          0  xen-percpu-virq  timer6
300:          0          0          0          0          0     264441          0          0  xen-percpu-ipi   callfuncsingle5
301:          0          0          0          0          0          0          0          0  xen-percpu-virq  debug5
302:          0          0          0          0          0      79324          0          0  xen-percpu-ipi   callfunc5
303:          0          0          0          0          0    3468144          0          0  xen-percpu-ipi   resched5
304:          0          0          0          0          0      66269          0          0  xen-percpu-ipi   spinlock5
305:          0          0          0          0          0   12778464          0          0  xen-percpu-virq  timer5
306:          0          0          0          0     844591          0          0          0  xen-percpu-ipi   callfuncsingle4
307:          0          0          0          0          0          0          0          0  xen-percpu-virq  debug4
308:          0          0          0          0      75293          0          0          0  xen-percpu-ipi   callfunc4
309:          0          0          0          0    3482146          0          0          0  xen-percpu-ipi   resched4
310:          0          0          0          0      79312          0          0          0  xen-percpu-ipi   spinlock4
311:          0          0          0          0   21642424          0          0          0  xen-percpu-virq  timer4
312:          0          0          0     449141          0          0          0          0  xen-percpu-ipi   callfuncsingle3
313:          0          0          0          0          0          0          0          0  xen-percpu-virq  debug3
314:          0          0          0      95405          0          0          0          0  xen-percpu-ipi   callfunc3
315:          0          0          0    3802992          0          0          0          0  xen-percpu-ipi   resched3
316:          0          0          0      76607          0          0          0          0  xen-percpu-ipi   spinlock3
317:          0          0          0   16439729          0          0          0          0  xen-percpu-virq  timer3
318:          0          0     876383          0          0          0          0          0  xen-percpu-ipi   callfuncsingle2
319:          0          0          0          0          0          0          0          0  xen-percpu-virq  debug2
320:          0          0      76416          0          0          0          0          0  xen-percpu-ipi   callfunc2
321:          0          0    3422476          0          0          0          0          0  xen-percpu-ipi   resched2
322:          0          0      69217          0          0          0          0          0  xen-percpu-ipi   spinlock2
323:          0          0   10247182          0          0          0          0          0  xen-percpu-virq  timer2
324:          0     393514          0          0          0          0          0          0  xen-percpu-ipi   callfuncsingle1
325:          0          0          0          0          0          0          0          0  xen-percpu-virq  debug1
326:          0      95773          0          0          0          0          0          0  xen-percpu-ipi   callfunc1
327:          0    3551629          0          0          0          0          0          0  xen-percpu-ipi   resched1
328:          0      77823          0          0          0          0          0          0  xen-percpu-ipi   spinlock1
329:          0   13784021          0          0          0          0          0          0  xen-percpu-virq  timer1
330:     730435          0          0          0          0          0          0          0  xen-percpu-ipi   callfuncsingle0
331:          0          0          0          0          0          0          0          0  xen-percpu-virq  debug0
332:      39649          0          0          0          0          0          0          0  xen-percpu-ipi   callfunc0
333:    3607120          0          0          0          0          0          0          0  xen-percpu-ipi   resched0
334:     348740          0          0          0          0          0          0          0  xen-percpu-ipi   spinlock0
335:   89912004          0          0          0          0          0          0          0  xen-percpu-virq  timer0
NMI:          0          0          0          0          0          0          0          0  Non-maskable interrupts
LOC:          0          0          0          0          0          0          0          0  Local timer interrupts
SPU:          0          0          0          0          0          0          0          0  Spurious interrupts
PMI:          0          0          0          0          0          0          0          0  Performance monitoring interrupts
IWI:          0          0          0          0          0          0          0          0  IRQ work interrupts
RES:    3607120    3551629    3422476    3802992    3482146    3468144    5374762    3236015  Rescheduling interrupts
CAL:     770084     489287     952799     544546     919884     343765     863201     373596  Function call interrupts
TLB:          0          0          0          0          0          0          0          0  TLB shootdowns
TRM:          0          0          0          0          0          0          0          0  Thermal event interrupts
THR:          0          0          0          0          0          0          0          0  Threshold APIC interrupts
MCE:          0          0          0          0          0          0          0          0  Machine check exceptions
MCP:          0          0          0          0          0          0          0          0  Machine check polls
ERR:          0
MIS:          0
Look in the /proc/irq/283 directory. There is a smp_affinity_list file which shows which CPUs will get interrupt 283. For you, this file probably contains "0" (and smp_affinity probably contains "1").
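You can check the current values first (IRQ 283 is taken from your /proc/interrupts output; both files are standard on this kernel):

cat /proc/irq/283/smp_affinity_list
cat /proc/irq/283/smp_affinity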
You can write a CPU range to the smp_affinity_list file:
echo 0-7 | sudo tee /proc/irq/283/smp_affinity_list
Or you can write a bitmask, where each bit corresponds to a CPU, to smp_affinity:

printf %x $((2**8-1)) | sudo tee /proc/irq/283/smp_affinity

Here 2**8-1 = 255 = 0xff, a mask with bits 0 through 7 set, so all eight CPUs may handle the interrupt.
However, irqbalance has its own idea of what affinity each interrupt should have, and it might revert your updates. So it is best to uninstall irqbalance completely, or at least to stop it and disable it from coming up on reboot.
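On an Ubuntu of this vintage that would look something like the following. To merely disable it across reboots, setting ENABLED=0 in /etc/default/irqbalance should also work (that flag is my assumption about the package's init gating; check the file on your system):

sudo service irqbalance stop      # stop the daemon now
sudo apt-get remove irqbalance    # or uninstall it entirely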
If even without irqbalance you are getting an odd smp_affinity for interrupt 283 after a reboot, you will have to update the CPU affinity manually in one of your startup scripts.
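A minimal sketch of such a startup hook, placed in /etc/rc.local before the exit 0 line (assuming the IRQ number stays 283 across reboots; re-check /proc/interrupts if it does not):

# spread IRQ 283 across all 8 CPUs at boot (rc.local runs as root)
echo 0-7 > /proc/irq/283/smp_affinity_list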
If you have the right model of Intel NIC you can improve performance significantly.
To quote the first paragraph of the document linked below:
Multicore processors and the newest Ethernet adapters (including the 82575, 82576, 82598, and 82599) allow TCP forwarding flows to be optimized by assigning execution flows to individual cores. By default, Linux automatically assigns interrupts to processor cores. Two methods currently exist for automatically assigning the interrupts, an in-kernel IRQ balancer and the IRQ balance daemon in user space. Both offer tradeoffs that might lower CPU usage but do not maximize the IP forwarding rates. Optimal throughput can be obtained by manually pinning the queues of the Ethernet adapter to specific processor cores.
For IP forwarding, a transmit/receive queue pair should use the same processor core and reduce any cache synchronization between different cores. This can be performed by assigning transmit and receive interrupts to specific cores. Starting with Linux kernel 2.6.27, multiple queues can be used on the 82575, 82576, 82598, and 82599. Additionally, multiple transmit queues were enabled in Extended Messaging Signaled Interrupts (MSI-X). MSI-X supports a larger number of interrupts that can be used, allowing for finer-grained control and targeting of the interrupts to specific CPUs.
See: Assigning Interrupts to Processor Cores using an Intel® 82575/82576 or 82598/82599 Ethernet Controller
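If the NIC exposes multiple queues, each queue shows up as its own IRQ line in /proc/interrupts. As a sketch of what such manual pinning can look like (the queue names, such as eth1-TxRx-0, are driver-dependent, and this loop simply assigns each eth1 IRQ to the next core in turn):

core=0
for irq in $(awk '/eth1/ { sub(":", "", $1); print $1 }' /proc/interrupts); do
    echo "$core" | sudo tee "/proc/irq/$irq/smp_affinity_list"
    core=$(( (core + 1) % 8 ))    # wrap around the 8 cores
done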
Actually, it is recommended, especially when dealing with repetitive processes over a short duration, that all interrupts generated by a device queue be handled by the same CPU instead of being IRQ-balanced. So you will see better performance if a single CPU handles the eth1 interrupt (see the exception provided below).
The source linked above is from the Linux Symposium, and I recommend reading through the couple of paragraphs on SMP IRQ Affinity, because they will convince you more effectively than this post.
Why?
Recall that each processor has its own cache in addition to being able to access main memory; check out this diagram. When an interrupt is triggered, a CPU core has to fetch the instructions to handle the interrupt from main memory, which takes much longer than if the instructions were already in the cache. Once a processor has executed a task, it has those instructions in its cache. Now say the same CPU core handles the same interrupt almost all the time: the interrupt handler function is then unlikely to leave that core's cache, boosting kernel performance.
Alternatively, when IRQs are balanced, an interrupt may constantly be assigned to a different CPU. The new CPU core probably will not have the interrupt handler function in its cache, so a long time will be needed to fetch the proper handler from main memory.
Exception: if you seldom use the eth1 interrupt, meaning enough time passes that the cache is overwritten by other tasks (that is, data comes over that interface intermittently, with long periods in between), then you will most likely not see these benefits; they appear only when a process runs at high frequency.
Conclusion
If your interrupt occurs very frequently, then just bind that interrupt to be handled by a specific CPU only. This configuration lives at

/proc/irq/'IRQ number'/smp_affinity
See the last paragraph of the SMP IRQ Affinity section in the source linked above; it has instructions.
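For example, a minimal sketch that binds IRQ 283 to CPU2 only (the mask 4 is binary 100, i.e. only bit 2 set; the choice of CPU2 is arbitrary):

echo 4 | sudo tee /proc/irq/283/smp_affinity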
Alternatively
You can change how frequently the interrupt flag is raised, either by increasing the MTU size (jumbo frames), if the network allows for it, or by having the flag raised only after a larger number of packets is received instead of after every packet, or after a certain time-out (raise the interrupt after a set amount of time has passed). Be cautious with the time option, because your buffer might fill before the time runs out. This can be done using ethtool, which is outlined in the linked source.
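A sketch of both knobs (exact coalescing parameters vary by driver, and a Xen virtual interface may not honor them at all; run ethtool -c eth1 first to see what is supported):

sudo ifconfig eth1 mtu 9000         # jumbo frames, if the network supports them
sudo ethtool -C eth1 rx-frames 64   # interrupt only after 64 received frames...
sudo ethtool -C eth1 rx-usecs 100   # ...or after 100 microseconds, whichever comes first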
This answer is approaching the length at which people won't read it, so I will not go into much more detail, but depending on your situation there are many solutions... check the source :)