I've got an Ubuntu VM, running inside Ubuntu-based Xen XCP. It hosts a custom FCGI-based HTTP service, behind nginx.

Under load from ab the first CPU core is saturated, and the rest is under-loaded.

In /proc/interrupts I see that CPU0 serves an order of magnitude more interrupts than any other core. Most of them come from eth1.

Is there anything I can do to improve performance of this VM? Is there a way to balance interrupts more evenly?


Gory details:

$ uname -a
Linux MYHOST 2.6.38-15-virtual #59-Ubuntu SMP Fri Apr 27 16:40:18 UTC 2012 i686 i686 i386 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 11.04
Release:    11.04
Codename:   natty

$ cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       
283:  113720624          0          0          0          0          0          0          0   xen-dyn-event     eth1
284:          1          0          0          0          0          0          0          0   xen-dyn-event     eth0
285:       2254          0          0    3873799          0          0          0          0   xen-dyn-event     blkif
286:         23          0          0          0          0          0          0          0   xen-dyn-event     hvc_console
287:        492         42          0          0          0          0          0     295324   xen-dyn-event     xenbus
288:          0          0          0          0          0          0          0     222294  xen-percpu-ipi       callfuncsingle7
289:          0          0          0          0          0          0          0          0  xen-percpu-virq      debug7
290:          0          0          0          0          0          0          0     151302  xen-percpu-ipi       callfunc7
291:          0          0          0          0          0          0          0    3236015  xen-percpu-ipi       resched7
292:          0          0          0          0          0          0          0      60064  xen-percpu-ipi       spinlock7
293:          0          0          0          0          0          0          0   12355510  xen-percpu-virq      timer7
294:          0          0          0          0          0          0     803174          0  xen-percpu-ipi       callfuncsingle6
295:          0          0          0          0          0          0          0          0  xen-percpu-virq      debug6
296:          0          0          0          0          0          0      60027          0  xen-percpu-ipi       callfunc6
297:          0          0          0          0          0          0    5374762          0  xen-percpu-ipi       resched6
298:          0          0          0          0          0          0      64976          0  xen-percpu-ipi       spinlock6
299:          0          0          0          0          0          0   15294870          0  xen-percpu-virq      timer6
300:          0          0          0          0          0     264441          0          0  xen-percpu-ipi       callfuncsingle5
301:          0          0          0          0          0          0          0          0  xen-percpu-virq      debug5
302:          0          0          0          0          0      79324          0          0  xen-percpu-ipi       callfunc5
303:          0          0          0          0          0    3468144          0          0  xen-percpu-ipi       resched5
304:          0          0          0          0          0      66269          0          0  xen-percpu-ipi       spinlock5
305:          0          0          0          0          0   12778464          0          0  xen-percpu-virq      timer5
306:          0          0          0          0     844591          0          0          0  xen-percpu-ipi       callfuncsingle4
307:          0          0          0          0          0          0          0          0  xen-percpu-virq      debug4
308:          0          0          0          0      75293          0          0          0  xen-percpu-ipi       callfunc4
309:          0          0          0          0    3482146          0          0          0  xen-percpu-ipi       resched4
310:          0          0          0          0      79312          0          0          0  xen-percpu-ipi       spinlock4
311:          0          0          0          0   21642424          0          0          0  xen-percpu-virq      timer4
312:          0          0          0     449141          0          0          0          0  xen-percpu-ipi       callfuncsingle3
313:          0          0          0          0          0          0          0          0  xen-percpu-virq      debug3
314:          0          0          0      95405          0          0          0          0  xen-percpu-ipi       callfunc3
315:          0          0          0    3802992          0          0          0          0  xen-percpu-ipi       resched3
316:          0          0          0      76607          0          0          0          0  xen-percpu-ipi       spinlock3
317:          0          0          0   16439729          0          0          0          0  xen-percpu-virq      timer3
318:          0          0     876383          0          0          0          0          0  xen-percpu-ipi       callfuncsingle2
319:          0          0          0          0          0          0          0          0  xen-percpu-virq      debug2
320:          0          0      76416          0          0          0          0          0  xen-percpu-ipi       callfunc2
321:          0          0    3422476          0          0          0          0          0  xen-percpu-ipi       resched2
322:          0          0      69217          0          0          0          0          0  xen-percpu-ipi       spinlock2
323:          0          0   10247182          0          0          0          0          0  xen-percpu-virq      timer2
324:          0     393514          0          0          0          0          0          0  xen-percpu-ipi       callfuncsingle1
325:          0          0          0          0          0          0          0          0  xen-percpu-virq      debug1
326:          0      95773          0          0          0          0          0          0  xen-percpu-ipi       callfunc1
327:          0    3551629          0          0          0          0          0          0  xen-percpu-ipi       resched1
328:          0      77823          0          0          0          0          0          0  xen-percpu-ipi       spinlock1
329:          0   13784021          0          0          0          0          0          0  xen-percpu-virq      timer1
330:     730435          0          0          0          0          0          0          0  xen-percpu-ipi       callfuncsingle0
331:          0          0          0          0          0          0          0          0  xen-percpu-virq      debug0
332:      39649          0          0          0          0          0          0          0  xen-percpu-ipi       callfunc0
333:    3607120          0          0          0          0          0          0          0  xen-percpu-ipi       resched0
334:     348740          0          0          0          0          0          0          0  xen-percpu-ipi       spinlock0
335:   89912004          0          0          0          0          0          0          0  xen-percpu-virq      timer0
NMI:          0          0          0          0          0          0          0          0   Non-maskable interrupts
LOC:          0          0          0          0          0          0          0          0   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0          0          0          0          0   Performance monitoring interrupts
IWI:          0          0          0          0          0          0          0          0   IRQ work interrupts
RES:    3607120    3551629    3422476    3802992    3482146    3468144    5374762    3236015   Rescheduling interrupts
CAL:     770084     489287     952799     544546     919884     343765     863201     373596   Function call interrupts
TLB:          0          0          0          0          0          0          0          0   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:          0          0          0          0          0          0          0          0   Machine check polls
ERR:          0
MIS:          0

Look in the /proc/irq/283 directory. There is a smp_affinity_list file which shows which CPUs will get the 283 interrupt. For you this file probably contains "0" (and smp_affinity probably contains "1").

You can write the CPU range to the smp_affinity_list file:

echo 0-7 | sudo tee /proc/irq/283/smp_affinity_list

Or you can write a bitmask, where each bit corresponds to a CPU, to smp_affinity:

printf %x $((2**8-1)) | sudo tee /proc/irq/283/smp_affinity

However, irqbalance is known to have its own idea of what affinity each interrupt should have, and it might revert your updates. So it is best if you just uninstall irqbalance completely. Or at least stop it and disable it from coming up on reboot.

If even without irqbalance you are getting odd smp_affinity for interrupt 283 after a reboot, you will have to manually update the CPU affinity in one of your startup scripts.


If you have the right model of Intel NIC you can improve performance significantly.

To quote the first paragraph:

Multicore processors and the newest Ethernet adapters (including the 82575, 82576, 82598, and 82599) allow TCP forwarding flows to be optimized by assigning execution flows to individual cores. By default, Linux automatically assigns interrupts to processor cores. Two methods currently exist for automatically assigning the interrupts, an inkernel IRQ balancer and the IRQ balance daemon in user space. Both offer tradeoffs that might lower CPU usage but do not maximize the IP forwarding rates. Optimal throughput can be obtained by manually pinning the queues of the Ethernet adapter to specific processor cores.

For IP forwarding, a transmit/receive queue pair should use the same processor core and reduce any cache synchronization between different cores. This can be performed by assigning transmit and receive interrupts to specific cores. Starting with Linux kernel 2.6.27, multiple queues can be used on the 82575, 82576, 82598, and 82599. Additionally, multiple transmit queues were enabled in Extended Messaging Signaled Interrupts (MSI-X). MSI-X supports a larger number of interrupts that can be used, allowing for finer-grained control and targeting of the interrupts to specific CPUs.

See: Assigning Interrupts to Processor Cores using an Intel® 82575/82576 or 82598/82599 Ethernet Controller


Actually it is recommended, especially when dealing with repetitive processes over a short duration, that all interruptions generated by a device queue is handled by the same CPU, instead of IRQ balancing and thus you will see better performance if a single CPU handled the eth1 interrupt*** exception provided below

The source, linked above, is from the Linux Symposium and I do recommend you read through the couple paragraphs on SMP IRQ Affinity because it will convince you more effectively than this post.

Why?

Recall each processor has its own cache aside from being able to access main memory, check out this diagram. When an interrupt is triggered, a CPU core will have to fetch the instructions to handle the interrupt from main memory, which takes much longer than if the instructions where in the cache. Once a processor executed a task it will have those instructions in the cache. Now say the same CPU core handles the same interrupt almost all the time, the interrupt handler function will unlikely leave the CPU core cache, boosting the kernel performance.

Alternatively, when IRQ is balanced it can assign the interruption to be handled constantly by different CPU, then the new CPU core probably will not have the interrupt handler function in the cache, and a long time will be required to get the proper handler from main memory.

Exception: if you are seldom using eth1 interrupt, meaning enough time passes that the cache is overwritten by doing other tasks, meaning you have data coming over that interface intermittently with long periods in between...then you most likely will not see these benefits for they are when you use a process at high frequency.

Conclusion

If your interrupt occurs very frequently then just bind that interrupt to be handled by a specific CPU only. This configuration lives at

 /proc/'IRQ number'/smp_affinity

or

/proc/irq/'IRQ number'/smp_affinity

See the last paragraph in the SMP IRQ Affinity section from the source linked above, it has instructions.

Alternatively

You can change the frequency that the interrupt flag is raised by either increasing the MTU size (jumbo frames) if the network allows for it or change to have the flag raised after a larger amount of packets are received instead of at every packet OR change the time out, so raise interrupt after a certain amount of time. Caution with the time option because your buffer size might be full before time runs out. This can be done using the ethtool which is outlined in the linked source.

this answer is approaching the length at which people wont read it so I will not go into much detail, but depending on your situation there are many solutions... check the source :)