KVM and QEMU host: is there a limit for max CPUs? (Ubuntu 10.04)

Today we encountered really strange behaviour on two identical KVM/QEMU hosts (Dell R910). Each host system has 4 x 10 cores, which means the 40 physical cores show up as 80 logical CPUs within the operating system (Ubuntu Linux 10.04 64-bit, kernel 3.0).

We started a Windows 2003 32-bit VM (1 CPU, 1 GB RAM; we changed those values multiple times) on one of the nodes and noticed that it took 15 minutes until the boot process began. During those 15 minutes only a black screen was shown and nothing happened. libvirt and the host system showed that the qemu-kvm process for the guest was almost idle. Running strace on this process only showed some futex calls, but nothing special.
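(For reference, the trace was gathered roughly like this; the PID is a placeholder, not an actual value from our setup:)

    # find the qemu-kvm process for the guest, then attach to it and all its threads
    pgrep -lf qemu-kvm
    strace -f -p <PID>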

After those 15 minutes, the Windows VM suddenly starts booting and the Windows logo appears. After a few seconds, the VM is ready to be used. The VM itself is very fast once it is up, so this is not a performance issue.

We tried to pin the CPUs with the virsh and taskset tools, but this only made things worse.
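(For completeness, the pinning was attempted roughly along these lines; the domain name, vCPU number and CPU list are only examples, not our actual values:)

    # pin vCPU 0 of the guest to the cores of the first socket via libvirt
    virsh vcpupin win2003 0 0-9

    # alternatively, pin the whole qemu-kvm process with taskset
    taskset -pc 0-9 <PID>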

When we boot the Windows VM from a Linux live CD, there is also a black screen for several minutes, but not as long as 15. When booting another VM on this host (Ubuntu 10.04), it also has the black-screen problem, but here the black screen is only shown for 2-3 minutes (instead of 15).

So, to summarize: every guest on each of those identical nodes sits idle for a few minutes after being started. After those few minutes, the boot process suddenly starts. We have observed that the idle period begins right after the guest's BIOS has been initialized.

One of our employees had the idea of limiting the number of CPUs with the kernel parameter maxcpus=40 (since there are 40 physical cores) in GRUB, and suddenly the "black-screen idling" behaviour disappeared.
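(For the record, the workaround looks roughly like this on hosts using GRUB 2; the existing options in the file may of course differ, and on legacy GRUB the parameter would instead go on the kernel line in /boot/grub/menu.lst:)

    # /etc/default/grub
    GRUB_CMDLINE_LINUX_DEFAULT="maxcpus=40"

    # regenerate the GRUB configuration and reboot
    sudo update-grub
    sudo reboot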

Searching the KVM and QEMU mailing lists, the internet, forums, Server Fault and various other sites for known bugs etc. turned up no useful results. Even asking in the dev IRC channels brought no new ideas. The people there recommended CPU pinning, but as stated before it didn't help.

My question now is: is there some sort of CPU limit for a QEMU or KVM host system? Browsing the source code of those two tools showed that KVM prints a warning if your host has more than 255 CPUs, but we are not even close to that limit.
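(Just to make the host-side numbers explicit, the logical CPU count can be checked with standard tools, e.g.:)

    # count the logical CPUs the kernel sees (80 on these hosts)
    grep -c ^processor /proc/cpuinfo

    # or let libvirt report sockets/cores/threads
    virsh nodeinfo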

Some details about the host system:

3.0.0-20-server
kvm 1:84+dfsg-0ubuntu16+0.14.0+noroms+0ubuntu4
kvm-pxe 5.4.4-7ubuntu2
qemu-kvm 0.14.0+noroms-0ubuntu4
qemu-common 0.14.0+noroms-0ubuntu4
libvirt 0.8.8-1ubuntu6
4 x Intel(R) Xeon(R) CPU E7-4870  @ 2.40GHz, 10 Cores

Edit: I also tried the 3.2 kernel (without the maxcpus parameter) - unfortunately this made things worse. dstat shows a rising number of context switches:

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  0   0  99   0   0   0|1164k  638k|   0     0 |   0     0 |4972  6319 
  0   1  99   0   0   0|   0     0 |3456B 4847B|   0     0 |  18k   33k
  0   1  99   0   0   0|   0     0 |6126B 4550B|   0     0 |  17k   33k
  0   1  99   0   0   0|   0     0 |1772B 4139B|   0     0 |  17k   33k
  0   1  99   0   0   0|   0     0 |5507B 3674B|   0     0 |  17k   32k

Normal values would be around 7000 for this system with one VM up.
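(For reference, these numbers come from plain dstat with its default columns; the csw column is context switches per second. Something along these lines was used, with an arbitrary interval:)

    # default columns, refreshed every second; csw = context switches per second
    dstat 1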

Edit: I started the host systems with maxcpus=40 as a boot parameter. virsh nodeinfo now shows 40 physical cores and no hyper-threaded ones.

When starting the virtual machine, it still has a "boot break" of around 30 seconds. During that time, the number of context switches rises from 300 per second to 600,000 per second. After 30 seconds of black screen, the VM starts the normal boot process and the context switches drop back below 7,000 per second:

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  1   2  97   0   0   0| 943k    0 |  26k   12k|   0     0 |  22k   40k
  3   7  84   6   0   0|  26M   64k|  71k   18k|   0     0 |  10k   16k
  1   1  97   1   0   0|5282k 2560B|9751B   15k|   0     0 |  13k   23k
  1   4  95   0   0   0|1216k    0 |  14k   18k|   0     0 | 295k  592k
  1   3  96   0   0   0|   0    52k|5518B 7299B|   0     0 | 228k  456k
  1   3  96   0   0   0|  12k   24k|1228B 1514B|   0     0 | 258k  518k
  1   4  96   0   0   0|   0     0 |  14k   32k|   0     0 | 280k  565k
  1   3  96   0   0   0|   0     0 |  19k   38k|   0     0 | 284k  573k
  1   3  96   0   0   0|   0     0 |6465B 7203B|   0     0 | 288k  581k
  1   3  96   0   0   0|   0   172k|  26k   11k|   0     0 | 290k  584k
  1   3  96   0   0   0|   0     0 |  23k   11k|   0     0 | 288k  580k
  1   3  96   0   0   0|   0    12k|5678B 4556B|   0     0 | 289k  583k
  1   3  96   0   0   0|   0     0 |1192B 2929B|   0     0 | 288k  580k
  1   3  96   0   0   0|   0     0 |6304B   10k|   0     0 | 272k  547k
  1   3  96   0   0   0|4096B   52k|8330B   14k|   0     0 | 300k  605k
  1   3  96   0   0   0|   0    24k|  11k   20k|   0     0 | 293k  591k
  1   3  96   0   0   0|   0     0 |  13k   28k|   0     0 | 291k  587k
  1   3  96   0   0   0|   0   512B|  10k   18k|   0     0 | 291k  587k
  2   3  95   0   0   0|   0     0 |6653B   10k|   0     0 | 167k  337k
  3   0  97   0   0   0|   0   160k|  23k 5524B|   0     0 |  10k   19k
  7   0  92   0   0   0|   0    36k|  22k 3335B|   0     0 | 949   924 
 10   0  90   0   0   0|   0     0 |5172B 3318B|   0     0 | 908   923 
  5   0  94   0   0   0|   0     0 |2234B 2825B|   0     0 | 846   875 

Edit: As requested, here is an excerpt of the strace -f -p output:

25734 <... read resumed> "\16\0\0\0\0\0\0\0\376\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0"..., 128) = 128
25752 futex(0x927e60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
25734 rt_sigaction(SIGALRM, NULL, {0x4b2300, ~[KILL STOP RTMIN RT_1], SA_RESTORER, 0x7fe09ac108f0}, 8) = 0
25734 write(8, "\1\0\0\0\0\0\0\0", 8)   = 8
25734 read(15, 0x7fffcea69f70, 128)     = -1 EAGAIN (Resource temporarily unavailable)
25734 timer_gettime(0x1, {it_interval={0, 0}, it_value={0, 0}}) = 0
25734 timer_settime(0x1, 0, {it_interval={0, 0}, it_value={0, 250000}}, NULL) = 0
25734 timer_gettime(0x1, {it_interval={0, 0}, it_value={0, 182592}}) = 0
25734 futex(0x927e60, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
25752 <... futex resumed> )             = 0
25734 <... futex resumed> )             = 1
25752 futex(0x927e60, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
25734 select(25, [7 10 14 15 16 17 18 24], [], [], {1, 0} <unfinished ...>

As recommended in one of the comments (thanks cperrin88), upgrading to Ubuntu 12.04 brought the solution. Some parameters:

  • Kernel 3.2
  • 80 logical CPUs (40 physical cores, 80 because of Intel HT)
  • kvm 1:84+dfsg-0ubuntu16+1.0+noroms+0ubuntu13
  • kvm-ipxe 1.0.0+git-3.55f6c88-0ubuntu1
  • qemu-kvm 1.0+noroms-0ubuntu13
  • libvirt 0.9.8-2ubuntu17.1
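(Those versions can be double-checked on the upgraded hosts with the usual commands, for example:)

    uname -r
    dpkg -l qemu-kvm libvirt-bin | grep ^ii
    virsh --version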

The Windows guest now shows a boot progress bar during the first 30 seconds of the boot and then just boots (normal behaviour).

The number of context switches is now very low compared to the test scenarios I had earlier (between 200 and 24k per second).

So, problem solved. I just need to find out what changed (I guess it was a bug in KVM).

Thanks for all the comments and your efforts!


I have encountered quite a few bugs with KVM on Ubuntu 10.04 (which I still have to use), including growing caches that end up being swapped out and serious performance issues.

I recommend upgrading to the latest LTS release in the hope that it will fix a few of these bugs.
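A rough sketch of what I mean, assuming the standard Ubuntu release upgrader is available (try it on a non-production host first):

    # make sure the release upgrader is installed, then upgrade to the next LTS
    sudo apt-get update && sudo apt-get install update-manager-core
    sudo do-release-upgrade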