Kernel Processes Periodically Eating CPU During High Load
I run a production web server with 24 cores on which the work is both CPU- and I/O-intensive, but mostly CPU-bound. My scripts delay launching new work whenever total CPU load is ~85% or higher, in order to keep the load manageable. As a result, the CPUs are never under more stress than my scripts know they can handle.
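(Roughly speaking, the throttle works like the simplified sketch below. This is not my actual script; mpstat from the sysstat package just stands in for whatever metric the real check reads, and the 85% threshold and 10-second back-off are only example values.)

# Simplified sketch of the throttle, not the real script: before dispatching
# more work, wait until overall CPU utilization drops below ~85%.
while :; do
    idle=$(mpstat 1 1 | awk '/Average/ {print $NF}')     # %idle is the last column
    awk -v i="$idle" 'BEGIN { exit !((100 - i) < 85) }' && break
    sleep 10                                             # back off, then re-check
done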
Now, the server runs at maximum production capacity in blocks of up to 3 hours at a time. Most of the time the work goes smoothly, but often, partway through such a block, system CPU load increases dramatically. This is due to the kernel processes "events/x", "migration/x", and "ksoftirqd/x", where "x" is the CPU number for that process. I have read that this indicates the kernel is struggling with queued tasks, which happens under overwhelming system load. However, as I mentioned, my CPU load, which is the primary bottleneck, is deliberately capped at ~85% to avoid exactly this kind of problem. This kernel CPU usage dramatically slows production and only prolongs the queued tasks. The strange part is that after about 30 minutes the system load disappears, with the kernel processes dropping back to zero CPU usage, only to begin hogging the CPU again later. During this entire time, the amount of work being fed to the CPUs has not changed and is usually handled just fine; but when these kernel processes kick in, production is completely killed.
Here is the output of "top -u root" during one of these events. User CPU usage is only 49% because system usage is at 40%; normally this would be user ~85%, system ~5%. However, there is no iowait, and the load average is 22 (out of 24 cores), which is normal.
top - 13:10:49 up 44 days, 20:29, 1 user, load average: 22.87, 22.73, 21.36
Tasks: 622 total, 24 running, 585 sleeping, 0 stopped, 13 zombie
Cpu(s): 49.4%us, 40.3%sy, 0.0%ni, 10.1%id, 0.1%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 32728060k total, 31045092k used, 1682968k free, 353768k buffers
Swap: 4194300k total, 243136k used, 3951164k free, 19117436k cached
  PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   51 root     RT   0     0    0    0 S 11.1  0.0 436:03.06 migration/12
  100 root     20   0     0    0    0 S  9.5  0.0  49:19.45 events/1
  114 root     20   0     0    0    0 S  5.9  0.0  48:14.75 events/15
    3 root     RT   0     0    0    0 S  4.3  0.0 517:58.05 migration/0
  112 root     20   0     0    0    0 S  3.6  0.0  42:00.54 events/13
   27 root     RT   0     0    0    0 S  2.3  0.0 200:59.58 migration/6
 8149 root     20   0  165m 7732 3928 S  2.3  0.0   0:00.07 exim
   15 root     RT   0     0    0    0 S  2.0  0.0 450:05.62 migration/3
   39 root     RT   0     0    0    0 S  2.0  0.0 178:08.17 migration/9
  113 root     20   0     0    0    0 S  1.6  0.0  44:00.04 events/14
  178 root     20   0     0    0    0 R  1.6  0.0  53:27.57 kacpid
   63 root     RT   0     0    0    0 S  1.3  0.0 439:11.96 migration/15
   81 root     20   0     0    0    0 S  1.0  0.0  17:14.83 ksoftirqd/19
  104 root     20   0     0    0    0 S  1.0  0.0  44:58.55 events/5
  115 root     20   0     0    0    0 S  1.0  0.0  47:18.46 events/16
    9 root     20   0     0    0    0 S  0.7  0.0  13:56.20 ksoftirqd/1
   25 root     20   0     0    0    0 S  0.7  0.0  12:46.52 ksoftirqd/5
   57 root     20   0     0    0    0 S  0.7  0.0  11:12.62 ksoftirqd/13
   75 root     RT   0     0    0    0 S  0.7  0.0 181:00.24 migration/18
  118 root     20   0     0    0    0 S  0.7  0.0  30:13.06 events/19
10497 root     20   0 77964 6244 4096 S  0.7  0.0  17:40.25 httpd
Are there any potential explanations for this behavior from these kernel processes when the CPU load is strictly regulated to stay manageable? Memory is not a problem, since buffer/cache usage never exceeds 30% of system capacity. In searching the web, everyone blames overwhelming system load, but my server's resource usage does not suggest that it should be causing this lock-up.
Any suggestions would be appreciated.
EDIT: I've posted what seems to be the solution in the answers section.
Solution 1:
It appears that the kernel processes may have been stealing CPU time during transfers to/from swap. The server's cache settings had somehow been reset without my knowledge, setting swappiness to 60. From the output of "sar -W", the hang-ups seemed to coincide with high load periods during which pswpin/s and pswpout/s were large (greater than 2.00 or so, sometimes as high as 15.00). After setting swappiness to 1, I have not come across the same hang-ups from the kernel processes, and sar -W shows near-zero values at all times. In summary, it appears that aggressive swapping during high load with large memory transfers was bogging down the system during times of large and rapidly changing demand for resources.
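For reference, these are the standard commands involved (paths and defaults may differ slightly by distribution):

sar -W 5 12                                     # watch pswpin/s and pswpout/s (5-second samples, 12 times)
cat /proc/sys/vm/swappiness                     # current value; mine had been reset to 60
sysctl -w vm.swappiness=1                       # apply the lower value immediately
echo 'vm.swappiness = 1' >> /etc/sysctl.conf    # persist it across reboots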
Solution 2:
migration is the kernel process that handles moving processes from one CPU to another. So, for some reason, your Linux scheduler decides that processes need to be moved to another CPU, and the migration process eats the CPU time.
You could try pinning processes to specific CPUs, or try a different scheduler with your kernel. Maybe another scheduler isn't so eager to migrate processes to other CPUs. An example of pinning is sketched below.
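For example, pinning can be done with taskset from util-linux (the PID, CPU list, and worker name below are made up):

taskset -cp 0-11 12345      # restrict an existing process (PID 12345) to CPUs 0-11
taskset -c 0-11 ./worker    # or launch a new process already pinned to those CPUs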
Solution 3:
I've tracked down issues with the migration kernel process reported here. It seems that Linux kernels prior to 3.6.11 are affected. The link shows a similar symptom, where the migration process takes a large amount of CPU time. If possible, you might want to upgrade the kernel and see whether the problem persists.
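A quick way to check the running kernel version before deciding whether an upgrade is worth trying:

uname -r    # anything older than 3.6.11 would match the reports linked above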