Nasty CPU spikes that aren't connected to any visible processes

"the CPU gets to 80-90% busy across all cores for about 5 minutes"

With usage that high, you can probably pinpoint the culprit with pidstat, which is part of the sysstat package.

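Simply run pidstat -u | sort -nr -k 7,7 | head -10 and the process that used the most CPU should be on the top line. Field 7 is assumed to be the %CPU column; its position can differ between sysstat versions and time formats, so check the header line first. Note that without an interval pidstat reports averages since boot, so while a spike is in progress it is more telling to sample with one, e.g. pidstat -u 5 1.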
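Because the position of the %CPU column is not fixed, one option is to locate it from the header instead of hard-coding a field number. The following is only a sketch: top_cpu is a hypothetical helper name, and it assumes pidstat prints a header line containing a %CPU field before each block of data lines.

```shell
#!/bin/sh
# Hypothetical helper: read "pidstat -u" output on stdin and print the N
# busiest lines (default 10), finding the %CPU column from the header.
top_cpu() {
  awk '
    # Header line: remember which field holds %CPU, then skip the line.
    { for (i = 1; i <= NF; i++) if ($i == "%CPU") { col = i; next } }
    # Data line: prefix it with its %CPU value so sort can order on it.
    col && NF >= col && $1 != "Average:" { print $col "\t" $0 }
  ' | LC_ALL=C sort -t "$(printf '\t')" -k1,1nr | head -n "${1:-10}" | cut -f2-
}

# Usage (assumes sysstat is installed): sample every 5 seconds, 12 times.
# pidstat -u 5 12 | top_cpu 10
```

Sampling with an interval while the spike is happening should show what is busy right now rather than the average since boot.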


I would try to find the cause of the problem with a shell script:

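#!/bin/sh
# Threshold: load average multiplied by 100 (200 means a load of 2.00)
MAXLOAD=100
# Extract the 1-minute load average and scale it to an integer (0.05 -> 5, 2.35 -> 235)
CURRLOAD=$(uptime | sed 's@.*load average: \([^,]*\).*@\1@' | awk '{ printf "%d", $1 * 100 + 0.5 }')

if [ "$CURRLOAD" -gt "$MAXLOAD" ]; then
  # Mail the five entries with the highest CPU usage
  ps -eo tid,pcpu,comm | sort -n -k 2 | tail -n 5 | \
    mail -s "High load" -e [email protected]
fi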

The script has two variables, MAXLOAD and CURRLOAD. The first one should be the load you consider high, multiplied by 100. So if you encounter a spike and see the system load going up to 2 or 3, then you should set MAXLOAD to a value around 200. $CURRLOAD takes the output of uptime, extracts the 1-minute load average and turns it into an integer on the same scale (0.05 becomes 5, 2.35 becomes 235).
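As a sketch of the conversion $CURRLOAD needs, the snippet below scales a load average to an integer by multiplying by 100 in awk, which is one alternative to stripping the dot and leading zeros with sed. The helper name load_to_int and the sample uptime line are made up for illustration; the "load average: " wording is what Linux uptime prints and may differ elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: convert a load average such as "2.35" to an integer (235).
# Adding 0.5 before truncating guards against floating-point rounding.
load_to_int() {
  printf '%s\n' "$1" | awk '{ printf "%d\n", $1 * 100 + 0.5 }'
}

# Illustrative uptime line (not real output from any particular system).
sample=' 10:15:01 up 12 days,  3:02,  2 users,  load average: 2.35, 1.80, 0.95'

# Keep only the first (1-minute) value after "load average: ".
load=$(printf '%s\n' "$sample" | sed 's@.*load average: \([^,]*\).*@\1@')

load_to_int "$load"   # prints 235
load_to_int "0.05"    # prints 5
load_to_int "0.00"    # prints 0
```

Doing the arithmetic in awk keeps the result a valid integer even when the load is 0.00, so the -gt comparison in the script always has something to compare.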

If the load is too high at some point, it prints the five processes with the highest CPU utilisation and sends them to [email protected].

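This script should help you find the cause of a spike, and once you know it you may be able to resolve the issue. Since it checks the load only once per run, schedule it to run regularly, for example every minute from cron.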