Random high CPU usage on Linux using Apache

The CPU usage shown on Apache's server-status page is the average since Apache was started, so it won't show spikes like this. When you hit one of these load spikes, check the server-status page to see which pages/clients are being served (ExtendedStatus must be on).
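
If server-status is not already enabled, a minimal sketch of the mod_status configuration looks like this (the config file location and access-control syntax vary by distribution and Apache version, so treat the details as assumptions to adapt):

ExtendedStatus On
<Location /server-status>
    SetHandler server-status
    # Apache 2.2 syntax; on 2.4 use "Require ip 127.0.0.1" instead
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
</Location>

After a graceful restart, /server-status will show the current request and client IP for each worker.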

You can also use netstat to see what clients are currently accessing your machine:

 netstat -an | grep ESTABLISHED

If you run this over multiple hours and traffic spikes, you may be able to spot a recurring IP address and potentially trace it to a specific robot/crawler. If that turns out to be the case, you can use robots.txt to limit how well-behaved robots crawl your site.
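
For example, a robots.txt along these lines asks crawlers to slow down. Note that Crawl-delay is a non-standard extension honored by some crawlers but not all (Google's ignores it), badly behaved bots skip robots.txt entirely, and the Disallow path here is just a placeholder:

User-agent: *
Crawl-delay: 10
Disallow: /some-expensive-page/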

Edit: On a busy server the above netstat command should show some entries like:

tcp        0      0 10.2.212.13:80              216.146.52.21:24979         ESTABLISHED
tcp        0      0 10.2.212.13:80              86.174.113.138:54901        ESTABLISHED
tcp        0      0 10.2.212.13:80              94.1.216.253:51204          ESTABLISHED
tcp        0      0 10.2.212.13:80              24.9.61.204:62936           ESTABLISHED

The client's IP address is the one on the right. If you only see one or two lines, it just means that at that moment the only connection is your own SSH session; check again when the load increases. You can also remove the grep to list all connections, although the output will then include a large number of old TIME_WAIT entries.
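
To make recurring clients stand out, you can count established connections per remote IP; in Linux netstat output the foreign address is column 5:

netstat -an | grep ESTABLISHED | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head

The first column of the output is the number of concurrent connections from each IP.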

I would start with the extended server-status and see if that can reveal any obvious crawlers during traffic peaks.


Check your access logs. Since the interval is so regular, you may have a data miner or crawler hitting every page on your site.
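
A quick way to spot a crawler is to count requests per user agent. This sketch assumes the combined log format, where the User-Agent is the sixth quote-delimited field:

awk -F'"' '{print $6}' access_log | sort | uniq -c | sort -rn | head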


Create a simple executable script:

#!/bin/sh
# use numeric IPs
netstat -na | grep ESTABLISHED
# use DNS names (uncomment this and comment out the line above)
# netstat -ta | grep ESTABLISHED

Omitting the -n flag makes netstat resolve IP addresses to DNS names, so uncomment whichever form you prefer.

Then schedule it with something that runs on an interval, such as cron. Read the crontab man page first, since on some systems you may not even be permitted to use it. You will want to send the output to a log for later review, and you can add a date command to the script if you want to record when each run happened. Example crontab entry:

# minute hour day-of-month month day-of-week
0,15,30,45 * * * * <path/to/script> >> <log>

This is edited with crontab -e (again read the man page).
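
A sketch of an entry that also records when each sample was taken (the script path and log file are placeholders to substitute):

# placeholders: substitute your script path and log file
0,15,30,45 * * * * (date; /path/to/script) >> /var/log/connections.log 2>&1

The subshell groups the date with the script's output so each batch in the log is timestamped, and 2>&1 captures any errors as well.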

You can use this to count requests per client IP in your access log, sorted by request count:

awk '{print $1}' access_log | sort | uniq -c | sort -n
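
The same approach shows which URLs are being hammered; in the common and combined log formats the request path is field 7:

awk '{print $7}' access_log | sort | uniq -c | sort -rn | head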

If you are really seeing slow page responses, look into I/O wait as well; "high" CPU usage by itself is sometimes not a big deal.
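
A quick way to check is to watch the wa (I/O wait) column in vmstat, here sampling once per second, five times:

vmstat 1 5

If wa stays high while the load spikes, the bottleneck is the disk rather than the processor.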


I suspect that you are running on a quad-core CPU. In that case you can easily end up in a situation where top reports the percentage as load per core, whereas other tools divide that figure by the number of cores to arrive at an overall load figure for the CPU as a whole.
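
To count your cores and watch per-core usage, you can use the following (mpstat is part of the sysstat package and may need installing; pressing 1 inside top toggles a per-core view as well):

grep -c ^processor /proc/cpuinfo
mpstat -P ALL 1 1

On a quad-core box a single busy process can drive one core to 100% while the machine as a whole is only 25% loaded, which explains the discrepancy between tools.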

As far as the variations are concerned, I am inclined to give the same advice as Hyppy.