High load average due to high system CPU load (%sys)
We have a server hosting a high-traffic website. Recently we moved from
a 2 x 4-core server (8 cores in /proc/cpuinfo), 32 GB RAM, running CentOS 5.x, to
a 2 x 4-core server (16 cores in /proc/cpuinfo), 32 GB RAM, running CentOS 6.3.
The server runs nginx as a proxy, a MySQL server, and Sphinx search.
Traffic is high, but the MySQL and Sphinx databases are relatively small, and usually everything runs blazingly fast.
Today the server experienced a load average of 100+. Looking at top and sar, we noticed that system CPU time (%sys) was very high, 50 to 70%. Disk utilization was under 1%. We tried a reboot, but the problem persisted afterwards. At any given moment the server had at least 3-4 GB of free RAM.
The only message shown by dmesg was "possible SYN flooding on port 80. Sending cookies."
Here is a snippet of sar output:
11:00:01 CPU %user %nice %system %iowait %steal %idle
11:10:01 all 21.60 0.00 66.38 0.03 0.00 11.99
We know this is a traffic issue, but we do not know how to proceed further or where to look for a solution.
Is there a way to find out where exactly those "66.38%" are being spent?
Any suggestions would be appreciated.
Update: Today the load average is "normal" and %sys is OK too, at ~4%. However, today's traffic is about 20-30% lower than yesterday's. This makes me think yesterday's problem was caused by some kernel setting for TCP.
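To narrow that down, these are the TCP-related settings we plan to inspect first (a starting checklist prompted by the SYN-flood message, not a confirmed fix; appropriate values depend on the workload):

    # Check syncookie and SYN-backlog settings
    sysctl net.ipv4.tcp_syncookies
    sysctl net.ipv4.tcp_max_syn_backlog
    sysctl net.core.somaxconn
    # Count half-open connections, which spike during a SYN flood
    netstat -ant | awk '$6 == "SYN_RECV"' | wc -l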
I would install atop from the EPEL repository. atop should help you diagnose what is causing the %sys activity.
atop also has a replay mode (atop -r) that lets you step backward and forward in time through its logs using the t/T keys.
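A minimal session might look like this (assuming EPEL is already enabled; the daily log path is the packaged default on CentOS 6, so adjust it if yours differs):

    yum install atop                     # from EPEL
    atop 5                               # live view, refreshing every 5 seconds
    atop -r /var/log/atop/atop_YYYYMMDD  # replay a recorded day; step with t/T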
Also take a look at /proc/interrupts, and go through your logs under /var/log/httpd/, sorting requests by IP to see whether any suspect IP is generating an abnormal amount of httpd traffic (see the pipeline below).
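For example, something like this (a rough sketch assuming a common/combined log format where the client IP is the first field; substitute your actual access log path):

    awk '{print $1}' /var/log/httpd/access_log | sort | uniq -c | sort -rn | head -20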
I would cron a cat /proc/interrupts to a log file and look for high deltas in the interrupt counts between snapshots.
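A rough sketch of such a cron entry (the log path is just an example):

    # Snapshot interrupt counters every 5 minutes, with a timestamp
    */5 * * * * (date; cat /proc/interrupts) >> /var/log/interrupts.log

Comparing consecutive snapshots (e.g. with diff) will show which interrupt sources are climbing fastest.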