CentOS server not using swap properly and getting OOM

Recently I've been having some serious memory issues with my server. Just the other day, my server became completely unresponsive, and oom-killer started killing services at random (httpd, php, etc). I couldn't even SSH into my server, but I was able to PING it.

I did look at the kernel message log, but there was no clear indication of what was causing the memory problem; all I could see were the oom-killer messages.

sar -r command:

03/15/2012

12:00:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
12:10:01 AM   2881812    582380     16.81     26652    250192   4192944         0      0.00         0
12:20:01 AM   2883600    580592     16.76     27104    250196   4192944         0      0.00         0
12:30:01 AM   2878576    585616     16.90     27656    250320   4192944         0      0.00         0
12:40:01 AM   2851856    612336     17.68     28312    271540   4192944         0      0.00         0
12:50:01 AM   2843560    620632     17.92     28968    274468   4192944         0      0.00         0
01:00:01 AM   2843892    620300     17.91     29440    274644   4192944         0      0.00         0
01:10:01 AM     22868   3441324     99.34     60764   2947884   4192936         8      0.00         8
01:20:01 AM     13836   3450356     99.60     62064   2882544   4192844       100      0.00        92
01:30:03 AM     14024   3450168     99.60      7820   3040976   4192844       100      0.00         0
01:40:01 AM     18600   3445592     99.46     18720   3039152   4192844       100      0.00         0
01:50:01 AM     25352   3438840     99.27     20048   3034584   4192844       100      0.00         0
02:00:01 AM     22572   3441620     99.35     20872   3036896   4192844       100      0.00         0
02:10:01 AM     21408   3442784     99.38     21776   3038236   4192844       100      0.00         0
02:20:01 AM     23240   3440952     99.33     23168   3032372   4192844       100      0.00         0
02:30:01 AM     72392   3391800     97.91     25100   2981488   4192844       100      0.00         0
02:40:01 AM     70876   3393316     97.95     25824   2981756   4192844       100      0.00         0
02:50:01 AM     74200   3389992     97.86     26464   2981860   4192844       100      0.00         0
03:00:01 AM     64980   3399212     98.12     32616   2982240   4192844       100      0.00         0
03:10:01 AM     63704   3400488     98.16     33564   2984268   4192844       100      0.00         0
03:20:01 AM     59564   3404628     98.28     34592   2988936   4192844       100      0.00         0
03:30:01 AM     53972   3410220     98.44     35740   2992484   4192844       100      0.00         0
03:40:01 AM     89120   3375072     97.43     36472   2956088   4192844       100      0.00         0
03:50:01 AM     88788   3375404     97.44     36920   2956324   4192844       100      0.00         0
04:00:01 AM     78540   3385652     97.73     37740   2964452   4192844       100      0.00         0
04:10:01 AM     21720   3442472     99.37    106636   2892836   4192844       100      0.00         0
04:20:01 AM     22796   3441396     99.34    107172   2890796   4192844       100      0.00         0
04:30:01 AM     30604   3433588     99.12    107812   2884644   4192844       100      0.00         0
04:40:01 AM     32744   3431448     99.05    108568   2875944   4192844       100      0.00         0

Here is top sorted by swapped size:

top - 14:32:01 up 15:37,  1 user,  load average: 0.10, 0.10, 0.04
Tasks: 110 total,   3 running, 107 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.5%us,  0.3%sy,  0.0%ni, 98.4%id,  0.7%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3464192k total,  2663384k used,   800808k free,   140796k buffers
Swap:  4192944k total,      100k used,  4192844k free,  2073748k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  SWAP COMMAND
 1975 mysql     15   0  222m  43m 4652 S  0.0  1.3   0:11.82 178m mysqld
 1859 named     22   0  161m 5228 1948 S  0.0  0.2   0:00.04 156m named
 2144 root      18   0  143m  47m 1072 S  0.0  1.4   0:00.00  95m spamd
 2119 root      15   0  143m  49m 2628 S  0.0  1.5   0:01.17  94m spamd
 2161 root      15   0 93372 1280  936 S  0.0  0.0   0:00.01  89m pure-ftpd
 2163 root      18   0 91016  976  804 S  0.0  0.0   0:00.01  87m pure-authd
20035 root      15   0 91800 3096 2408 S  0.0  0.1   0:00.00  86m sshd
19432 root      15   0 92232 3656 2900 R  0.0  0.1   0:00.00  86m sshd
 2377 root      19   0 93268  14m 1940 S  0.0  0.4   0:00.00  76m cpdavd
 2380 root      15   0 87824  11m 1520 S  0.0  0.3   0:00.07  74m cpsrvd-ssl
 3115 root      15   0 74832 1168  584 S  0.0  0.0   0:00.05  71m crond
18548 root      18   0 73624 3036  236 S  0.0  0.1   0:00.00  68m httpd
19713 nobody    18   0 73760 4460 1584 S  0.0  0.1   0:00.00  67m httpd
19712 nobody    15   0 73760 4484 1584 S  0.0  0.1   0:00.00  67m httpd
19709 nobody    18   0 73624 4460 1584 S  0.0  0.1   0:00.00  67m httpd
19508 nobody    15   0 73760 4600 1680 S  0.0  0.1   0:00.00  67m httpd
19162 nobody    15   0 73756 4640 1708 S  0.0  0.1   0:00.01  67m httpd
19154 nobody    15   0 73756 4656 1728 S  0.0  0.1   0:00.00  67m httpd
19157 nobody    15   0 73756 4696 1740 S  0.0  0.1   0:00.01  67m httpd
19327 nobody    15   0 73756 4700 1740 S  0.0  0.1   0:00.01  67m httpd
19163 nobody    15   0 73756 4768 1836 S  0.0  0.1   0:00.00  67m httpd
19164 nobody    15   0 73756 4788 1856 S  0.0  0.1   0:00.00  67m httpd
 2145 root      18   0 73624 5740 2940 S  0.0  0.2   0:00.60  66m httpd
 1911 root      20   0 65952 1276 1044 S  0.0  0.0   0:00.01  63m mysqld_safe

For some reason, it says that it's only using 100k SWAP, but that doesn't make any sense. Isn't VIRT the amount of SWAP being used by each process?

* Update *

Here is some more information on the file systems:

# df -T
Filesystem    Type   1K-blocks      Used Available Use% Mounted on
/dev/md2      ext3   468924192  17215692 427504176   4% /
/dev/md1      ext3     2030672     58788   1867068   4% /tmp
/dev/md0      ext3      101018     13414     82388  15% /boot
tmpfs        tmpfs     1732096         0   1732096   0% /dev/shm

* Update 2 *

Here is the free -m that I managed to run when the server was in this OOM state, yesterday:

             total       used       free     shared    buffers     cached
Mem:          3383       3372         10          0          0          6
-/+ buffers/cache:       3365         17
Swap:         4094       4094          0

Solution 1:

I usually sort by memory ("M" in top) to troubleshoot these kinds of things--that shows you the amount of real memory that each process is using (and touching frequently enough to keep it off the least-recently-used queue for being swapped).

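VIRT = RES + SWAP

In older versions of top such as this one, the SWAP column is simply computed as VIRT - RES. It counts every mapped page that is not resident (including pages that were never touched or that are backed by files), not what has actually been written to the swap device. That is why the per-process SWAP figures add up to far more than the 100k the system reports.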
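
As a rough check of the VIRT = RES + SWAP relation against the top output in the question, the sketch below converts top's size strings to KiB and subtracts RES from VIRT. The helper names (`to_kib`, `swap_column`) are made up for illustration, and it assumes that top prints bare numbers in KiB and `m`-suffixed numbers in MiB.

```python
def to_kib(size):
    """Convert a top size string such as '222m' or '5228' to KiB."""
    units = {"k": 1, "m": 1024, "g": 1024 * 1024}
    suffix = size[-1].lower()
    if suffix in units:
        return float(size[:-1]) * units[suffix]
    return float(size)  # no suffix: assumed to be KiB already


def swap_column(virt, res):
    """Return VIRT - RES in MiB."""
    return (to_kib(virt) - to_kib(res)) / 1024


# VIRT and RES values copied from the top output in the question
for name, virt, res in [("mysqld", "222m", "43m"),
                        ("named", "161m", "5228"),
                        ("crond", "74832", "1168")]:
    print(f"{name}: {swap_column(virt, res):.1f} MiB")
```

The results land within about 1 MiB of the SWAP values top printed for those processes (178m, 156m, 71m); the small differences presumably come from top rounding the displayed VIRT and RES.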

Another thing to check is whether /tmp is a tmpfs file system and if something is writing a lot of data there.
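
One way to make that check, sketched under the assumption that the mount table is read in the `/proc/mounts` layout (device, mount point, type, options, ...). The function name `fs_type` is hypothetical, and the sample text is reconstructed from the `df -T` output in the question, with the `rw 0 0` fields filled in as placeholders.

```python
def fs_type(mounts_text, mount_point):
    """Return the filesystem type mounted at mount_point, or None.

    mounts_text is expected in the /proc/mounts layout:
    device mount_point fs_type options dump pass
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            found = fields[2]  # a later entry shadows an earlier one
    return found


# Devices, mount points and types from the question's df -T output;
# the trailing "rw 0 0" fields are placeholders, not real data.
sample = """/dev/md2 / ext3 rw 0 0
/dev/md1 /tmp ext3 rw 0 0
/dev/md0 /boot ext3 rw 0 0
tmpfs /dev/shm tmpfs rw 0 0
"""

print("/tmp:", fs_type(sample, "/tmp"))
print("/dev/shm:", fs_type(sample, "/dev/shm"))
```

On a live system the same function can be fed `open("/proc/mounts").read()`. A result of `tmpfs` would mean files written there consume memory (and swap) rather than disk.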

I am actually a little confused by what I'm seeing. Is this sar output over the interval when your outage occurred or just the default output? And the top output is from a totally different time, 14:32?

Also, it's not really using swap at the time you took these stats because it doesn't need to--nearly 3G of your memory is currently being used as disk cache ("kbcached"), and only kbmemused - kbbuffers - kbcached = 446936 KiB (about 436 MiB) [at 04:40:01] is in use by actual processes.
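
The arithmetic can be repeated for a few rows of the sar output. This is only a sketch: `process_kib` is a made-up helper name, and it uses the usual convention of subtracting both buffers and page cache from the used figure.

```python
def process_kib(kbmemused, kbbuffers, kbcached):
    """Memory used excluding buffers and page cache, in KiB."""
    return kbmemused - kbbuffers - kbcached


# (time, kbmemused, kbbuffers, kbcached) copied from the sar -r output
samples = [
    ("01:00:01", 620300, 29440, 274644),
    ("01:10:01", 3441324, 60764, 2947884),
    ("04:40:01", 3431448, 108568, 2875944),
]

for time, used, buffers, cached in samples:
    kib = process_kib(used, buffers, cached)
    print(f"{time}  {kib} KiB  ({kib / 1024:.0f} MiB)")
```

Between 01:00 and 01:10 kbmemused jumps by roughly 2.7 GiB, but the figure computed here grows by only about 114 MiB; nearly all of the increase is in kbcached.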

Because no process is using much memory itself and yet the oom-killer started, I would guess that something began performing a lot of file I/O and dirtied pages faster than they could be written to disk. I'm not really sure that should trigger the oom-killer, though.

None of these dirty pages would go to swap, because it's about as easy to write the content of the file itself out as it is to write the data to swap.
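
If you want to watch for this, the kernel reports the amount of dirty and in-flight writeback memory in `/proc/meminfo`. A minimal sketch of pulling those two fields out; the function name `dirty_kib` is hypothetical, and the Dirty and Writeback numbers in the sample are invented purely to show the format.

```python
def dirty_kib(meminfo_text):
    """Return the Dirty and Writeback values (kB) from /proc/meminfo text."""
    wanted = {"Dirty", "Writeback"}
    result = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            result[key] = int(rest.split()[0])  # values are reported in kB
    return result


# Illustrative input only: the Dirty/Writeback numbers are invented,
# they are not measurements from the server in the question.
sample = """MemTotal:      3464192 kB
Dirty:             1234 kB
Writeback:            0 kB
WritebackTmp:         0 kB
"""

print(dirty_kib(sample))
```

On the live system you would pass `open("/proc/meminfo").read()` instead and sample it repeatedly during heavy I/O; a Dirty value that keeps climbing would support this guess.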

The obvious guess is that mysqld was doing this, although I would suspect that it would open its files with O_DIRECT, which suggests to the kernel to minimize effects on the cache (with the premise that the DB server is doing its own caching).

Update

Based on your free output from update #2, the answer to the question in your topic is that it's using swap just fine; something just used all of it. The other data you provided is normal for a system that has recently booted.
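
To find out what is using the swap the next time it fills up, one option is to read the per-process figures from `/proc`. The sketch below assumes a kernel that reports a `VmSwap` line in `/proc/<pid>/status`; older kernels, possibly including the one on this server, do not, in which case it simply returns an empty list. The function names are made up for illustration.

```python
import os


def status_field(status_text, key):
    """Return the text after 'key:' in /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name == key:
            return value.strip()
    return None


def swap_by_process(proc_root="/proc"):
    """List (swap_kib, pid, name) for processes with swapped pages,
    largest first. Assumes the kernel exposes VmSwap."""
    usage = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                text = f.read()
        except OSError:
            continue  # process exited or is not readable
        vm_swap = status_field(text, "VmSwap")
        if vm_swap is None:
            continue  # kernel thread, or kernel without VmSwap
        swapped = int(vm_swap.split()[0])
        if swapped > 0:
            usage.append((swapped, int(entry), status_field(text, "Name")))
    return sorted(usage, reverse=True)
```

Calling `swap_by_process()[:10]` as root would list the ten largest swap users. Run it from cron every minute or so and log the result, since you may not be able to log in once the box is out of memory.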

Update 2

I mentioned mysql above, but I would be surprised if that's the culprit, honestly. I would suspect spamd, the cPanel processes or web applications running within Apache first.

I have also been assuming that you're running a reasonably current distro without any tweaking of system tunables and that you're current on security patches. There was a BIND exploit in the last few months that resulted in a DoS, but I cannot recall whether the exploit triggered memory exhaustion or something else. I have also read of cPanel exploits recently, but I don't know how current those were.