kswapd using 100% of CPU even with 100GB+ of RAM available

I'm running a CentOS 7 VM on ESXi with almost 300 GB of RAM and 24 vCPUs.

The average load is 3, and applications almost never use more than 150 GB of RAM; the rest of the available memory is used by Linux for the page cache.

The issue is that once the page cache fills the remaining RAM, two kswapd processes start using 100% CPU, and suddenly all CPUs show 99% sys usage (it's not wait or user time, it's almost entirely sys).

This causes a high load (100+) for several minutes, until the system recovers and the load drops back to 3.
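For context, one way to confirm that an episode really is page reclaim (and not something else), assuming the standard procps tools are installed:

    # kswapd threads pegged at ~100% CPU during an episode
    top -b -n 1 | grep kswapd
    # Reclaim counters; pgscan*/pgsteal* climbing rapidly means the
    # kernel is scanning hard to free pages
    watch -d -n 1 "grep -E 'pgscan|pgsteal' /proc/vmstat"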

At the moment I don't have a swap partition, but the issue also happened when I did have one.

One "solution" that I found is to execute the following command every day:

 echo 3 > /proc/sys/vm/drop_caches

which drops buffers/caches. This will "fix" the issue since cache usage never reaches 100%.
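A root crontab entry along these lines runs it daily (the 04:00 time is just an example; the sync first writes back dirty pages so more of the cache is actually droppable):

    # root's crontab (crontab -e): drop caches every day at 04:00
    0 4 * * * /bin/sync && echo 3 > /proc/sys/vm/drop_caches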

My questions are:

  • Is there a real solution for this issue?

  • Shouldn't the Linux kernel be smart enough to simply evict old cache pages from memory instead of launching kswapd?

After all, as I understand it, the main purpose of RAM is to be used by applications; caching is a secondary use that can be discarded if there isn't enough memory.

My kernel version is 3.10.0-229.14.1.el7.x86_64.


Solution 1:

This sounds like you are running out of RAM on one NUMA node, and the system is thrashing as it tries to free memory on that node (the kernel runs one kswapd thread per NUMA node, which matches the two kswapd processes you see). This can happen when a single process uses a large amount of memory, which by default is allocated preferentially on the node the process is running on.
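To verify this, check whether one node is nearly exhausted while the other still has plenty of free memory (requires the numactl package on CentOS 7):

    # Per-node totals: look for one node with free memory near zero
    numactl --hardware
    # Per-node allocation stats; high numa_miss/numa_foreign also
    # hints at one node being under pressure
    numastat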

See if this helps:

sysctl -w vm.zone_reclaim_mode=0
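Note that sysctl -w only lasts until reboot. To inspect the current value and persist the change on CentOS 7 (the file name below is arbitrary):

    # Current setting (non-zero means zone reclaim is enabled)
    cat /proc/sys/vm/zone_reclaim_mode
    # Make it permanent; systemd loads /etc/sysctl.d/*.conf at boot
    echo 'vm.zone_reclaim_mode = 0' > /etc/sysctl.d/99-zone-reclaim.conf
    sysctl -p /etc/sysctl.d/99-zone-reclaim.conf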

For a lengthier description of the problems that can arise with the default NUMA policy on most systems, see https://engineering.linkedin.com/performance/optimizing-linux-memory-management-low-latency-high-throughput-databases