How do I tell what process is causing kswapd to be in use?

I see kswapd using 100% CPU. How can I tell on which process's behalf kswapd is working so hard?


Solution 1:

kswapd is the kernel thread that reclaims memory and manages swap space when the combined memory demand of all processes is greater than what is physically available.

It is process-agnostic: it only cares which pages are accessed and when. (It is more complex than this, of course, but to keep things simple we may as well view it this way.)

So the real question is "which processes put the greatest burden on memory, and so cause kswapd to page all the time?"

That is most easily answered by running 'top' and switching to memory usage sort mode (Shift+M in the common procps version).

Solution 2:

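You can script it, but you can also do it in top.

Run top, then press O followed by p, then Enter. (These keys apply to older versions of top; in newer procps-ng versions, press f, move to the SWAP field, press s to make it the sort field, then q.)

Now all the processes are sorted by swap usage, and you can see which ones are using it.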
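
For the scripted route, one possible sketch is to read the VmSwap and VmRSS fields from /proc/<pid>/status for every process and sort by swap. The function name swap_by_process and its optional proc-root argument are made up for this example, and the VmSwap line is only reported by reasonably recent kernels, so treat this as a starting point rather than a definitive tool.

```shell
#!/bin/sh
# Print "swap_kB rss_kB pid name" for every process that reports VmSwap,
# largest swap user first. The optional argument is a hypothetical hook for
# pointing the function at a different proc root; it defaults to /proc.
swap_by_process() {
    proc_root="${1:-/proc}"
    for status in "$proc_root"/[0-9]*/status; do
        # Processes can exit while we iterate, so skip unreadable entries.
        [ -r "$status" ] || continue
        awk '
            /^Name:/   { name = $2 }
            /^Pid:/    { pid = $2 }
            /^VmRSS:/  { rss = $2 }
            /^VmSwap:/ { swap = $2; seen = 1 }
            # Kernel threads have no VmSwap line, so they are skipped.
            END { if (seen) printf "%d %d %s %s\n", swap, rss, pid, name }
        ' "$status" 2>/dev/null
    done | sort -rn
}

# Show the ten largest swap users on the live system.
swap_by_process | head -n 10
```

Only the first word of each process name is printed. Keep in mind that a large VmSwap value shows who has been paged out, which is not necessarily the same process that is creating the memory pressure right now, so the VmRSS column is worth reading alongside it.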

Solution 3:

If you're on Ubuntu 15.10 or later, this may actually be the result of a bug, especially if your system is a virtual machine lacking a swap partition (e.g., AWS EC2). The problem exists on other distributions, but, as of this writing, it's unclear whether the same fix works universally.

A temporary workaround:

sudo ln -s /dev/null /etc/udev/rules.d/40-vm-hotadd.rules
sudo reboot

Note that this will disable hotadding RAM/CPUs for Xen and Hyper-V virtual machines.

Solution 4:

There also seems to be a bug in kswapd somewhere, hopefully only on older kernels.

Nearly every day now, kswapd randomly goes berserk on some machines in a bigger cluster (with a non-current kernel, though): 100% CPU on both kswapd processes. There are no other running processes (except an ssh shell), plenty of free RAM (more than 700 MB), and no swap used at all. There is no swap-in or swap-out either.
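
One way to check this kind of observation is to look at the kernel's cumulative counters in /proc/vmstat. The sketch below is only an illustration: the function name reclaim_counters and its optional file argument are invented for the example, and the exact pgscan_kswapd counter names vary between kernel versions, which is why it matches on the prefix.

```shell
#!/bin/sh
# Print "name value" for the swap-in/swap-out counters and for any counter
# whose name starts with pgscan_kswapd. The optional argument is a
# hypothetical hook for reading a saved copy instead of /proc/vmstat.
reclaim_counters() {
    awk '$1 == "pswpin" || $1 == "pswpout" || $1 ~ /^pgscan_kswapd/ { print $1, $2 }' \
        "${1:-/proc/vmstat}"
}

# Print the current values on the live system, if the file is available.
if [ -r /proc/vmstat ]; then
    reclaim_counters
fi
```

The values are totals since boot, so run it twice a few seconds apart and compare. If the pgscan_kswapd numbers keep climbing while pswpin and pswpout stay put, kswapd is scanning pages without actually swapping anything.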

Nothing yet explains why one machine is hit and another is not. It does not seem to be completely random, because it usually hits more than one machine within a short time span. Machines that are idle, as well as machines that are under high pressure, appear less(!) likely to be hit. So it seems to have something to do with the workload, and only strikes when a machine is neither idle nor very busy.

Once the problem strikes, nothing helps anymore: killing all processes (those which have not become unkillable), unmounting all filesystems, nothing. kswapd still stays at 100% CPU. I suspect some spinlock race in SMP kernels, but I may well be wrong.

See also my answer at serverfault.com/questions/316995/#493257

Notes:

  • Rebooting affected machines often fails because the shutdown process hangs somewhere.
  • There is no direct connection to the Internet, so external causes are unlikely.
  • It seems to depend on the type of workload the machines process, because we have machines which have never been affected (yet).
  • Sorry, I cannot be more specific on what we do and why.
  • Yes, I am speculating, because so far this is an extremely puzzling effect.