Should I be concerned that swap is being used on a host with nearly 40GB of free memory?

Solution 1:

This is not a problem and is most likely normal. Lots of code (and possibly data) is used very rarely, so the system will swap it out to free up memory.

Swapping is mostly only a problem if memory is being swapped in and out continuously. It is that kind of activity that kills performance and suggests a problem elsewhere on the system.

If you want to monitor your swap activity, there are several utilities you can use, but vmstat is usually quite useful, e.g.

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 348256  73540 274600    0    0     1     9    9    6  2  0 98  0  0
 0  0      0 348240  73544 274620    0    0     0    16   28   26  0  0 100  0  0
 0  0      0 348240  73544 274620    0    0     0     0   29   33  0  0 100  0  0
 0  0      0 348240  73544 274620    0    0     0     0   21   23  0  0 100  0  0
 0  0      0 348240  73544 274620    0    0     0     0   24   26  0  0 100  0  0
 0  0      0 348240  73544 274620    0    0     0     0   23   23  0  0 100  0  0

Ignore the first line, as that shows averages since the system started. Note the si and so columns under ---swap--: they should generally be fairly small figures, if not 0, for the majority of the time.

Also worth mentioning is that this preemptive swapping can be controlled with a kernel setting. The file /proc/sys/vm/swappiness contains a number between 0 and 100 that tells the kernel how aggressively to swap out memory. Cat the file to see what it is set to. Most Linux distros default this to 60, but if you don't want to see any swapping before memory is exhausted, echo a 0 into the file like this:

echo 0 >/proc/sys/vm/swappiness

This can be made permanent by adding

vm.swappiness = 0

to /etc/sysctl.conf.
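
If you go the /etc/sysctl.conf route, the setting only takes effect at boot unless you reload it; on most distributions you can apply it immediately (as root) with:

# reload the settings from /etc/sysctl.conf without rebooting
sysctl -p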

Solution 2:

Linux will pre-emptively write out pages to disk if it has nothing better to do. That does not mean it will evict those pages from memory, though. It just means that if it ever has to evict those pages in the future, it doesn't need to wait for them to be written to disk first, because they are already there.
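
A rough way to see this for yourself: pages that have a copy in swap while still sitting in RAM are reported as SwapCached in /proc/meminfo, so a quick check (assuming a reasonably recent kernel) is:

# SwapCached = memory that is in RAM but also already has a copy in swap,
# so it can be evicted later without another disk write
grep -E 'SwapCached|SwapTotal|SwapFree' /proc/meminfo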

After all, if you are running out of memory it is probably because your machine is already working hard, and you don't want to burden it further with swapping at that moment. Better to do the writing out while the machine is doing nothing.

For a similar reason, your memory should always be full. Program pages, filesystem cache, tmpfs - there's so much stuff that could be held in memory. Really, you should be concerned if your memory is empty; after all, you paid a lot of money for it (at least compared to the same amount of disk space), so it had better be used!
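
A quick way to see how that "full" memory actually breaks down is free (with a reasonably recent procps version); the buff/cache and available columns show how much of the "used" memory is really just reclaimable cache:

# -h prints human-readable sizes
free -h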

Solution 3:

Swap used is not bad, but a lot of swap activity is

  vmstat 1
  procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
  6  0 521040 114564   6688 377308    8   13   639   173    0 1100  5  4 90  0
  1  0 521040 114964   6688 377448    0    0   256     0    0 1826  3  4 94  0
  0  0 521040 115956   6688 377448    0    0     0     0    0 1182  7  3 90  0
  0  0 521036 115992   6688 377448    4    0    16     0    0 1154 10  2 88  0
  3  0 521036 114628   6696 377640    0    0   928   224    0 1503 15 17 67  1

The swpd column is no problem at all. Non-zero values in the si and so columns are what is deadly to server performance, especially on machines with lots of RAM.

It is best to set swappiness to 0 on machines with several GB of RAM:

sysctl -w vm.swappiness=0

This will not disable swap. It only instructs Linux to use swap as a last-resort measure. It may waste a few MB of RAM on program pages that don't really need to be resident, but that is preferable to swapping bloating your disk access queues.

Edit 1: why the default value of swappiness is not optimal

We have to remember that two decades ago a big 486 had only 32 MB of RAM. Swap algorithms were developed when the whole of RAM could be moved to disk in a small fraction of a second, even with the slower disks of that time. That is why the default swap policies are so aggressive: RAM was the bottleneck in those days. Since then, RAM sizes have increased more than 10,000 times while disk speeds have increased less than 10 times. This has shifted the bottleneck to disk bandwidth.

Edit 2: why is si/so activity deadly to servers?

si and so activity on machines with tons of RAM is deadly because it means the system is fighting with itself for RAM. What happens is that disks, even big storage arrays, are far too slow compared to RAM. Aggressive swapping favors the kernel's disk cache over application data, and that is the most common source of this fight for RAM. Since the OS has to free disk cache on every swap-in, the time-to-live of the extra cache that swapping makes room for is too short to be useful anyway. The result is that you spend disk bandwidth storing cache that will probably never be used, while pausing your programs to wait for swapped-in pages. That consumes a lot of critical resources with little or no benefit to the applications.

Note the wording above: "a lot of swap activity on servers with lots of RAM". This does not apply to machines with only occasional si and so activity. It may also not apply in the future if smarter swap algorithms are developed in operating systems.
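
If you want to watch swap-in/swap-out rates over a period rather than eyeballing vmstat, sar from the sysstat package (assuming it is installed) reports them per second:

# pswpin/s and pswpout/s should stay at or near 0 on a healthy server
sar -W 1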

Edit 3: "cold" pages

People romanticize the swapping algorithm. Some say "it takes the least-used pages of RAM", but that is not quite what the kernel does. The thing that is difficult to understand about swap is that the kernel does not really know what a "cold" page is: it has no good metric to determine whether a page is in use or likely to be used in the near future. To work around that, the kernel puts pages into swap more or less randomly, and pages that are not needed stay there. The problem with that approach is that pages have to go to swap before the kernel finds out whether the applications need them, which means a lot of "hot" pages will end up in swap. And the problem with that is that disks are far too slow compared to RAM. The consequence is that when swapping starts, all applications get random pauses waiting for the disks, which hurts both latency and throughput.

I built my own benchmark based on a realistic scenario that is common to many applications with a decent volume. From my tests, I saw no benefit to throughput or latency when swap is in use. Far from it: when swapping starts, both throughput and latency degrade by at least an order of magnitude.

I will go a bit further: I understand that swap is not for processing. Swap is for emergencies only, those moments when too many applications are running at the same time and you get a memory spike. Without swap that would cause out-of-memory errors. I consider swap usage a failure of the development and production teams. This is just an opinion that goes well beyond what we discussed here, but it is what I think. Of course, my applications have excellent memory management by themselves.

Solution 4:

This is not an answer to your question, but rather just extra information to help you make an informed decision.

If you would like to know what processes specifically are using how much swap, here is a little shell script:

#!/bin/bash
# Sum up per-process swap usage (in kB) from /proc/<pid>/smaps.

set -u

OVERALL=0
for DIR in $(find /proc/ -maxdepth 1 -type d -regex "^/proc/[0-9]+") ; do
  PID=$(echo "$DIR" | cut -d / -f 3)
  PROGNAME=$(ps -p "$PID" -o comm --no-headers)

  # Add up the "Swap:" line of every mapping belonging to this process.
  # Matching "^Swap:" avoids also counting the SwapPss lines that newer
  # kernels report in smaps.
  SUM=0
  for SWAP in $(grep '^Swap:' "$DIR/smaps" 2>/dev/null | awk '{ print $2 }') ; do
    SUM=$((SUM + SWAP))
  done
  echo "PID=$PID - Swap used: $SUM kB - ($PROGNAME)"

  OVERALL=$((OVERALL + SUM))
done
echo "Overall swap used: $OVERALL kB"

I should also add that tmpfs contents can be swapped out too. This is more common on modern Linux systems using systemd, which mount /tmp (and various per-user paths) as tmpfs.
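
As a side note, on reasonably recent kernels there is a simpler per-process figure, the VmSwap field in /proc/<pid>/status, which gives roughly the same numbers without walking smaps. A quick sketch:

# top swap users by VmSwap (kB); kernel threads have no VmSwap and are skipped
for STATUS in /proc/[0-9]*/status ; do
  awk '/^Name:/ {name=$2} /^VmSwap:/ {print $2, $3, name}' "$STATUS" 2>/dev/null
done | sort -rn | head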