Is Committed_AS in /proc/meminfo really the correct number for allocated virtual memory in Linux? It is less than used/RSS

I am collecting numbers for monitoring HPC servers and am debating the policy for handing out memory (overcommit or not). I wanted to show users a number on how much virtual memory their processes (the whole machine) requested vs. how much was actually used.

I thought I'd get the interesting values from /proc/meminfo using the fields MemTotal, MemAvailable, and Committed_AS. The latter is supposed to show how much memory has been committed to by the kernel, a worst-case number of how much memory would really be needed to fulfill the running tasks.

But Committed_AS is obviously too small. It is smaller than the currently used memory! Observe two example systems. One admin server:

# cat /proc/meminfo 
MemTotal:       16322624 kB
MemFree:          536520 kB
MemAvailable:   13853216 kB
Buffers:             156 kB
Cached:          9824132 kB
SwapCached:            0 kB
Active:          4854772 kB
Inactive:        5386896 kB
Active(anon):      33468 kB
Inactive(anon):   412616 kB
Active(file):    4821304 kB
Inactive(file):  4974280 kB
Unevictable:       10948 kB
Mlocked:           10948 kB
SwapTotal:      16777212 kB
SwapFree:       16777212 kB
Dirty:               884 kB
Writeback:             0 kB
AnonPages:        428460 kB
Mapped:            53236 kB
Shmem:             26336 kB
Slab:            4144888 kB
SReclaimable:    3863416 kB
SUnreclaim:       281472 kB
KernelStack:       12208 kB
PageTables:        38068 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    24938524 kB
Committed_AS:    1488188 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      317176 kB
VmallocChunk:   34358947836 kB
HardwareCorrupted:     0 kB
AnonHugePages:     90112 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      144924 kB
DirectMap2M:     4988928 kB
DirectMap1G:    13631488 kB

This is roughly 1.5G committed vs. 2.5G being in use without caches. A compute node:

ssh node390 cat /proc/meminfo
MemTotal:       264044768 kB
MemFree:        208603740 kB
MemAvailable:   215043512 kB
Buffers:           15500 kB
Cached:           756664 kB
SwapCached:            0 kB
Active:         44890644 kB
Inactive:         734820 kB
Active(anon):   44853608 kB
Inactive(anon):   645100 kB
Active(file):      37036 kB
Inactive(file):    89720 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      134216700 kB
SwapFree:       134216700 kB
Dirty:                 0 kB
Writeback:           140 kB
AnonPages:      44918876 kB
Mapped:            52664 kB
Shmem:            645408 kB
Slab:            7837028 kB
SReclaimable:    7147872 kB
SUnreclaim:       689156 kB
KernelStack:        8192 kB
PageTables:        91528 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    345452512 kB
Committed_AS:   46393904 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      797140 kB
VmallocChunk:   34224733184 kB
HardwareCorrupted:     0 kB
AnonHugePages:  41498624 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      312640 kB
DirectMap2M:     7966720 kB
DirectMap1G:    262144000 kB

This is around 47G used vs. 44G committed. The system in question is a CentOS 7 cluster:

uname -a
Linux adm1 3.10.0-862.14.4.el7.x86_64 #1 SMP Wed Sep 26 15:12:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

On my Linux desktop using a vanilla kernel, I see more 'reasonable' numbers with 32G being committed compared to 15.5G being in use. On a Debian server I see 0.4G in use vs. 1.5G committed.
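
The used-vs-committed comparison above can be reproduced with a short script. This is only a minimal sketch: it assumes that "used" means MemTotal minus MemAvailable (which matches the figures quoted for the two CentOS machines), and the helper names parse_meminfo and used_vs_committed are made up for illustration.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its integer value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            fields[name.strip()] = int(rest.split()[0])
    return fields


def used_vs_committed(fields):
    """Return (used, committed) in GiB.

    Assumption: "used" is MemTotal - MemAvailable. This is one possible
    definition of used memory, not the only one.
    """
    kib_per_gib = 1024 * 1024
    used = (fields["MemTotal"] - fields["MemAvailable"]) / kib_per_gib
    committed = fields["Committed_AS"] / kib_per_gib
    return used, committed
```

On a live system the input would be the contents of /proc/meminfo, for example used_vs_committed(parse_meminfo(open("/proc/meminfo").read())).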

Can someone explain this to me? How do I get a correct number for the committed memory? Is this a bug in the CentOS/RHEL kernel that should be reported?

Update with more data and a comparison between systems

A listing of used/committed memory for various systems I could access, with a note about the kind of load:

SLES 11.4 (kernel 3.0.101-108.71-default)
- 17.6G/17.4G, interactive multiuser HPC (e.g. MATLAB, GIS)
CentOS 7.4/7.5 (kernel 3.10.0-862.11.6.el7 or 3.10.0-862.14.4.el7)
- 1.7G/1.3G, admin server, cluster mgmt, DHCP, TFTP, rsyslog, …
- 8.6G/1.7G, SLURM batch system, 7.2G RSS for slurmdbd alone
- 5.1G/0.6G, NFS server (400 clients)
- 26.8G/32.6G, 16-core HPC node loaded with 328 (need to talk to the user) GNU R processes
- 6.5G/8.1G, 16-core HPC node with 16 MPI processes
Ubuntu 16.04 (kernel 4.15.0-33-generic)
- 1.3G/2.2G, 6-core HPC node, 6-threaded scientific application (1.1G RSS)
- 19.9G/20.3G, 6-core HPC node, 6-threaded scientific application (19G RSS)
- 1.0G/4.4G, 6-core login node with BeeGFS metadata/mgmt server
Ubuntu 14.04 (kernel 3.13.0-161-generic)
- 0.7G/0.3G, HTTP server VM
Custom build (vanilla kernel 4.4.163)
- 0.7G/0.04G, mostly idle Subversion server
Custom build (vanilla kernel 4.14.30)
- 14.2G/31.4G, long-running desktop
Alpine (kernel 4.4.68-0-grsec)
- 36.8M/16.4M, some (web) server
Ubuntu 12.04 (kernel 3.2.0-89-generic)
- 1.0G/7.1G, some server
Ubuntu 16.04 (kernel 4.4.0-112-generic)
- 0.9G/1.9G, some server
Debian 4.0 (kernel 2.6.18-6-686, 32 bit x86, obviously)
- 1.0G/0.8G, some reliable server
Debian 9.5 (kernel 4.9.0-6)
- 0.4G/1.5G, various web services, light load, obviously
Debian 9.6 (kernel 4.9.0-8-amd64)
- 10.9G/17.7G, a desktop
Ubuntu 13.10 (kernel 3.11.0-26-generic)
- 3.2G/5.4G, an old desktop
Ubuntu 18.04 (kernel 4.15.0-38-generic)
- 6.4G/18.3G, a desktop

SUnreclaim for SLES and CentOS is rather large: 0.5G to 1G is not uncommon, more if caches are not flushed from time to time. But that is not enough to explain the missing memory in Committed_AS. The Ubuntu machines typically have below 100M SUnreclaim, except the 14.04 one, which has a small Committed_AS and 0.4G SUnreclaim. Putting the kernels in order is tricky, as the 3.10 kernel from CentOS has many features of 4.x kernels backported. But there seems to be a line between 4.4 and 4.9 that affected the strangely low values of Committed_AS. The added servers from some of my peers suggest that Committed_AS also delivers strange numbers for older kernels. Was this broken and fixed multiple times?

Can people confirm this? Is this just buggy/very inaccurate kernel behaviour in determining the values in /proc/meminfo, or is there a bug(fix) history?

Some of the entries in the list are really strange. Having one slurmdbd process with an RSS of four times Committed_AS cannot be right. I am tempted to test a vanilla kernel on these systems with the same workload, but I cannot take the most interesting machines out of production for such games.

I guess the answer to my question is a pointer to the fix in the kernel commit history that enabled good estimates in Committed_AS again. Otherwise, please enlighten me;-)

Update about two processes having more RSS than Committed_AS

The batch server that runs an instance of the Slurm database daemon slurmdbd, along with slurmctld, is an illuminating example. It is very long-running and shows a stable picture, with those two processes dominating resource use.

# free -k; for p in $(pgrep slurmctld) $(pgrep slurmdbd) ; do cat /proc/$p/smaps|grep Rss| awk '{ print $2}'; done | (sum=0; while read n; do sum=$((sum+n)); done; echo $sum ); cat /proc/meminfo
              total        used        free      shared  buff/cache   available
Mem:       16321148     5873792      380624      304180    10066732     9958140
Swap:      16777212        1024    16776188
4703676
MemTotal:       16321148 kB
MemFree:          379708 kB
MemAvailable:    9957224 kB
Buffers:               0 kB
Cached:          8865800 kB
SwapCached:          184 kB
Active:          7725080 kB
Inactive:        6475796 kB
Active(anon):    4634460 kB
Inactive(anon):  1007132 kB
Active(file):    3090620 kB
Inactive(file):  5468664 kB
Unevictable:       10952 kB
Mlocked:           10952 kB
SwapTotal:      16777212 kB
SwapFree:       16776188 kB
Dirty:                 4 kB
Writeback:             0 kB
AnonPages:       5345868 kB
Mapped:            79092 kB
Shmem:            304180 kB
Slab:            1287396 kB
SReclaimable:    1200932 kB
SUnreclaim:        86464 kB
KernelStack:        5252 kB
PageTables:        19852 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    24937784 kB
Committed_AS:    1964548 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:     1814044 kB
DirectMap2M:    14854144 kB
DirectMap1G:     2097152 kB

Here you see the Rss of the two processes amounting to 4.5G (slurmdbd alone is 3.2G). The Rss roughly matches the active anon pages, but Committed_AS is less than 2G. Counting the Rss of all processes via /proc comes quite close to AnonPages+Shmem (note: Pss is only about 150M smaller). I don't get how Committed_AS can be smaller than the Rss (or summed Pss) of active processes. Or, just in the context of meminfo:

How can Committed_AS (1964548 kB) be smaller than AnonPages (5345868 kB)? This is a fairly stable workload. These are two extremely long-lived processes that are about the only thing that happens on this machine, with rather constant churn (batch jobs on other nodes being managed).
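
The shell pipeline above can also be written as a small script, which makes it easy to sum Rss and Pss with the same code. This is a sketch only: sum_smaps_field and sum_for_pids are hypothetical helper names, and reading /proc/<pid>/smaps of other users' processes normally requires root.

```python
def sum_smaps_field(smaps_text, field="Rss"):
    """Sum one per-mapping field (in kB) over the text of /proc/<pid>/smaps."""
    prefix = field + ":"
    total = 0
    for line in smaps_text.splitlines():
        # Field lines look like "Rss:                 312 kB".
        if line.startswith(prefix):
            total += int(line.split()[1])
    return total


def sum_for_pids(pids, field="Rss"):
    """Sum the field over several processes, given their PIDs."""
    total = 0
    for pid in pids:
        with open(f"/proc/{pid}/smaps") as f:
            total += sum_smaps_field(f.read(), field)
    return total
```

Calling sum_for_pids with the PIDs of slurmctld and slurmdbd, once with "Rss" and once with "Pss", gives the two sums that are compared with Committed_AS here.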

Solution 1:

Those boxes are not under significant memory pressure, and neither of them is paging (SwapFree is essentially equal to SwapTotal). The second box has ~47 GB committed out of 250 GB total; 200 GB is a lot to play with.

In practice, keep increasing the size of the workload until one of these happens:

User (application) response time degrades
Page out rate is higher than you are comfortable with
OOM killer murders some processes

The relationships between the memory counters are unintuitive, vary greatly between workloads, and are probably only really understood by kernel developers. Don't worry about them too much; focus on measuring obvious memory pressure.

Other descriptions of Committed_AS, such as this one from the linux-mm list a while ago, emphasize that it is an estimate:
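
One way to put a number on "page out rate" is to sample the swap counters in /proc/vmstat twice and take the difference. The following is a minimal sketch, assuming the pswpin and pswpout counters (cumulative pages swapped in and out since boot) are present, which requires a kernel built with swap support. The helper names and the 10 second interval are arbitrary choices for illustration.

```python
import time


def parse_vmstat(text):
    """Map each counter name in /proc/vmstat to its integer value."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before, after, seconds):
    """Return (pages swapped in per second, pages swapped out per second)."""
    return (
        (after["pswpin"] - before["pswpin"]) / seconds,
        (after["pswpout"] - before["pswpout"]) / seconds,
    )


def sample(interval=10.0, path="/proc/vmstat"):
    """Take two snapshots 'interval' seconds apart and return the swap rates."""
    with open(path) as f:
        before = parse_vmstat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_vmstat(f.read())
    return swap_rates(before, after, interval)
```

A sustained non-zero swap-out rate while the workload grows is the kind of obvious pressure signal meant here; what rate is acceptable depends on the workload.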

Committed_AS: An estimate of how much RAM you would need to make a
              99.99% guarantee that there never is OOM (out of memory)
              for this workload. Normally the kernel will overcommit
              memory. That means, say you do a 1GB malloc, nothing
              happens, really. Only when you start USING that malloc
              memory you will get real memory on demand, and just as
              much as you use. So you sort of take a mortgage and hope
              the bank doesn't go bust. Other cases might include when
              you mmap a file that's shared only when you write to it
              and you get a private copy of that data. While it normally
              is shared between processes. The Committed_AS is a
              guesstimate of how much RAM/swap you would need
              worst-case.

Solution 2:

Here's another answer purely about Committed_AS being lower than "expected":

The interesting lines from your /proc/meminfo are as follows:

Active:          4854772 kB
Inactive:        5386896 kB
Active(anon):      33468 kB
Inactive(anon):   412616 kB
Active(file):    4821304 kB
Inactive(file):  4974280 kB
Mlocked:           10948 kB
AnonPages:        428460 kB
Shmem:             26336 kB
Committed_AS:    1488188 kB

(Active and Inactive are just the sums of the (anon) and (file) lines below them, and AnonPages is roughly the sum of the (anon) lines minus Shmem. I only included those lines to make this easier to understand.)

Active(file) is file-backed, so it does not increase Committed_AS. In practice, the only things that raise your Committed_AS value are AnonPages + Shmem + Mlocked, plus memory that processes have allocated but not yet used. Committed_AS is the amount of memory (RAM + swap combined) that the system must be able to provide to the currently running processes even if all caches and buffers are flushed to disk.
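
Applying that reasoning to the numbers in the question gives a quick consistency check. The sketch below uses a hypothetical helper, touched_floor, that adds up AnonPages, Shmem and Mlocked and compares the sum with Committed_AS. The values are copied from the admin-server dump and from the batch-server dump in the question. Treat the sum as a rough lower bound only, since mlocked pages may also be counted in one of the other two fields.

```python
def touched_floor(fields):
    """Rough lower bound (kB) for memory that is already in use and not file-backed."""
    return fields["AnonPages"] + fields["Shmem"] + fields["Mlocked"]


# Values in kB, copied from the /proc/meminfo dumps in the question.
admin = {"AnonPages": 428460, "Shmem": 26336, "Mlocked": 10948,
         "Committed_AS": 1488188}
batch = {"AnonPages": 5345868, "Shmem": 304180, "Mlocked": 10952,
         "Committed_AS": 1964548}

for name, fields in (("admin", admin), ("batch", batch)):
    floor = touched_floor(fields)
    ratio = fields["Committed_AS"] / floor
    print(f"{name}: floor {floor} kB, Committed_AS {fields['Committed_AS']} kB, "
          f"ratio {ratio:.2f}")
```

On the admin server Committed_AS is about three times the floor, which fits the explanation above. On the batch server it is only about a third of the floor, which is exactly the anomaly the question describes and which this explanation does not account for.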

If a process calls malloc() (which is usually implemented with brk()/sbrk() or mmap() behind the scenes), the kernel will increase Committed_AS, but this will not show in the other numbers because the kernel doesn't reserve any real RAM until the memory is actually used by the process. (Technically, the kernel has set aside a virtual address range for the process, but the virtual memory mapping for the CPU points to a zero-filled page with a flag saying that actual memory must be allocated on the fly if the process tries to write anything. This allows the process to read zeros from the virtual address space without allocating memory; writing data to the virtually allocated area is the action that allocates the memory for real.) It's very common for programs to allocate more (virtual) memory than they actually use, so this is a good feature to have, but it obviously makes memory statistics harder to understand. Your Committed_AS is pretty low compared to the other values, so it seems that your system is mostly running processes that do not allocate much memory beyond what they actually use.
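
The effect can be observed directly on a Linux machine. The following is a sketch under stated assumptions, not a precise measurement: it maps anonymous private memory with Python's mmap module (standing in for a large malloc()), and reads the system-wide Committed_AS from /proc/meminfo and the process's own VmRSS from /proc/self/status before the mapping, after the mapping, and after writing to every page. The helper names and the 256 MiB size are arbitrary, and Committed_AS is system-wide, so other processes can shift it during the run.

```python
import mmap


def field_kb(text, key):
    """Return the integer value of the 'key:' line in /proc-style text."""
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == key:
            return int(rest.split()[0])
    raise KeyError(key)


def snapshot():
    """Return (system-wide Committed_AS, this process's VmRSS), both in kB."""
    with open("/proc/meminfo") as f:
        committed = field_kb(f.read(), "Committed_AS")
    with open("/proc/self/status") as f:
        rss = field_kb(f.read(), "VmRSS")
    return committed, rss


def demo(size=256 * 1024 * 1024):
    """Map 'size' bytes, then touch them; return the three snapshots."""
    before = snapshot()
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    reserved = snapshot()          # mapped, but no page written yet
    for offset in range(0, size, mmap.PAGESIZE):
        region[offset] = 1         # writing forces a real page to be allocated
    touched = snapshot()
    region.close()
    return before, reserved, touched
```

The pattern to look for in the values returned by demo() is that Committed_AS typically grows by roughly the mapping size as soon as the region is mapped while VmRSS barely moves, and that VmRSS catches up only after the pages have been written.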

For example, my current system is currently running like this:

MemTotal:       32570748 kB
Active:         12571828 kB
AnonPages:       7689584 kB
Mlocked:           19788 kB
Shmem:           4481940 kB
Committed_AS:   44949856 kB

Note the huge amount of Committed_AS (~45 GB) on my system even though anonymous pages, locked memory and Shmem total only about 12 GB. As I'm running a desktop environment on this system, I would assume that I have lots of processes that have executed fork() after acquiring or using lots of RAM. In this case the forked process can in theory modify all that memory without doing any explicit memory allocations, and all this forked memory is counted towards the Committed_AS value. As a result, Committed_AS may not reflect your real system memory usage at all.

TL;DR: Committed_AS is the estimated amount of allocated virtual memory that is not backed by any file system, or in other words the maximum amount of memory that must in theory be backed by real storage (RAM + swap) to keep the currently running processes running if nothing in the whole system allocates more memory.

However, if the system is communicating with the outside world, even incoming IP packets could cause more memory to be used, so you cannot make any guarantees about future system behavior based on this number. Also note that stack memory is always allocated on the fly, so even if none of your processes fork() or make explicit memory allocations, your memory usage (Committed_AS) may still increase when processes use more stack space.

In my experience Committed_AS is only really meaningful to compare to previous runs with similar workloads. However, if Committed_AS is less than your MemTotal you can be pretty sure that the system has very light memory pressure compared to your available hardware.
