Memory used but not reported on Linux
I have 2 servers running the same applications, but one is mis-reporting memory by around 400 MB (out of 1 GB) and I cannot figure out where that memory is being used. I've compared the output of /proc/meminfo and top between both servers and they report very similar memory usage (process-wise), but one has around 400 MB less memory available.
As a comparison, this is the output of free on this server and on the other, which shows a difference of about 450 MB of RAM.
# Server with missing memory
              total        used        free      shared  buff/cache   available
Mem:            957         707          85           0         164         107
Swap:             0           0           0
# Other server, looking good
              total        used        free      shared  buff/cache   available
Mem:            953         224         210           0         518         553
Swap:             0           0           0
Here's the output of /proc/meminfo
$ cat /proc/meminfo
MemTotal: 980756 kB
MemFree: 89020 kB
MemAvailable: 108824 kB
Buffers: 32440 kB
Cached: 96944 kB
SwapCached: 0 kB
Active: 196416 kB
Inactive: 61448 kB
Active(anon): 128888 kB
Inactive(anon): 124 kB
Active(file): 67528 kB
Inactive(file): 61324 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 656 kB
Writeback: 0 kB
AnonPages: 128488 kB
Mapped: 57032 kB
Shmem: 524 kB
Slab: 78228 kB
SReclaimable: 37016 kB
SUnreclaim: 41212 kB
KernelStack: 2896 kB
PageTables: 7956 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 490376 kB
Committed_AS: 1099792 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 444392 kB
DirectMap2M: 575488 kB
DirectMap1G: 0 kB
And the output of top (sorted by memory usage)
top - 09:35:16 up 706 days, 11:49, 1 user, load average: 0.15, 0.03, 0.01
Tasks: 124 total, 1 running, 82 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.2 sy, 0.0 ni, 99.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 980756 total, 87896 free, 726108 used, 166752 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 108060 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6348 ubuntu 20 0 582484 35120 13156 S 0.0 4.1 0:07.42 node
14394 root 20 0 439088 19760 2692 S 0.3 3.0 695:53.01 python
2557 root 19 -1 124824 19424 7956 S 0.0 2.0 75:05.85 systemd-journal
24174 ubuntu 20 0 25932 7904 3384 S 0.0 0.8 0:00.08 bash
724 root 20 0 170944 7844 12 S 0.0 0.8 0:00.15 networkd-dispat
24089 root 20 0 107988 6820 5820 S 0.0 0.7 0:00.01 sshd
14402 root 20 0 35268 5960 1952 S 0.0 0.6 7:38.52 python
14404 root 20 0 34748 5520 2060 S 0.0 0.6 6:34.19 python
14403 root 20 0 33512 5000 1684 S 0.0 0.5 0:57.69 python
24173 ubuntu 20 0 108344 4672 3548 S 0.3 0.5 0:00.04 sshd
14407 root 20 0 33508 4408 1104 S 0.0 0.4 0:35.97 python
14408 nobody 20 0 35616 4388 1000 S 0.0 0.4 1:54.36 python
1 root 20 0 225416 4348 1876 S 0.0 0.4 3:45.45 systemd
704 root 20 0 71304 4032 2500 S 0.0 0.4 0:56.31 systemd-logind
24594 ubuntu 20 0 42240 3656 2984 R 0.0 0.4 0:00.02 top
And the output of mount | grep tmp, to show any memory used by these filesystems. The main filesystem is ext4 (so no XFS).
udev on /dev type devtmpfs (rw,nosuid,relatime,size=478348k,nr_inodes=119587,mode=755)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=98076k,mode=755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,noexec)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=98072k,mode=700,uid=1000,gid=1000)
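(For completeness, the space actually used on these tmpfs mounts, as opposed to the size= limits in the mount options, can be listed with df; it should roughly line up with the Shmem figure from /proc/meminfo above.)
$ df -h -t tmpfs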
The server has been running for almost 2 years and it's due a restart (which I think will make the issue go away), but I would like to figure out where the missing ~400 MB are.
Edit: here's the reduced output of systemd-cgtop -m from both servers. The healthy one looks like it is using more memory, but this is not the case!
# unhealthy server
Control Group                          Tasks   %CPU   Memory  Input/s Output/s
/                                        181    2.0   250.1M        -        -
/system.slice                             91    1.3   188.3M        -        -
/user.slice                               26    0.7   104.1M        -        -
/system.slice/logging-agent.service       11    1.1    40.5M        -        -
/system.slice/mycustom.service             7      -    34.6M        -        -
# healthy server
Control Group                          Tasks   %CPU   Memory  Input/s Output/s
/                                        187    2.6   655.7M        -        -
/system.slice                             95    1.4   446.6M        -        -
/user.slice                               24    1.1   287.5M        -        -
/system.slice/cron.service                 1      -   238.3M        -        -
/system.slice/mycustom.service             7      -    43.7M        -        -
/system.slice/logging-agent.service       11    1.3    40.0M        -        -
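(For a single unit, the same number can also be read with systemctl once memory accounting is on, e.g. for the mycustom.service unit above:)
$ systemctl show -p MemoryCurrent mycustom.service    # prints MemoryCurrent= in bytes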
Less than 1 GB of RAM for a VM is not a lot these days. MemAvailable is 11% of MemTotal, which isn't a large margin. A mere 60 MB more, to get to a round 1024 MB, would reduce the pressure a bit. I base this off of Committed_AS.
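For reference, those counters sit right in /proc/meminfo; with the numbers posted above, Committed_AS (1099792 kB, about 1.05 GiB) already exceeds MemTotal (980756 kB):
$ grep -E '^(MemTotal|CommitLimit|Committed_AS):' /proc/meminfo
# CommitLimit is only enforced when vm.overcommit_memory=2; check with: sysctl vm.overcommit_memory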
But that doesn't tell you precisely where the memory allocations are made. And top isn't as helpful as you might think: RES will not add up neatly due to shared pages and other unintuitive things. A tip for top: hit c to toggle the full command line. A huge number of programs run as python or node, and you'll want to tell them apart.
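One rough way to group that is to sum resident set sizes per command name with ps and awk; RSS counts shared pages once per process, so treat the totals as an upper bound, and the short comm field still lumps all python or node processes together:
$ ps -eo rss=,comm= | awk '{ rss[$2] += $1 } END { for (c in rss) printf "%8d kB  %s\n", rss[c], c }' | sort -rn | head
# RSS is reported in kB; use ps -eo rss,args (or the c toggle in top) to see full command lines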
On Linux, use cgroups to measure. On systemd systems, enable memory accounting and look at systemd-cgtop -m. It should be apparent whether the user or service slices are allocating more memory; either is plausible. And a few hundred MB of difference is not a lot in absolute numbers: a couple of extra worker jobs or a stray user login could add up to that.
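A minimal sketch of that, using the mycustom.service unit from the question as the example (DefaultMemoryAccounting=yes in /etc/systemd/system.conf enables it globally instead):
$ sudo systemctl set-property mycustom.service MemoryAccounting=yes
$ systemd-cgtop -m        # -m sorts control groups by memory use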
"The server has been running for almost 2 years and it's due a restart"
Overdue. It's guaranteed there are security updates that have not taken effect. Record the /proc/meminfo and systemd-cgtop metrics, update, and reboot.
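A minimal way to snapshot both before the reboot; the destination file names are just examples:
$ cp /proc/meminfo /root/meminfo.before-reboot
$ systemd-cgtop -m -b -n 1 > /root/cgtop.before-reboot    # one batch-mode iteration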