Memory used but not reported on Linux

I have 2 servers running the same applications, but one is misreporting memory by around 400 MB (out of 1 GB) and I cannot figure out where that memory is being used. I've compared the output of /proc/meminfo and top between both servers and they report very similar memory usage (process-wise), but one has around 400 MB less memory available.

As a comparison, this is the output of free on this server and on the other, where it shows a difference of roughly 450 MB of RAM.

# Server with missing memory
              total        used        free      shared  buff/cache   available
Mem:            957         707          85           0         164         107
Swap:             0           0           0

# Other server, looking good
              total        used        free      shared  buff/cache   available
Mem:            953         224         210           0         518         553
Swap:             0           0           0

Here's the output of /proc/meminfo

$ cat /proc/meminfo
MemTotal:         980756 kB
MemFree:           89020 kB
MemAvailable:     108824 kB
Buffers:           32440 kB
Cached:            96944 kB
SwapCached:            0 kB
Active:           196416 kB
Inactive:          61448 kB
Active(anon):     128888 kB
Inactive(anon):      124 kB
Active(file):      67528 kB
Inactive(file):    61324 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:               656 kB
Writeback:             0 kB
AnonPages:        128488 kB
Mapped:            57032 kB
Shmem:               524 kB
Slab:              78228 kB
SReclaimable:      37016 kB
SUnreclaim:        41212 kB
KernelStack:        2896 kB
PageTables:         7956 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      490376 kB
Committed_AS:    1099792 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      444392 kB
DirectMap2M:      575488 kB
DirectMap1G:           0 kB

And the output of top (sorted by memory usage)

top - 09:35:16 up 706 days, 11:49,  1 user,  load average: 0.15, 0.03, 0.01
Tasks: 124 total,   1 running,  82 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.2 sy,  0.0 ni, 99.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :   980756 total,    87896 free,   726108 used,   166752 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   108060 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 6348 ubuntu    20   0  582484  35120  13156 S   0.0  4.1   0:07.42 node
14394 root      20   0  439088  19760   2692 S   0.3  3.0 695:53.01 python
 2557 root      19  -1  124824  19424   7956 S   0.0  2.0  75:05.85 systemd-journal
24174 ubuntu    20   0   25932   7904   3384 S   0.0  0.8   0:00.08 bash
  724 root      20   0  170944   7844     12 S   0.0  0.8   0:00.15 networkd-dispat
24089 root      20   0  107988   6820   5820 S   0.0  0.7   0:00.01 sshd
14402 root      20   0   35268   5960   1952 S   0.0  0.6   7:38.52 python
14404 root      20   0   34748   5520   2060 S   0.0  0.6   6:34.19 python
14403 root      20   0   33512   5000   1684 S   0.0  0.5   0:57.69 python
24173 ubuntu    20   0  108344   4672   3548 S   0.3  0.5   0:00.04 sshd
14407 root      20   0   33508   4408   1104 S   0.0  0.4   0:35.97 python
14408 nobody    20   0   35616   4388   1000 S   0.0  0.4   1:54.36 python
    1 root      20   0  225416   4348   1876 S   0.0  0.4   3:45.45 systemd
  704 root      20   0   71304   4032   2500 S   0.0  0.4   0:56.31 systemd-logind
24594 ubuntu    20   0   42240   3656   2984 R   0.0  0.4   0:00.02 top

And the output of mount | grep tmp, to show any memory used by these filesystems. The main filesystem is ext4 (so no XFS).

udev on /dev type devtmpfs (rw,nosuid,relatime,size=478348k,nr_inodes=119587,mode=755)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=98076k,mode=755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,noexec)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=98072k,mode=700,uid=1000,gid=1000)

The server has been running for almost 2 years and is due a restart (which I think will make the issue go away), but I would like to figure out where the missing ~400 MB are.
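To quantify how much of MemTotal is not attributed to the usual consumers, the major fields of /proc/meminfo can be summed and subtracted from the total. A rough sketch (the exact set of fields to include is debatable, and some kernel allocations are never reported in meminfo at all):

$ awk '/^(MemFree|Buffers|Cached|Slab|AnonPages|PageTables|KernelStack):/ {sum += $2}
       /^MemTotal:/ {total = $2}
       END {printf "accounted: %d kB, unaccounted: %d kB\n", sum, total - sum}' /proc/meminfo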

Edit:

Here's the reduced output of systemd-cgtop -m from both servers. The healthy one looks like it is using more memory, but that is not actually the case!

# unhealthy server
Control Group                        Tasks   %CPU   Memory  Input/s Output/s
/                                      181    2.0   250.1M        -        -
/system.slice                           91    1.3   188.3M        -        -
/user.slice                             26    0.7   104.1M        -        -
/system.slice/logging-agent.service     11    1.1    40.5M        -        -
/system.slice/mycustom.service           7      -    34.6M        -        -

# healthy server
Control Group                        Tasks   %CPU   Memory  Input/s Output/s
/                                      187    2.6   655.7M        -        -
/system.slice                           95    1.4   446.6M        -        -
/user.slice                             24    1.1   287.5M        -        -
/system.slice/cron.service               1      -   238.3M        -        -
/system.slice/mycustom.service           7      -    43.7M        -        -
/system.slice/logging-agent.service     11    1.3    40.0M        -        -


Less than 1 GB of RAM for a VM is not a lot these days. MemAvailable is 11% of MemTotal, which isn't a large margin. A mere 60 MB more to get to a round 1024 MB would reduce the pressure a bit. I base this on Committed_AS.
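(For the numbers above: MemAvailable / MemTotal = 108824 / 980756 ≈ 11%, and Committed_AS at 1099792 kB already exceeds MemTotal of 980756 kB, so the workload is overcommitted and there is no swap to absorb it.)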

But that doesn't tell you precisely where the memory allocations are made. And top isn't as helpful as you might think: RES will not add up neatly due to shared pages and other unintuitive things. A tip for top: hit c to toggle the full command line. A huge number of programs run as python or node, so you'll want to tell them apart.
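To see how far off a naive sum of RES is, you can total it over all processes and compare it with the used column of free; a quick sketch (shared pages get counted once per process, so this overcounts):

$ ps -eo rss= | awk '{sum += $1} END {printf "total RSS: %d kB (~%.0f MB)\n", sum, sum/1024}'

If smem is installed, its PSS column divides shared pages fairly and gives a more honest per-process figure.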

On Linux, use cgroups to measure. On systemd systems, enable memory accounting and look at systemd-cgtop -m. It should be apparent whether the user or service slices are allocating more memory; either is plausible. And a few hundred MB of difference is not a lot in absolute numbers: a couple of extra worker jobs or a stray user login could add up to that.
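If accounting isn't already enabled, one way to switch it on is per unit (using mycustom.service from the output above as the example) and then watch the per-cgroup usage:

$ sudo systemctl set-property mycustom.service MemoryAccounting=yes
$ systemd-cgtop -m

Globally, there is DefaultMemoryAccounting=yes in /etc/systemd/system.conf.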


The server has been running for almost 2 years and it's due a restart

Overdue. Guaranteed there are security updates that have not taken effect. Record /proc/meminfo and systemd-cgtop metrics, update, and reboot.
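Something along these lines, assuming Ubuntu/apt as suggested by the output above (file names are just examples):

$ cat /proc/meminfo > ~/meminfo-before-reboot.txt
$ sudo systemd-cgtop -m -b -n 1 > ~/cgtop-before-reboot.txt
$ sudo apt update && sudo apt upgrade
$ sudo reboot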