Why can't my Linux kernel reclaim its slab memory?

I have a system that suffered from ever-increasing memory usage until it reached the point where it was hitting swap even for mundane tasks and consequently becoming pretty unresponsive. The culprit appears to have been kernel-allocated memory, but I'm having difficulty figuring out what exactly was going on in the kernel.

How can I tell which kernel threads/modules/whatever are responsible for particular chunks of kernel memory usage?

Here's a graph of the system's memory usage over time:

[Graph: system memory usage over time; slab_unrecl grows steadily]

The slab_unrecl value, which grows over time, corresponds to the SUnreclaim field in /proc/meminfo.
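
The raw numbers come straight from /proc/meminfo, e.g.:

    grep -E '^(Slab|SReclaimable|SUnreclaim):' /proc/meminfo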

When I ran slabtop towards the end of that graph and sorted it by cache size, here's what it showed me:

 Active / Total Objects (% used)    : 15451251 / 15530002 (99.5%)
 Active / Total Slabs (% used)      : 399651 / 399651 (100.0%)
 Active / Total Caches (% used)     : 85 / 113 (75.2%)
 Active / Total Size (% used)       : 2394126.21K / 2416458.60K (99.1%)
 Minimum / Average / Maximum Object : 0.01K / 0.16K / 18.62K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
3646503 3646503 100%    0.38K 173643       21   1389144K kmem_cache
3852288 3851906  99%    0.06K  60192       64    240768K kmalloc-64
3646656 3646656 100%    0.06K  56979       64    227916K kmem_cache_node
1441760 1441675  99%    0.12K  45055       32    180220K kmalloc-128
499136 494535  99%    0.25K  15598       32    124784K kmalloc-256
1066842 1066632  99%    0.09K  25401       42    101604K kmalloc-96
101430 101192  99%    0.19K   4830       21     19320K kmalloc-192
 19168  17621  91%    1.00K    599       32     19168K kmalloc-1024
  8386   7470  89%    2.00K    525       16     16800K kmalloc-2048
 15000   9815  65%    1.05K    500       30     16000K ext4_inode_cache
 66024  45955  69%    0.19K   3144       21     12576K dentry
369536 369536 100%    0.03K   2887      128     11548K kmalloc-32
 18441  16586  89%    0.58K    683       27     10928K inode_cache
 44331  42665  96%    0.19K   2111       21      8444K cred_jar
 12208   7529  61%    0.57K    436       28      6976K radix_tree_node
   627    580  92%    9.12K    209        3      6688K task_struct
  6720   6328  94%    0.65K    280       24      4480K proc_inode_cache
 36006  36006 100%    0.12K   1059       34      4236K kernfs_node_cache
266752 266752 100%    0.02K   1042      256      4168K kmalloc-16
134640 133960  99%    0.02K    792      170      3168K fsnotify_mark_connector
  1568   1461  93%    2.00K     98       16      3136K mm_struct
  1245   1165  93%    2.06K     83       15      2656K sighand_cache

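(For reproducibility: that snapshot came from something along the lines of the command below, which prints a single listing sorted by cache size and exits.)

    sudo slabtop -o -s c
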
Conclusions:

  • The kernel's slab allocator is using about 2.3 GB of RAM
  • Almost all of that is unreclaimable
  • About 1.3 GB of it is occupied by the kmem_cache cache
  • Another ~0.7 GB belongs to the various-sized kmalloc caches

This is where I've hit a wall. I haven't figured out how to look inside those caches and see why they've gotten so large (or why their memory is unreclaimable). How can I go further in my investigations?


Solution 1:

perf kmem record --slab will capture slab allocation events, and perf kmem stat --slab --caller will subtotal them by kernel symbol (the call site that made the allocation).
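
A minimal session looks something like this (the 60-second sleep is just an arbitrary sampling window; pick whatever is long enough to capture the growth):

    # record slab allocation/free events system-wide while the workload runs
    sudo perf kmem record --slab sleep 60

    # summarize the recorded events, subtotalled by kernel call site
    sudo perf kmem stat --slab --caller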

That doesn't explain why your workload does this, however. Add in perf record and look at the report to see what is calling into the kernel.
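
For example (again, the interval is arbitrary):

    # sample stacks across the whole system, with call graphs
    sudo perf record -a -g -- sleep 30

    # then browse what is calling into the kernel
    sudo perf report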

kprobes can trace the specific kernel stacks leading to a given type of allocation. I'm not super familiar with this myself, but try reading the examples accompanying eBPF tools like slabratetop.
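
As a rough sketch of that route (the kprobe symbol name varies by kernel version, so check that kmem_cache_alloc exists on yours, and the slabratetop path depends on how your distro packages bcc):

    # which slab caches are being allocated into most frequently right now
    sudo /usr/share/bcc/tools/slabratetop        # or: sudo slabratetop-bpfcc

    # count the kernel stacks leading into the slab allocator
    sudo bpftrace -e 'kprobe:kmem_cache_alloc { @stacks[kstack] = count(); }'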

Also vary things a bit on your host. Add RAM to be sure you are not undersizing it. Try newer kernel versions or a different distribution.