Very high cache usage causing slowdown
I'm trying to identify what's making my personal computer extremely sluggish. The biggest suspect is memory. When the computer is running fast, my cache memory looks normal. However, when it's running slow, it looks like this:
luke@Luke-XPS-13:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:           7830        1111        1090         277        5628        1257
Swap:         16077         665       15412
and this:
luke@Luke-XPS-13:~$ vmstat -S M
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0    665   1065     67   5562    0    0    34    88   43   23 13  4 82  0  0
Caches are taking up 5.5 GB of my 8 GB of memory, even with all programs closed and after running
sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
which should force-clear them. As soon as the computer starts dipping into swap, it's game over for usable speed. Shutting down temporarily fixes the problem, but it eventually comes back and I can't figure out what's causing it. slabtop reveals slightly more about the culprit, but I'm not sure what it implies. Why kmalloc-4096?
Active / Total Objects (% used) : 1554043 / 1607539 (96.7%)
Active / Total Slabs (% used) : 167569 / 167569 (100.0%)
Active / Total Caches (% used) : 76 / 109 (69.7%)
Active / Total Size (% used) : 5091450.96K / 5105920.97K (99.7%)
Minimum / Average / Maximum Object : 0.01K / 3.18K / 18.50K
   OBJS  ACTIVE  USE OBJ SIZE   SLABS OBJ/SLAB CACHE SIZE NAME
1254755 1254755 100%    4.00K  156847        8   5019104K kmalloc-4096
   5430    5430 100%    2.05K     362       15     11584K idr_layer_cache
  20216    9010  44%    0.57K     722       28     11552K radix_tree_node
   8820    7358  83%    1.05K     294       30      9408K ext4_inode_cache
  38577   25253  65%    0.19K    1837       21      7348K dentry
  12404   11432  92%    0.55K     443       28      7088K inode_cache
  30120   29283  97%    0.20K    1506       20      6024K vm_area_struct
  31722   31722 100%    0.12K     933       34      3732K kernfs_node_cache
  13696   12514  91%    0.25K     856       16      3424K kmalloc-256
  27144   27134  99%    0.10K     696       39      2784K buffer_head
  41088   29789  72%    0.06K     642       64      2568K kmalloc-64
    632     567  89%    3.75K      79        8      2528K task_struct
   2432    2274  93%    1.00K     152       16      2432K kmalloc-1024
   3048    2677  87%    0.64K     127       24      2032K shmem_inode_cache
    912     845  92%    2.00K      57       16      1824K kmalloc-2048
    172     162  94%    8.00K      43        4      1376K kmalloc-8192
   1736    1561  89%    0.56K      62       28       992K ecryptfs_key_record_cache
   5103    4073  79%    0.19K     243       21       972K kmalloc-192
   1792    1626  90%    0.50K     112       16       896K kmalloc-512
   1456    1456 100%    0.61K      56       26       896K proc_inode_cache
  10149    8879  87%    0.08K     199       51       796K anon_vma
  24960   19410  77%    0.03K     195      128       780K kmalloc-32
    360     352  97%    2.06K      24       15       768K sighand_cache
Based on your comments, cache usage doesn't noticeably drop when you run echo 3 > /proc/sys/vm/drop_caches.
This can only happen if it is a write cache. If you write 5 GB to some files, the data lands in the cache immediately and your program continues. The cache is then written to storage in the background as fast as possible. In your case the storage seems to be dramatically slow, so unwritten cache accumulates until it fills all of your RAM and starts pushing everything else out to swap.
The kernel never writes this cache to the swap partition. It keeps it in RAM until it has been safely written to its destination.
The kernel never drops unwritten cache, because that would mean data loss (you've saved a file, so you expect the data to land on permanent storage).
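A quick way to check this diagnosis is to look at the Dirty (waiting to be written) and Writeback (being written right now) counters in /proc/meminfo while the machine is slow. Below is a minimal bash sketch; the function name dirty_writeback_mb is hypothetical, and it only converts the two kB values to whole MB.

```bash
#!/usr/bin/env bash
# dirty_writeback_mb [FILE]   (hypothetical helper name)
# Print the Dirty and Writeback counters from a meminfo-format file
# (default: /proc/meminfo), converted from kB to whole MB (truncated).
dirty_writeback_mb() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" {
        printf "%s %d MB\n", $1, $2 / 1024
    }' "${1:-/proc/meminfo}"
}
```

Call it a few times while the machine is sluggish. If Dirty stays in the gigabytes and shrinks only slowly, data is being produced faster than the storage can absorb it.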
You can only solve it by speeding up the storage. This issue is often seen with storage mounted over the network (check the output of mount for types such as cifs, nfs, sshfs, etc.) or with slow USB 1 devices.
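To look for such mounts, you can filter /proc/mounts (the same list mount prints) on the filesystem type, which is its third field. A minimal bash sketch follows; the function name network_mounts is hypothetical, and the type list (cifs, nfs, nfs4, and fuse.sshfs, which is how sshfs mounts are usually reported) is an assumption you may need to extend.

```bash
#!/usr/bin/env bash
# network_mounts [FILE]   (hypothetical helper name)
# Print "mountpoint fstype" for every network-backed mount in a
# mounts-format file (default: /proc/mounts). Each line has the fields:
#   device mountpoint fstype options dump pass
network_mounts() {
    awk '$3 ~ /^(cifs|nfs|nfs4|fuse\.sshfs)$/ { print $2, $3 }' "${1:-/proc/mounts}"
}
```

Any mount point it prints is a candidate for the slow destination that the dirty data is waiting on.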
You can also make the issue much less dramatic for the system by capping the dirty cache with sysctl vm.dirty_ratio=10 before it grows too much.
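A value set with sysctl on the command line only lasts until the next reboot. One common way to make it persistent, sketched below on the assumption that your distribution reads drop-in files from /etc/sysctl.d/ (Ubuntu does), is a small config file; the file name 90-dirty-ratio.conf is an arbitrary choice.

```bash
# Apply now (as root); this lasts until the next reboot.
sudo sysctl -w vm.dirty_ratio=10

# Persist it; the file name is arbitrary, only the .conf suffix matters.
echo 'vm.dirty_ratio = 10' | sudo tee /etc/sysctl.d/90-dirty-ratio.conf

# Reload every sysctl configuration file, including the new one.
sudo sysctl --system

# Confirm the value now in effect (prints "vm.dirty_ratio = 10").
sysctl vm.dirty_ratio
```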
dirty_ratio (from the kernel's vm sysctl documentation):
Contains, as a percentage of total available memory that contains free pages and reclaimable pages, the number of pages at which a process which is generating disk writes will itself start writing out dirty data.
The total available memory is not equal to total system memory.
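To get a feel for what 10% means on your machine, you can estimate the threshold from /proc/meminfo. The sketch below is only an approximation: it assumes the "available memory" in this definition is roughly free pages plus file-backed page cache (MemFree + Active(file) + Inactive(file)) and ignores the pages the kernel keeps in reserve, so the real limit will be somewhat lower. The function name dirty_limit_mb is hypothetical.

```bash
#!/usr/bin/env bash
# dirty_limit_mb RATIO [FILE]   (hypothetical helper name)
# Rough estimate, in MB, of the point at which vm.dirty_ratio=RATIO
# makes a writing process flush its own dirty data.
# ASSUMPTION: "available memory" ~= MemFree + Active(file) + Inactive(file);
# the kernel also subtracts reserved pages, which this sketch ignores.
dirty_limit_mb() {
    awk -v ratio="$1" '
        $1 == "MemFree:" || $1 == "Active(file):" || $1 == "Inactive(file):" {
            kb += $2
        }
        END { printf "%d\n", kb * ratio / 100 / 1024 }
    ' "${2:-/proc/meminfo}"
}
```

For example, dirty_limit_mb 10 prints the estimate for the value suggested above; comparing it with the Dirty counter shows how close a writer gets to being throttled.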
If that's the correct diagnosis, you will see that the cache can be dropped easily (at least 90% of it) and that the process writing these gigabytes becomes very slow, while the rest of the system becomes more responsive.
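That check can be sketched in bash, assuming root and a kernel that exposes /proc/sys/vm/drop_caches: record the Cached value from /proc/meminfo, drop the clean caches, and report what fraction disappeared. The helper names, the script name check-cache.sh and the --run switch are hypothetical; without --run the script only defines the helpers. It deliberately skips sync, because on slow storage sync would block until all dirty data is written.

```bash
#!/usr/bin/env bash
# cached_kb [FILE]: print the "Cached:" value (kB) from a meminfo-format file.
cached_kb() {
    awk '$1 == "Cached:" { print $2 }' "${1:-/proc/meminfo}"
}

# percent_dropped BEFORE AFTER: integer percentage of BEFORE that is gone.
percent_dropped() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

# The real check only runs when invoked as: sudo ./check-cache.sh --run
if [[ "${1:-}" == "--run" ]]; then
    before=$(cached_kb)
    # No sync on purpose: drop_caches does not free dirty pages, so what
    # remains afterwards is mostly cache that cannot be discarded yet.
    echo 3 > /proc/sys/vm/drop_caches
    after=$(cached_kb)
    echo "Page cache: ${before} kB -> ${after} kB ($(percent_dropped "$before" "$after")% dropped)"
fi
```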