lvmcache/dm-cache: writeback cache performance when the cache is full
I have an SSD writeback cache in front of an HDD, set up through lvmcache (so a dm-cache). While the cache LV is not full (Data% column in lvs < 100.00%), writes go to the cache device (monitored via dstat). However, once the cache LV is full (Data% = 100.00%), writes go directly to the HDD, essentially turning it into a writethrough cache. Blocks do not get evicted from the SSD cache, even after some time, and performance drops. When I read recently-read data from the cached LV, the reads come from the SSD, so I assume the entire SSD has now become a read cache. Is this the expected behavior of dm-cache's write caching, even in writeback mode? Is there no space reserved for writes? This seems like quite a poor design, as users can essentially write only one cache LV's worth of data before the cache degrades into a writethrough cache.
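For reference, I am watching the cache occupancy and where the I/O lands roughly like this (the VG/LV and disk names below are just examples from my setup):

    # Cache fill level and dirty/used blocks of the cached LV
    lvs -a -o name,size,data_percent,cache_dirty_blocks,cache_used_blocks,cache_total_blocks vg0/cached_lv

    # Per-disk throughput, to see whether writes hit the SSD or the HDD
    dstat -d -D sda,sdb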
My understanding is that dm-cache uses the mq eviction algorithm, but that only applies to read caching and thus is irrelevant to the write caching issue I am observing.
Is there a way to reserve space for a write cache, or use both a dm-writecache (which I understand will not do any read caching) and a dm-cache at the same time?
Solution 1:
dm-cache is a "slow moving" cache: many read/write misses are required to promote a block, especially when promoting a new block means demoting an already-cached one. The fixed-block nature of dm-cache, coupled with the lack of a reserved write-only area, means that many writes to the same non-cached block are required to trigger a promotion/replacement. However, this also assumes that the kernel pagecache is not absorbing these repeated writes and merging them into a single write to the underlying block device.

In other words, you are probably seeing the combined effect of the kernel pagecache (which absorbs and merges writes) and the reluctance of dm-cache to promote blocks on their first miss.
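A quick way to see the pagecache effect is to repeat the same small writes with direct I/O, which bypasses the pagecache so dm-cache records one miss per access. A minimal sketch with fio (the file path and sizes are arbitrary; the file just needs to live on the cached LV's filesystem):

    # direct=1 bypasses the pagecache; looping the same 64M random-write
    # workload re-hits the same blocks until dm-cache decides to promote them
    fio --name=promote-test --filename=/mnt/cached/fio-test --direct=1 \
        --rw=randwrite --bs=4k --size=64m --loops=10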
If you want to reserve a device or some space for write caching only, you can tap into dm-writecache (set up via the usual lvmcache/LVM tooling).
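For example, with a reasonably recent LVM (2.03+), something along these lines should attach a pure write cache to a slow LV (VG/LV and device names are placeholders):

    # Create an LV on the fast device to act as the write cache
    lvcreate -n fast_wc -L 20G vg0 /dev/nvme0n1p2

    # Convert the slow LV to use it as a dm-writecache (writes only, no read caching)
    lvconvert --type writecache --cachevol fast_wc vg0/slow_lv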
Additional information:
dm-cache does block promotion/demotion by tracking access hits and misses. At first the cache is empty and all I/O is directed to the origin (slow) device. When you issue a, say, 4K read, it is served by the underlying slow device, with dm-cache recording the miss. After several more misses to the same cache block (32K by default), the entire cache block is copied to the fast device. If you now write to a cached block, your write is cached. If, however, you write to an uncached block, the write goes straight to the origin (slow) device. After several more writes to that uncached block, dm-cache finally allocates the entire cache block (again, 32K by default), copying the original data to the cache device. At that point new reads and writes can be served from the cache. Demotion is simple: when a new block must be promoted, the oldest block is discarded/flushed.
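You can watch this happening through the cache target's runtime counters; the exact field layout is documented in the kernel's cache.rst, but roughly (the device-mapper name below is a placeholder):

    dmsetup status vg0-cached_lv
    # ... <metadata block size> <used>/<total> metadata blocks
    #     <cache block size> <used>/<total> cache blocks
    #     <read hits> <read misses> <write hits> <write misses>
    #     <demotions> <promotions> <dirty> ...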
In other words, for a write to be cached, the corresponding cache block must be allocated and the backing data must be copied to the cache device (allocate-on-write). To limit bandwidth usage between the origin and cache devices, this copy only happens after many misses (i.e., a single miss will not promote a block). Note that repeatedly reading the same uncached block will not help either, as the kernel pagecache will simply serve the data itself after the first read.
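If you want promotions to proceed more aggressively, the knob that caps this background copying is migration_threshold (in 512-byte sectors); whether raising it helps is a judgment call, and the value and names below are only an example:

    # Allow up to 16384 sectors (8 MiB) of in-flight migration between origin and cache
    lvchange --cachesettings 'migration_threshold=16384' vg0/cached_lv

    # Check the active policy and settings
    lvs -o+cache_policy,cache_settings vg0/cached_lv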
dm-writecache works differently, being more similar to a traditional RAID controller's writeback cache: it caches all writes and ignores reads. It can almost be considered a "write-only L2 pagecache", where dirty pages are "swapped out" while waiting for the slow device to catch up. To use it, you need to partition your fast device between dm-cache (which, at that point, must be run as a writethrough cache) and dm-writecache, or dedicate different devices to each. I never tried doing that via LVM, and I suspect the tooling will prevent you from nesting/stacking two different cache layers. However, you can try it via direct dmsetup commands, as sketched below.
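A rough sketch of the dmsetup approach, assuming the dm-cache device is already set up in writethrough mode and a spare partition of the fast device is left over for the write cache (all names and the 4K block size are placeholders, and LVM will not be aware of the extra layer):

    ORIGIN=/dev/mapper/vg0-cached_lv        # existing dm-cache (writethrough) device
    FAST=/dev/nvme0n1p3                     # partition reserved for the write cache
    SECTORS=$(blockdev --getsz "$ORIGIN")   # origin size in 512-byte sectors

    # writecache table: <start> <len> writecache <p|s> <origin> <cache> <block size> <#opts>
    # 's' selects an SSD/block-device cache; 4096 is the cache block size in bytes
    dmsetup create wc-stack --table "0 $SECTORS writecache s $ORIGIN $FAST 4096 0"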