ZFS: interpreting the output of zdb -S tank
I wanted to know whether it would pay off for me to enable ZFS deduplication, so I ran the command
zdb -S tank
but now I need some help interpreting the output.
Simulated DDT histogram:
bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    49.2M   6.15T   6.15T   6.14T    49.2M   6.15T   6.15T   6.14T
     2     352K   42.0G   42.0G   42.0G     725K   86.3G   86.3G   86.4G
     4    7.99K    913M    913M    916M    37.7K   4.20G   4.20G   4.21G
     8    1.43K    161M    161M    161M    14.6K   1.58G   1.58G   1.58G
    16      623   67.1M   67.1M   67.4M    12.2K   1.32G   1.32G   1.33G
    32       73   7.37M   7.37M   7.43M    2.65K    268M    268M    270M
    64      717   4.23M   4.23M   7.46M    48.3K    392M    392M    611M
   128        4    257K    257K    266K      689   40.9M   40.9M   42.6M
   256        2    128K    128K    133K      802   57.8M   57.8M   59.3M
   512        2      1K      1K   10.7K    1.37K    703K    703K   7.32M
    4K        1    128K    128K    128K    7.31K    935M    935M    934M
   16K        1    512B    512B   5.33K    20.0K   10.0M   10.0M    107M
   64K        1    128K    128K    128K    93.0K   11.6G   11.6G   11.6G
  512K        1    128K    128K    128K     712K   89.0G   89.0G   88.9G
 Total    49.6M   6.19T   6.19T   6.18T    50.9M   6.34T   6.34T   6.33T
dedup = 1.02, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.03
Thanks in advance.
There are two things you should look at in this histogram. The first and most obvious one is the dedup expression at the end of it. There's not much to say about it, since it's simple mathematics. In your case deduplication will only provide a space saving of about 2%, and since you don't use compression (which you should in the first place, because it saves space and improves performance: I/O is much more costly than CPU time with an efficient algorithm like LZ4), that's the marginal gain you'll get from enabling deduplication: 2-3%.
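As a rough cross-check of that figure, the ratio can be recomputed from the Total row of the histogram. This is only a sketch: it assumes the dedup ratio is the referenced DSIZE divided by the allocated DSIZE, and it uses the rounded values printed by zdb, so the result is approximate. The variable names are made up for illustration.

```python
# Values copied from the "Total" row of the histogram (rounded by zdb).
allocated_dsize_tib = 6.18   # DSIZE under "allocated": space needed with dedup
referenced_dsize_tib = 6.33  # DSIZE under "referenced": space needed without dedup

# Assumption: dedup ratio = referenced DSIZE / allocated DSIZE.
dedup_ratio = referenced_dsize_tib / allocated_dsize_tib

# Fraction of space that dedup would save at this ratio.
saving = 1 - 1 / dedup_ratio

print(f"dedup ratio ~ {dedup_ratio:.2f}")
print(f"space saved ~ {saving:.1%}")
```

With these inputs the ratio comes out at roughly 1.02, matching the last line of the zdb output, and the saving lands between 2% and 3%.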
Deduplication starts to be valuable when the dedup ratio is higher than 2.0 and your storage subsystem is so expensive that spending memory and CPU just to handle deduplication is acceptable. We are talking about enterprise NVMe pools, for example.
But at what cost does this come?
That's the second thing I mentioned. The first hit will be to your RAM: the deduplication tables need to be kept in RAM. If there isn't enough RAM to hold them, the system will just crash and you'll be unable to mount the pool. There have been some advancements in newer versions of ZFS (like OpenZFS 2.0), but I'm not aware whether anything has changed in this regard.
With this in mind, take the total number of allocated blocks, which is the Total row of the first blocks column: 49.6M
Since each dedup table entry needs about 320 bytes, multiply the number of blocks by that per-entry size to get the amount of RAM needed:
49.6M * 320 bytes = 15,872 MB ≈ 15.5 GB
So you'll spend almost 16 GB of system RAM just to deduplicate data that isn't dedup-friendly. That 16 GB will be taken away from vital parts of the system, like the ARC, which is what speeds up your pool.
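The same estimate can be scripted for any pool. The following is a minimal sketch, not an official tool: the function names are hypothetical, 320 bytes per entry is the rule-of-thumb figure used above, and it assumes the K/M/G suffixes printed by zdb are powers of 1024.

```python
# Rule-of-thumb estimate of the RAM needed to hold the dedup table (DDT).
# Assumption: zdb suffixes are binary (K = 1024, M = 1024**2, ...).
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_count(text):
    """Convert a zdb-style number such as '49.6M' or '623' to a float."""
    text = text.strip()
    suffix = text[-1].upper() if text[-1].isalpha() else ""
    number = text[:-1] if suffix else text
    return float(number) * UNITS[suffix]


def ddt_ram_bytes(allocated_blocks, bytes_per_entry=320):
    """Estimate DDT size: one entry per allocated (unique) block."""
    return parse_count(allocated_blocks) * bytes_per_entry


# "49.6M" is the Total of the allocated "blocks" column.
ram = ddt_ram_bytes("49.6M")
print(f"{ram / 1024**3:.1f} GiB")
```

For the pool in the question this prints about 15.5 GiB, the same figure as the manual calculation.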
So, no. Deduplication is not worth it unless:
- You have an extremely expensive storage subsystem
- Your data can be easily deduplicated