ZFS: Memory issues with dedup even though zdb -DD looks fine
Solution 1:
Dedupe on ZFS isn't always worth it. Okay, it's rarely worth it... I know it's appealing, sexy-sounding and seems to be a great selling point... but at what cost?
- Predictability.
- Stability.
- RAM usage.
- Planning and design.
- Performance.
Also see: ZFS - destroying deduplicated zvol or data set stalls the server. How to recover?
So let's examine your DDT table...
If you're not sure how to compute, see: How large is my ZFS dedupe table at the moment?
```
DDT-sha256-zap-duplicate: 615271 entries, size 463 on disk, 149 in core
```
615271 * 149 = 91,675,379 bytes -> 91,675,379 / 1024 / 1024 ≈ 87.4 MiB.
So hmm... not much RAM required for the dataset.
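If you want to redo that arithmetic against your own pool, here is a minimal shell sketch (the pool name `tank` is just a placeholder; the entry count and in-core size are the numbers from your output above):

```
# Print the dedup table (DDT) statistics for the pool; "tank" is a placeholder
zdb -DD tank

# In-core DDT footprint = number of entries * "in core" bytes per entry
# (values taken from the DDT-sha256-zap-duplicate line above)
awk 'BEGIN { printf "%.2f MiB\n", 615271 * 149 / 1024 / 1024 }'
```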
Other items to note: you should probably be using lz4 compression, but that's about all I can see from here. Can you check whether this is an interaction between the Linux virtual memory subsystem and ZFS? I'd keep the ARC where it is... but look at the Linux VM stats at the time of the slow speeds. This may depend a bit on what type of data you're storing. What types of files are these?
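A minimal sketch of what I mean, assuming a ZFS-on-Linux box and a placeholder dataset name:

```
# Turn on lz4 compression (only affects data written after the change)
zfs set compression=lz4 tank/data

# Current ARC size vs. its configured maximum
grep -E '^(size|c_max)' /proc/spl/kstat/zfs/arcstats

# Watch the Linux VM while the slowdown is actually happening
vmstat 1
grep -E 'MemFree|Dirty|Writeback' /proc/meminfo
```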
Solution 2:
A good rule of thumb is to plan around 5 GB of RAM for every 1 TB of disk. So if you have 2 TB of data, that would be 10 GB just for deduplication, plus ARC and ZFS metadata. It's not the answer you want, but deduplication is not worth the effort here. You will still get some savings with compression enabled. Take a look at this article.
5 GB is a general rule, but it does not have to hold exactly: it assumes about 5 GB of RAM per 1 TB when you use 64K blocks, and the block size can be anywhere between 512 B and 128 K. A workaround could be L2ARC on SSD drives, but that gets expensive. The arithmetic is sketched below.
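A back-of-the-envelope version of that estimate, assuming the commonly quoted figure of roughly 320 bytes of RAM per DDT entry (an approximation, not an exact number):

```
# 2 TiB of data stored as 64 KiB blocks -> number of blocks (= DDT entries if all blocks are unique)
echo $(( 2 * 1024**4 / (64 * 1024) ))          # 33554432 blocks
# at ~320 bytes of RAM per DDT entry:
echo $(( 2 * 1024**4 / (64 * 1024) * 320 ))    # ~10.7 billion bytes, i.e. about 10 GiB
```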
Solution 3:
Answering this myself for now - apparently, 0.6.2.1 still has lots of memory fragmentation overhead, the deduplication part of which will be improved in 0.6.3. I guess I'm going to try the current dev version or the patches suggested in the issue I opened: https://github.com/zfsonlinux/zfs/issues/2083. Let's see how that goes.
Update: see below - I decided to go with 0.6.2 and no deduplication for now. I will keep testing new releases until I feel "safe" with deduplication, as I believe it can make sense for my application.
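For reference, turning deduplication off was just a property change (pool name is a placeholder), and it only affects new writes - already deduplicated blocks keep their DDT entries until they are rewritten or destroyed:

```
# Stop deduplicating new writes; existing DDT entries stay until the data is rewritten
zfs set dedup=off tank
# See how much of the existing data is still deduplicated
zpool get dedupratio tank
```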
Thanks everyone!
Solution 4:
You might be running into an implementation-specific issue. For Linux, there is the ZFS on Linux project as well as the zfs-fuse implementation. The latter is considerably slower, but you should try your scenario with both of them to rule out version-specific code issues. Also, it might be worth testing with a Nexenta / OpenIndiana release or even a Solaris 11.1 ODN install.
Keep in mind that ZFS' online deduplication has some architectural issues, huge memory consumption and rather high CPU utilization when writing to the pool being the main ones. It might be worth checking whether offline deduplication, like the one offered by Windows Server 2012 for NTFS or by BTRFS with the bedup patches, would be a better fit for your usage pattern.