ZFS Heavy Write Amplification due to Free Space Fragmentation

Without in-depth debugging, it is difficult to give you a definitive answer. Still, some things to note:

- ZFS allocates blocks via spacemaps. When a spacemap is >= 96% full (80% on older builds), ZFS switches from the first-fit to the best-fit allocator. Note that this is a per-spacemap decision: an 80% full pool can have some spacemaps well above that value, possibly already over 96%. When writing to such spacemaps, ZFS uses the slower best-fit allocator.
- A fragmented spacemap uses much more memory than a non-fragmented one. This added memory pressure can lead to spacemap thrashing. You can avoid that by setting metaslab_debug_load=1; if that does not work, try re-importing your pool and/or setting metaslab_debug_unload=1. Note that persistently locking all spacemaps in memory will inevitably consume more RAM.
- You could be getting burned by gang blocks but, again, it is difficult to tell whether that is the case without further debugging. Certainly a 128K recordsize, with such a good compressratio, is doing you no favors with regard to fragmentation. You can read some more information here and here.
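
As a rough sketch of how the two tunables mentioned above could be set at runtime, here is a small Python helper. It assumes ZFS on Linux, where module parameters are exposed as files under /sys/module/zfs/parameters; the function names (read_param, write_param, pin_spacemaps) are made up for illustration and are not part of any ZFS tooling.

```python
from pathlib import Path

# Assumption: ZFS on Linux, where module parameters live under sysfs.
ZFS_PARAMS = Path("/sys/module/zfs/parameters")

# The two tunables named in the answer above.
SPACEMAP_TUNABLES = ("metaslab_debug_load", "metaslab_debug_unload")


def read_param(name: str, base: Path = ZFS_PARAMS) -> str:
    """Return the current value of a module parameter as a string."""
    return (base / name).read_text().strip()


def write_param(name: str, value: int, base: Path = ZFS_PARAMS) -> None:
    """Set a module parameter at runtime (requires root on a real system)."""
    (base / name).write_text(f"{value}\n")


def pin_spacemaps(base: Path = ZFS_PARAMS) -> dict:
    """Set both tunables to 1 and return the values read back afterwards."""
    for name in SPACEMAP_TUNABLES:
        write_param(name, 1, base)
    return {name: read_param(name, base) for name in SPACEMAP_TUNABLES}
```

Calling pin_spacemaps() as root changes the values only until the module is reloaded or the machine reboots. To make them persistent on Linux you would normally add a line such as "options zfs metaslab_debug_load=1 metaslab_debug_unload=1" to a file under /etc/modprobe.d/. Since metaslab_debug_load is applied when a pool is imported, a pool that is already imported may need a re-import, as noted above.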

Side note: I see your pool has ashift=9. I think that pure 512B devices are quite rare nowadays, especially in cloud environments. To improve performance, you could re-create your pool with ashift=12.
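
To make the ashift numbers concrete: the smallest unit ZFS allocates on a vdev is 2^ashift bytes, so ashift=9 means 512-byte sectors and ashift=12 means 4096-byte sectors. The sketch below shows how a compressed record would be rounded up to whole sectors under each setting. It is a simplified model that assumes a single-disk or mirror vdev (raidz adds parity and padding on top), and the 10000-byte compressed size is an arbitrary illustrative figure, not a measurement from your pool.

```python
def sector_size(ashift: int) -> int:
    """Smallest allocation unit in bytes for a given ashift."""
    return 1 << ashift


def allocated_bytes(compressed_size: int, ashift: int) -> int:
    """Round a compressed block size up to a whole number of sectors."""
    sector = sector_size(ashift)
    sectors = -(-compressed_size // sector)  # ceiling division
    return sectors * sector


if __name__ == "__main__":
    # Hypothetical 128K record that compresses down to 10000 bytes.
    for ashift in (9, 12):
        print(ashift, sector_size(ashift), allocated_bytes(10000, ashift))
    # prints:
    # 9 512 10240
    # 12 4096 12288
```

With ashift=9 the allocation sizes can land on any multiple of 512 bytes, so a well-compressed dataset produces many differently sized blocks; with ashift=12 the sizes are coarser, at the cost of some space lost to rounding.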
