Options for improving performance on very big filesystems with high IOWAIT

I have a similar (albeit smaller) setup, with 12x 2TB disks in a RAID6 array, used for the very same purpose (rsnapshot backup server).

First, it is perfectly normal for du -hs to take so much time on such a large, heavily used filesystem. Moreover, du accounts for hardlinks, which causes considerable, bursty CPU load in addition to the obvious IO load.

Your slowness is due to the filesystem metadata being located in very distant (in LBA terms) blocks, causing many seeks. As a normal 7.2K RPM disk provides only about 100 IOPS, you can see how hours, if not days, are needed to load all the metadata.
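
To put rough numbers on it (purely illustrative, since the actual inode count is not given): walking the metadata of, say, 40 million hardlinked files at roughly one random read each, with ~100 IOPS per spindle across 12 spindles, works out to about 40,000,000 / (12 × 100) ≈ 33,000 seconds, i.e. over nine hours, and that is before RAID-6 overhead or any competing workload.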

Some things you can try to (non-destructively) ameliorate the situation:

  • make sure mlocate/slocate is not indexing your /backup-root/ (you can use the prunefs facility to avoid that), or metadata cache thrashing will severely impair your backup time;
  • for the same reason, avoid running du on /backup-root/. If needed, run du only on the specific subfolders of interest;
  • lower vfs_cache_pressure from the default value (100) to a more conservative one (10 or 20). This instructs the kernel to prefer metadata caching over data caching, which should, in turn, speed up the rsnapshot/rsync discovery phase (a sysctl sketch follows after this list);
  • you can try adding a writethrough metadata caching device, for example via lvmcache or bcache. This metadata device should obviously be an SSD;
  • increase your available RAM.
  • as you are using ext4, be aware of inode allocation issues (read here for an example). This is not directly related to performance, but it is an important factor when you have so many files on an ext-based filesystem.
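
A minimal sketch of the non-destructive tweaks above, assuming a typical mlocate/sysctl setup (the paths, values and the sysctl.d file name are illustrative; adjust for your distribution):

    # 1) Exclude the backup tree from mlocate's nightly indexing:
    #    edit /etc/updatedb.conf and add /backup-root to the PRUNEPATHS line, e.g.
    #    PRUNEPATHS="/tmp /var/spool /backup-root"

    # 2) Prefer keeping inode/dentry (metadata) caches over data pages
    sudo sysctl vm.vfs_cache_pressure=20

    # 3) Make the setting persistent across reboots
    echo 'vm.vfs_cache_pressure = 20' | sudo tee /etc/sysctl.d/90-backup-tuning.conf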

Other things you can try - but these are destructive operations:

  • use XFS with both the ftype=1 and finobt=1 options set;
  • use ZFS on Linux (ZoL) with a compressed ARC and the primarycache=metadata setting (and, maybe, an L2ARC as a read-only cache). Example commands for both are sketched after this list.
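
A rough sketch of both destructive options; the device, pool and dataset names are placeholders, and the flags should be double-checked against your mkfs.xfs and ZFS versions before running anything:

    # XFS: ftype=1 (directory entries record the file type) and finobt=1
    # (free-inode btree); both are already the default on recent xfsprogs
    mkfs.xfs -m finobt=1 -n ftype=1 /dev/md0

    # ZFS on Linux: raidz2 pool, ARC caches metadata only,
    # lightweight compression, optional SSD device as L2ARC
    zpool create backup raidz2 /dev/sd[b-m]
    zfs set primarycache=metadata backup
    zfs set compression=lz4 backup
    zpool add backup cache /dev/nvme0n1   # optional read-only L2ARC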

This Filesystem stores a huge amount of small files with very many SEEK operations but low IO throughput.

This is a thing that catches lots of people nowadays. Alas, conventional FSes do not scale well here. I can probably give you just a few pieces of advice for the setup you already have, EXT4 over RAID-6 on HDDs:

  1. Lower vm.vfs_cache_pressure, say to 1. This shifts the caching bias towards preserving metadata (inodes, dentries) instead of the data itself, and it should have a positive effect in reducing the number of seeks.
  2. Add more RAM. Although it might look strange for a server that doesn't run any memory-hungry apps, remember: the only way to reduce seeks is to keep more metadata in faster storage. Given that you have only 16 GB, it should be relatively easy to increase the RAM amount.
  3. As I've said, EXT4 isn't a good choice for this use case, but you can still put some of the features it offers to use to soothe the pain:
    • an external journal is supported, so you can try adding an SSD (ideally mirrored) and placing the journal there; check out "ext4: external journal caveats" (a setup sketch follows after this list)
    • try switching the journal mode to full data journaling by mounting with data=journal
  4. Try moving files outside of a single FS scope. For example, if you have LVM2 you can create smaller volumes and use them for the time being; when one gets full, create another one, and so on.
    • If you don't have LVM2 you can try doing that with /dev/loop devices, but it's less convenient and probably less performant
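
A sketch of the external-journal idea from item 3, assuming /dev/md0 is the data array and /dev/md1 is the mirrored SSD pair (placeholder names); the filesystem must be cleanly unmounted, and this touches the on-disk layout, so test it first:

    # Create an external journal device on the SSD mirror
    # (block size must match the main filesystem's block size)
    mke2fs -O journal_dev -b 4096 /dev/md1

    # Drop the internal journal, then attach the external one
    tune2fs -O ^has_journal /dev/md0
    tune2fs -j -J device=/dev/md1 /dev/md0

    # Mount with full data journaling
    mount -o data=journal /dev/md0 /backup-root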

UPD.: since it has turned out to be Linux Software RAID (LSR) RAID-6, here is an additional item:

  1. LSR has its own tuning options that many people seem to overlook:
    • The stripe cache, which can be set to its maximum like this: echo 32768 | sudo tee /sys/devices/virtual/block/md*/md/stripe_cache_size. But do this with care (use a lower value if needed), since the cache consumes RAM in proportion to its size and the number of member disks, so a large value can eat a significant amount of memory.
    • An external write journal, which can also live on those mirrored SSDs (but currently an MD device created without a journal can't be converted to use one). See the sketch after this list.
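
A sketch of both knobs, assuming /dev/md0 is the array and /dev/md1 is the SSD mirror set aside for the write journal (placeholder names; the udev rule file name is likewise arbitrary, and the journal can only be declared when the array is created, so that part is destructive):

    # Raise the RAID-6 stripe cache now, and keep it across reboots via udev
    echo 32768 | sudo tee /sys/block/md0/md/stripe_cache_size
    echo 'SUBSYSTEM=="block", KERNEL=="md0", ACTION=="add|change", ATTR{md/stripe_cache_size}="32768"' \
        | sudo tee /etc/udev/rules.d/60-md-stripe-cache.rules

    # A write journal has to be specified at array creation time
    sudo mdadm --create /dev/md0 --level=6 --raid-devices=12 \
        --write-journal=/dev/md1 /dev/sd[b-m]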

That's probably most of what can be improved without a from-scratch re-design.

I have very poor performance since the file system (60TB net) exceeded 50% usage. At the moment, the usage is at 75%

That's a very serious issue, because such a high disk space occupancy level only worsens fragmentation, and more fragmentation means more seeks. Wonder no longer why it gave more-or-less acceptable performance before reaching 50 %. Lots of manuals clearly recommend not letting FSes grow beyond 75-80 %.
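
If you want to gauge how bad the free-space fragmentation already is, e2freefrag (from e2fsprogs) reports the distribution of free extents; the device and mount point below are placeholders:

    # Many small free extents = heavy free-space fragmentation
    sudo e2freefrag /dev/md0

    # Overall usage, to see how far past the ~75-80 % mark you are
    df -h /backup-root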