Does a large RAID stripe size waste space with small files?

I'm planning Maildir storage on RAID 0+1 (or RAID 10, or RAID 5) with an XFS filesystem. XFS will be created with the same stripe unit and width that the RAID uses.

I haven't decided on the RAID stripe size yet, but the default value on my RAID system is 128KB. If I use a 128KB stripe size for the Maildir storage, does it waste space for files smaller than the stripe size?

I think the average file size in the Maildir is about 10KB, so what stripe size is best for this environment?


Solution 1:

No, you don't lose (or gain) any space by adjusting the RAID stripe size. Stripe size only tells your RAID system how to split data across disks for I/O operations.

How you should think about stripe size depends on the RAID level.

For parity-calculating RAIDs (4, 5, 6), a write smaller than the stripe size causes a read-modify-write cycle, because parity is always calculated for a whole stripe. So, if you have a 10KB write on a 128KB stripe, the RAID has to read 128KB of data, modify the pertinent sectors in memory, recalculate the parity, and write the whole stripe back. Bad Thing (TM).
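To put a rough number on that penalty, here is a minimal sketch of the simplified model described above (read the whole stripe, write the whole stripe back). The function name `rmw_cost` is made up for illustration, and the model ignores the parity chunk itself, controller caches and any partial-stripe optimisations a real implementation may have.

```python
KB = 1024

def rmw_cost(write_bytes, stripe_bytes):
    """Bytes moved for one write under the simplified read-modify-write model.

    Assumption: a write smaller than the stripe reads the whole stripe and
    writes the whole stripe back. Real controllers may do less work.
    """
    if write_bytes >= stripe_bytes:
        # Simplification: treat this as a full-stripe write with no read phase.
        return {"read": 0, "written": write_bytes, "amplification": 1.0}
    read = stripe_bytes      # read the existing stripe
    written = stripe_bytes   # write the modified stripe back
    return {
        "read": read,
        "written": written,
        "amplification": (read + written) / write_bytes,
    }

cost = rmw_cost(10 * KB, 128 * KB)
print(cost["read"] // KB, "KB read,", cost["written"] // KB, "KB written")
print("I/O amplification: %.1fx" % cost["amplification"])
```

Under these assumptions a 10KB write moves 256KB in total, which is roughly 25 times the payload.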

For RAID sans parity (0, 1, multiple-mirror 1) the stripe size (together with the number of disks) determines how much work per I/O is done, statistically, by a single spindle (lies, damned lies and statistics). If you have large sequential I/Os, a large stripe is good, because each disk in the RAID gets a nice, sequential chunk of data to read or write. If you do a lot of small-file access, a smaller stripe is better, because it's more probable that two independent I/Os will go to two different spindles and be handled in parallel.
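The mapping from offsets to spindles can be sketched as follows. This assumes a plain round-robin RAID 0 layout where "stripe size" means the per-disk chunk; `disks_touched` is a hypothetical helper, not something your RAID exposes.

```python
KB = 1024

def disks_touched(offset, length, chunk_bytes, n_disks):
    """Return the set of disk indexes one I/O touches in a round-robin stripe."""
    first_chunk = offset // chunk_bytes
    last_chunk = (offset + length - 1) // chunk_bytes
    # Chunk k lives on disk k mod n_disks in this simplified layout.
    return {chunk % n_disks for chunk in range(first_chunk, last_chunk + 1)}

# A 1MB sequential read with 128KB chunks on 4 disks keeps every spindle busy.
print(sorted(disks_touched(0, 1024 * KB, 128 * KB, 4)))

# A 10KB read that fits inside one chunk is served by a single spindle,
# leaving the other disks free for other requests.
print(sorted(disks_touched(200 * KB, 10 * KB, 128 * KB, 4)))
```

It only shows which disks an individual request lands on. How requests from a real Maildir workload spread out still has to be measured.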

So much for theory. In practice the best approach is to try different sizes and test. If there is even a remote possibility that your system will be disk-bound, forgo the idea of RAID 5/6 for the Maildir filesystems. The read-modify-write penalty on small-file writes will kill your performance.
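A crude way to run such a test is to time a burst of Maildir-sized writes on each candidate configuration. The sketch below is only a starting point: the function name, file naming and the 10KB payload are assumptions, and a dedicated benchmark tool such as fio will give far more trustworthy numbers.

```python
import os
import tempfile
import time

def time_small_file_writes(directory, count=1000, size=10 * 1024):
    """Write `count` files of `size` bytes, fsync each, return elapsed seconds."""
    payload = b"x" * size
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, "msg-%06d" % i)
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # force the data out of the page cache
    return time.perf_counter() - start

if __name__ == "__main__":
    # Assumption: for a real test, create the directory on the filesystem
    # under test, e.g. tempfile.TemporaryDirectory(dir="/your/mount/point").
    with tempfile.TemporaryDirectory() as d:
        n = 200
        elapsed = time_small_file_writes(d, count=n)
        print("%d files in %.3fs (%.0f files/s)" % (n, elapsed, n / elapsed))
```

Run it several times per stripe size and compare the spread, not a single run.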

One more thing: read this thread on tuning XFS on RAID for high IOPS (high being in the neighbourhood of 1M IOPS).
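For the XFS side, the stripe unit (`su`) is normally set to the RAID chunk size and the stripe width (`sw`) to the number of data-bearing disks. Here is a small sketch of that arithmetic. The helper name is made up, and it assumes that your controller's "stripe size" is the per-disk chunk, so check what your RAID actually reports.

```python
def xfs_stripe_options(raid_level, n_disks, chunk_kb):
    """Build the mkfs.xfs '-d su=...,sw=...' fragment for a given array."""
    # Number of disks that hold data (not mirrors, not parity) per stripe.
    data_disks = {
        "0": n_disks,
        "0+1": n_disks // 2,
        "10": n_disks // 2,
        "5": n_disks - 1,
        "6": n_disks - 2,
    }[raid_level]
    return "-d su=%dk,sw=%d" % (chunk_kb, data_disks)

print(xfs_stripe_options("10", 4, 128))  # -d su=128k,sw=2
print(xfs_stripe_options("5", 4, 128))   # -d su=128k,sw=3
```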

Solution 2:

I'd be tempted to stick with the default XFS block size of 4KB. XFS uses delayed allocation, so you'll see some very real efficiencies from the smaller size. I certainly wouldn't worry about trying to match Maildir file sizes to blocks anyway; you'll go insane. Just let the FS do its job.
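To see why the block size, not the stripe size, is what governs space used per file, here is a back-of-the-envelope sketch. It assumes file data is simply rounded up to whole filesystem blocks and ignores inode and other metadata overhead.

```python
def allocated_bytes(file_bytes, block_bytes=4096):
    """Round a file size up to a whole number of filesystem blocks."""
    blocks = (file_bytes + block_bytes - 1) // block_bytes
    return blocks * block_bytes

size = 10 * 1024  # the 10KB average message from the question
alloc = allocated_bytes(size)
print(alloc, alloc - size)  # 12288 2048
```

With 4KB blocks a 10KB message occupies 12KB, so the slack is 2KB per file whatever stripe size the RAID uses.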