How to determine wasted space by large chunk size on macOS RAID array?

I'm working on a system that was set up with a macOS RAID array using a 256k chunk size for its members. The array was originally intended for video and image editing and storage, but it has since become a multipurpose drive that holds a lot of smaller files. How can I determine how much space on the drive might be wasted because of this large chunk size?

If the waste is considerable, I'll move these files to another drive and recreate the array with a smaller chunk size better suited to the current usage.

The chunk size of your RAID array does not determine how much space on disk a single file uses. Therefore no space is actually wasted due to having a larger chunk size than optimal.

The amount of space wasted is instead determined by the file system block size, which is independent of the RAID array chunk size. On macOS, you're typically looking at either APFS, which uses 4096-byte blocks, or HFS+, which uses 512-byte sectors that are typically grouped together in allocation blocks of 4096 bytes (unless you have a RAID drive that is more than 16 TB, in which case the allocation blocks are larger). A file occupies a whole number of these blocks, so the only waste is the unused tail of its last block, which is always less than one block per file.

You can determine your allocation block size by running this command in the Terminal (change the device node to match your disk setup):

diskutil info /dev/disk2s1
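Once you know the allocation block size, you can get a rough figure for the space lost to partially filled blocks. The Python sketch below is only an illustration: the function names are made up, it assumes 4096-byte blocks unless you pass something else, and it ignores sparse files, compression, APFS clones and file system metadata, so treat the result as an estimate. Note that the RAID chunk size never enters the calculation.

```python
import os
import stat


def slack_bytes(size, block_size=4096):
    """Bytes left unused in the last allocation block of a file of `size` bytes."""
    return -size % block_size


def estimate_slack(root, block_size=4096):
    """Walk `root` and return (file_count, data_bytes, slack_bytes).

    block_size is an assumption; use the allocation block size
    reported for your own volume.
    """
    files = data = slack = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: look at the entry itself, do not follow symlinks
                st = os.lstat(path)
            except OSError:
                continue  # unreadable or vanished entry, skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # only regular files occupy data blocks here
            files += 1
            data += st.st_size
            slack += slack_bytes(st.st_size, block_size)
    return files, data, slack
```

Calling `estimate_slack("/Volumes/MyRAID", 4096)` (the mount point is a placeholder) returns the number of files, the total bytes of file data, and the estimated bytes lost to partially filled blocks.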

Unfortunately, a lot of myths and misinformation have circulated about RAID chunk sizes, because choosing the right size has been seen as something of a "dark art". It is genuinely hard to pick the optimal chunk size from a long list of options without benchmarking with the actual data and the operations performed on it.

However, in your case you already have the type of setup you want. If you have many small files, you want a big chunk size on your RAID. If you have fewer, large files, you want a small chunk size on your RAID.

Unfortunately, some people have heard the opposite advice. That comes from the fact that on a single disk you do want the opposite: for storing few, large files you want big blocks, and for storing many smaller files you want small blocks. With large files you want to minimize the number of block operations needed in order to optimize throughput, whereas with smaller files you want to optimize for latency by having smaller blocks and thus more operations per second.

However, on a RAID system with many disks, things are of course different. When dealing with large files, you want to distribute the workload evenly over many drives to optimize performance. This means relatively small chunks, so that you can get many drives working for you at once, each handling its own small chunk. On the other hand, when you're dealing with small files, you want to ensure that most operations can be completed by a single drive, so you get the lowest latency possible. This means a large chunk size, so that your data is contained in a single chunk that can be processed by a single disk.
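To make that concrete, here is a minimal Python sketch of which member disks a single read touches. It assumes a plain striped (RAID 0) layout in which consecutive chunks rotate across the members; the four-disk array, the file sizes and the function name are made up for illustration, and real arrays may lay data out differently.

```python
def disks_touched(offset, length, chunk_size, num_disks):
    """Member indices hit by reading `length` bytes at `offset` (simple RAID 0 model)."""
    if length <= 0:
        return []
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    # After num_disks consecutive chunks every member has been hit once
    n = min(last_chunk - first_chunk + 1, num_disks)
    return sorted({(first_chunk + i) % num_disks for i in range(n)})


KIB = 1024
# A small 16 KiB file with 256 KiB chunks stays on one disk
print(disks_touched(0, 16 * KIB, 256 * KIB, 4))    # [0]
# A 1 MiB read with the same chunk size spreads over all four disks
print(disks_touched(0, 1024 * KIB, 256 * KIB, 4))  # [0, 1, 2, 3]
# The same small file with 4 KiB chunks would involve every disk
print(disks_touched(0, 16 * KIB, 4 * KIB, 4))      # [0, 1, 2, 3]
```

With the 256k chunk size from the question, most small files land on a single member, which is the low-latency case described above.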
