Downsides of a small allocation unit size
Since I work in web development and bandwidth is always an issue, I make every effort to reduce the size of my files, especially images. As a result I have at least 10,000 files that are all around 200 bytes in size.
My hard drive's allocation unit size is 4 kB, which means that I am wasting a lot of space. When it comes to backing up in particular, I'd like to avoid wasting space.
What I'd like to know is what the downsides are to setting the smallest AU size. My main concern is what happens to 8 GB files if I have a 512-byte AU size?
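To put a number on the waste, here is a minimal sketch using the figures above (10,000 files of about 200 bytes; `slack_bytes` is just an illustrative helper):

```python
def slack_bytes(file_size, cluster_size, n_files=1):
    """Allocated bytes minus used bytes, per file, times n_files."""
    clusters = -(-file_size // cluster_size)  # ceiling division; every file needs at least one cluster
    return (clusters * cluster_size - file_size) * n_files

# 10,000 files of ~200 bytes each
print(slack_bytes(200, 4096, 10_000))  # 38,960,000 bytes (~37 MB) wasted with 4 kB clusters
print(slack_bytes(200, 512, 10_000))   # 3,120,000 bytes (~3 MB) wasted with 512-byte clusters
```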
The downsides of a small disk allocation unit include:
- Larger allocation table.
This is the most obvious consequence of reducing the allocation unit. For a volume of a given size, reducing the allocation unit from 4 KB to 512 B results in an allocation table 8 times larger. Note that the filesystem will likely keep duplicate or triplicate copies of the allocation table to help ensure filesystem integrity.
- Allocation occurs more often.
Since less disk space is allocated per unit, more filesystem overhead is incurred when writing files sequentially (the most common access pattern). To allocate a cluster, the filesystem must acquire a mutex to ensure exclusive access, modify the in-memory allocation table, release the mutex, and then write the modified allocation table back to disk.
- Possible limit on volume size.
This is probably not an issue today with 32- and 64-bit processors, but back in the days of 8- and 16-bit processors and filesystems such as FAT, the number of allocation units combined with the allocation unit size imposed a real limit on the size of hard disk volumes/partitions. One obvious result was the FAT32 filesystem, which increased the number of possible allocation units.
- More file fragmentation.
For a given file size, there will obviously be more allocation units assigned to that file, and there is no guarantee that those allocation units can or will be contiguous. When reading an 8 MB file, the worst case with 4 KB clusters would involve 2048 seek-plus-rotational-latency intervals, i.e. one complete disk access for each cluster in sequence. The worst case with 512 B clusters would involve 16,384 such intervals! Obviously this (possible) fragmentation will impact data throughput.
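The cluster counts above follow from simple division, and the same arithmetic answers the question's 8 GB case; a quick check (`clusters_for` is just an illustrative helper):

```python
def clusters_for(file_size, cluster_size):
    """Number of allocation units a file of file_size bytes occupies."""
    return -(-file_size // cluster_size)  # ceiling division

MB, GB = 1 << 20, 1 << 30

print(clusters_for(8 * MB, 4096))  # 2048 clusters -> up to 2048 seeks in the worst case
print(clusters_for(8 * MB, 512))   # 16384 clusters
print(clusters_for(8 * GB, 512))   # 16777216 clusters for an 8 GB file
```

So an 8 GB file on a 512-byte allocation unit needs over 16 million clusters, any of which may in the worst case require its own seek.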
As disk drives get larger, the allocation unit size is often increased to mitigate these downsides. The reasoning is that there is more disk space available to waste, but that is circular logic. Ideally the disk drive should have several partitions, each formatted with an allocation unit sized for its "typical" file. For instance, I leave the C: drive/partition at its default 4 KB size, but the partition where TV recordings are written has 64 KB clusters, since a typical one-hour recording is about 6 GB.
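As a rough check on that choice, using the figures above (~6 GB recordings, 64 KB clusters) and assuming slack averages about half a cluster per file:

```python
KB, GB = 1 << 10, 1 << 30

recording = 6 * GB
cluster = 64 * KB

clusters = -(-recording // cluster)   # ceiling division
avg_slack_fraction = (cluster / 2) / recording

print(clusters)             # 98304 allocation units per one-hour recording
print(avg_slack_fraction)   # average slack is a vanishing fraction of the file
```

The average slack of ~32 KB per recording is negligible against 6 GB, while the 64× larger cluster cuts the allocation count (and worst-case fragmentation) by the same factor.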