File system "extents" and "clusters"

I'm trying to learn about HFS+ and it keeps referring to "extents."

Wikipedia says:

Fork Data Attribute records contain references to a maximum of eight extents that can hold larger attributes. Extension Attributes are used to extend a Fork Data Attribute record when its eight extent records are already used.

  1. What is an extent and how is it used?
  2. Do file systems that use extents also use clusters?
  3. How are extents and clusters different?

I have read the Wikipedia entry for extents and all it says is: They are contiguous blocks of reserved memory. Without context this has no meaning.


Solution 1:

(Disclaimer: I know about filesystems in general, but not HFS specifically.)

A cluster is a group of disk sectors that are allocated as a unit. It's generally a small power of two. For example, if a filesystem allocates space in units of 4 kilobytes, but the disk's physical sector size is 512 bytes, a cluster will correspond to a group of 8 sectors. Clusters are also referred to as "blocks" or "allocation units".

In a nutshell, a cluster is the smallest unit of storage in a filesystem, in the same sense that a sector is the smallest unit of storage on the underlying disk. They might be the same (e.g. a filesystem using 4k clusters on a disk with 4k physical sectors) or they might be different (you can make a FAT filesystem with 64k clusters, but no disk has sectors that big).
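To make the arithmetic concrete, here is a minimal sketch using the example sizes above (512-byte sectors, 4 KB clusters). The function names are made up for illustration and are not part of any filesystem API.

```python
SECTOR_SIZE = 512     # bytes per physical sector (example value from the text)
CLUSTER_SIZE = 4096   # bytes per cluster (example value from the text)

def sectors_per_cluster(cluster_size: int, sector_size: int) -> int:
    """How many disk sectors make up one cluster."""
    if cluster_size % sector_size != 0:
        raise ValueError("cluster size must be a multiple of the sector size")
    return cluster_size // sector_size

def clusters_needed(file_size: int, cluster_size: int) -> int:
    """Clusters occupied by a file: space is handed out in whole clusters,
    so the size is rounded up to the next cluster boundary."""
    return (file_size + cluster_size - 1) // cluster_size

print(sectors_per_cluster(CLUSTER_SIZE, SECTOR_SIZE))  # 8
print(clusters_needed(4097, CLUSTER_SIZE))             # 2
```

The second result shows why the cluster is the "smallest unit": a file one byte larger than a cluster still consumes two full clusters.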

An extent is a contiguous range of clusters somewhere on the disk, described by a starting cluster number and a length (the number of clusters in the run, counting the starting one). Extents are used to keep track of where a file's contents are located on the disk. Ideally, a file's entire contents are stored in one contiguous region, so the file can be described by a single extent record; if the file is fragmented, each fragment is described by a separate extent record.
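As a rough sketch of the idea (not any real filesystem's on-disk format), an extent can be modeled as a (start, length) pair, and a file as an ordered list of them. The `Extent` class and `cluster_for_offset` function below are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Extent:
    start: int    # first cluster number of the run
    length: int   # number of clusters in the run

def cluster_for_offset(extents: List[Extent], byte_offset: int,
                       cluster_size: int) -> int:
    """Translate a byte offset within a file to the disk cluster holding it."""
    logical = byte_offset // cluster_size  # cluster index within the file
    for extent in extents:
        if logical < extent.length:
            return extent.start + logical
        logical -= extent.length           # skip past this fragment
    raise ValueError("offset is beyond the end of the file")

# A file in two fragments: clusters 100-102 and clusters 500-501.
fragmented = [Extent(start=100, length=3), Extent(start=500, length=2)]
print(cluster_for_offset(fragmented, 3 * 4096, 4096))  # 500
```

An unfragmented file would have a single entry in the list, and the lookup would reduce to one addition.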

Solution 2:

I think Apple tries to hide it, but the full technical description of the HFS+ volume format can be found on their developer website here:

Technical Note TN1150
HFS Plus Volume Format

Here are some bits that are relevant to your question:

HFS Plus allocates space in units called allocation blocks; an allocation block is simply a group of consecutive bytes. The size (in bytes) of an allocation block is a power of two, greater than or equal to 512, which is set when the volume is initialized. This value cannot be easily changed without reinitializing the volume. Allocation blocks are identified by a 32-bit allocation block number, so there can be at most 2^32 allocation blocks on a volume. Current implementations of the file system are optimized for 4K allocation blocks. Note: For the best performance, the allocation block size should be a multiple of the sector size. If the volume has an HFS wrapper, the wrapper's allocation block size and allocation block start should also be multiples of the sector size to allow the best performance.
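The constraints in that passage can be checked with a few lines of arithmetic. This is only a sketch of what the quoted rules imply (a power-of-two block size of at least 512 bytes, and 32-bit block numbers); the function names are mine, and real implementations may impose other limits.

```python
MAX_ALLOCATION_BLOCKS = 2 ** 32  # block numbers are 32 bits wide

def is_valid_block_size(size: int) -> bool:
    """True if size is a power of two and at least 512 bytes."""
    return size >= 512 and (size & (size - 1)) == 0

def max_addressable_bytes(block_size: int) -> int:
    """Upper bound on volume size implied by 32-bit block numbers alone."""
    if not is_valid_block_size(block_size):
        raise ValueError("block size must be a power of two >= 512")
    return MAX_ALLOCATION_BLOCKS * block_size

print(is_valid_block_size(4096))    # True
print(is_valid_block_size(1000))    # False
print(max_addressable_bytes(4096))  # 17592186044416 (16 TiB)
```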

So basically, what Microsoft calls "clusters" in FAT and NTFS, Apple calls "allocation blocks" in HFS+. This answers your second question: Yes, HFS+ is an example of a filesystem that uses both extents and clusters (which it calls allocation blocks). For that matter, NTFS also uses extents and clusters.

HFS+ tracks which allocation blocks belong to a fork by maintaining a list of the fork's extents. An extent is a contiguous range of allocation blocks allocated to some fork, represented by a pair of numbers: the first allocation block number and the number of allocation blocks. For a user file, the first eight extents of each fork are stored in the volume's catalog file. Any additional extents are stored in the extents overflow file, which is also organized as a B-tree.
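To illustrate, here is a hedged sketch of decoding one such group of eight extents. It assumes the layout I understand TN1150 to describe: each extent descriptor is two big-endian 32-bit integers (start block, block count), eight descriptors per record (64 bytes), with unused slots zero-filled. The function name is hypothetical, and this does not read a real volume or the extents overflow file.

```python
import struct
from typing import List, Tuple

EXTENTS_PER_RECORD = 8
RECORD_SIZE = EXTENTS_PER_RECORD * 8  # two 4-byte integers per extent

def parse_extent_record(raw: bytes) -> List[Tuple[int, int]]:
    """Decode a 64-byte extent record into (start_block, block_count) pairs.

    Assumption: a block count of zero marks the end of the used slots.
    """
    if len(raw) != RECORD_SIZE:
        raise ValueError(f"expected {RECORD_SIZE} bytes, got {len(raw)}")
    fields = struct.unpack(">16I", raw)  # big-endian unsigned 32-bit values
    extents = []
    for i in range(0, len(fields), 2):
        start_block, block_count = fields[i], fields[i + 1]
        if block_count == 0:
            break
        extents.append((start_block, block_count))
    return extents

# A fork stored in two fragments; the remaining six slots are empty.
raw = struct.pack(">16I", 10, 4, 50, 2, *([0] * 12))
print(parse_extent_record(raw))  # [(10, 4), (50, 2)]
```

A fork with more than eight fragments would fill all eight slots here and continue in the extents overflow file, which this sketch does not cover.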

So in HFS+, an extent is a contiguous run of allocation blocks used to store a file* or a portion of a file. If the file is fragmented, it uses one extent per fragment. From what I can tell, this matches the way discussions of NTFS internals use the term "extents" as well.

*file: Technically, I should have said "fork" here, but since resource forks are rarely used anymore, the fact that HFS+ supports separate "data" and "resource" forks for each file is mostly a historical artifact.