Linus Torvalds and the OS X Filesystem

Back in 2008, Linus Torvalds famously said in an interview that "OS X in some ways is actually worse than Windows to program for. Their file system is complete and utter crap, which is scary." I've looked for more details about why he feels this way about the OS X filesystem (HFS+ presumably) but I haven't been able to find anything.

Linus surely doesn't dislike the basic Unix filesystem model, and I doubt he hates HFS+ for being case-insensitive. And despite how provocatively his comment is phrased, I doubt that it's completely without merit. Since the comment was in the context of programming for OS X, I suspect his opinion may have been based on performance, robustness, the operating system interface, or something along those lines. Does anyone know what complaints 2008-era Linus might have had with 2008-era HFS+?


A transcript of the “Q&A” session in which Linus made the comment is available, but it seems he wasn't asked to elaborate. I'm not sure whether a more in-depth analysis of his opinion on HFS+ has been written down somewhere else.

For someone else's analysis of the matter, take a look at John Siracusa's Mac OS X reviews, in particular the one for Mac OS X Lion, which has a section titled “What's wrong with HFS+.” I think the most salient bit is (emphasis mine):

Concurrency, metadata written in the correct byte order, sub-second date precision, support for massive volume sizes, and sparse file support are all common features of Unix file systems. Mac OS X, of course, is built on a Unix foundation. When HFS+ was ported from classic Mac OS to Mac OS X, it needed to be extended to support some minimum set of features that are expected from Unix file systems.

Some of those features were an easy fit, but others were very difficult to add to the file system without breaking backwards compatibility. One particularly scary example is the implementation of hard links on HFS+. To keep track of hard links, HFS+ creates a separate file for each hard link inside a hidden directory at the root level of the volume. Hidden directories are kind of creepy to begin with, but the real scare comes when you remember that Time Machine is implemented using hard links to avoid unnecessary data duplication.

The important point here is that Mac OS X is using a file system which wasn't designed for a Unix system at all. It was designed for classic Mac OS and patched to implement the features of Mac OS X 10.0 while maintaining backwards compatibility. Apple has subsequently implemented the additional features that it now has in Mac OS X 10.7 (journaling, metadata, filesystem events ...) using the same patching approach rather than a “design from the ground up” approach. I'm not sure how to explain this non-technically, but you could say that all of these additional features are resting on a classic Mac OS foundation that was never designed to support them. This means the solution isn't as good as it could be. The example that Siracusa goes on to discuss is that the solution Apple had to use for hard links, working within the limitations of HFS+, is too sensitive to hardware failure, and this is compounded by the fact that HFS+ was never designed to concern itself with data integrity. Of course, maintaining compatibility with classic Mac OS was a desirable constraint in Mac OS X 10.0, but it really isn't anymore in Mac OS X 10.7.
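For readers who haven't met hard links before, here is a minimal Python sketch of the Unix behaviour that HFS+ had to emulate. It says nothing about how HFS+ stores the links internally; it only shows the semantics visible through the standard os module. The function name hard_link_demo and the file names are made up for illustration, and it assumes the temporary directory sits on a file system that supports hard links.

```python
import os
import tempfile

def hard_link_demo() -> tuple:
    """Create a file, hard-link it, delete the first name, read via the second."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.txt")  # hypothetical file name
        alias = os.path.join(tmp, "alias.txt")        # hypothetical file name

        with open(original, "w") as f:
            f.write("hello")

        # A hard link is a second directory entry for the same underlying file.
        os.link(original, alias)

        a, b = os.stat(original), os.stat(alias)
        same_file = a.st_ino == b.st_ino   # both names refer to one inode
        link_count = a.st_nlink            # number of names for that inode

        # Removing one name leaves the data reachable through the other.
        os.remove(original)
        with open(alias) as f:
            survived = f.read()

        return same_file, link_count, survived

if __name__ == "__main__":
    print(hard_link_demo())
```

Time Machine depends on exactly this property: many names sharing one copy of the data, with the data surviving as long as at least one name remains.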


Although I'm not an operating system expert, and I've only just started using OS X after coming from Windows, I consider myself a power user on Windows and fairly competent on Linux. Coming from that background, I've been surprised that in a fairly modern OS like OS X, the filesystem has quirks such as the way file names are "mangled".

I understand that Linus' issues with HFS+ stem from the same point. From what I've found researching the issue, HFS+ stores file names as Unicode, but when a name contains non-ASCII characters (like á, é, í, ó, ú, ñ in Spanish or ü in German), which Unicode can represent either as a single precomposed code point or as a base letter followed by a combining mark, OS X silently normalizes the name to the decomposed form at storage time. That's not a real issue when the file is created and consumed on OS X, but when you're sharing files with users of other operating systems, the fact that the stored name changes makes for all sorts of weird behaviour.
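To make the two encodings concrete, here is a small Python sketch using the standard unicodedata module. It uses the standard NFC and NFD normalization forms as a stand-in; as far as I know, what HFS+ applies is a variant of the decomposed form rather than exactly NFD, so treat this as an approximation. The helper names describe and same_name are mine, purely for illustration.

```python
import unicodedata

def describe(name: str) -> list:
    """Return the code points of a name as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in name]

def same_name(a: str, b: str) -> bool:
    """Compare two names after bringing both to the same normalization form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

precomposed = "\u00e9"   # "é" as one code point (what many other systems store)
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# The two strings look identical on screen but are different sequences.
looks_equal_but_is_not = precomposed != decomposed

# Normalizing converts one form into the other.
to_decomposed = unicodedata.normalize("NFD", precomposed)
to_precomposed = unicodedata.normalize("NFC", decomposed)

if __name__ == "__main__":
    print(describe(precomposed), describe(decomposed))
    print(looks_equal_but_is_not, same_name(precomposed, decomposed))
```

A tool that compares names byte for byte sees two different files here, while one that normalizes both sides first, as same_name does, sees a single name.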

Case in point: I've been tracking my work "artifacts" (files, documents, etc.) in Subversion for more than 8 years. When I moved to the Mac, I installed an SVN client, and after checking out my relevant directories I found that every file with an accent in its name appeared to be missing, while a new file with the same name showed up as unversioned. Digging into it, the issue is that the name in the file system is in Apple's encoding, while the repository stores it in another (perfectly valid and legitimate) Unicode encoding.

This, I think, is a gross "mangling" of my data. Apple DOES understand both forms of the filename encoding (accessing a Windows share, or using a USB stick written on Windows, shows the proper file names), but at file creation time it decides "it knows better" and just renames the files.

Again, this is not something most users will notice, until they make a copy of a file, or rename it, put it back where the original was, and end up with two files that apparently have the same name!


John Siracusa & Dan Benjamin discuss some disadvantages of HFS+ in Hypercritical #56.

They address data corruption in HFS+ and consider some of ZFS's features.