Linux: how many disk IOs does it take to read a file? How can I minimize them?

According to this paper on Facebook's Haystack:

"Because of how the NAS appliances manage directory metadata, placing thousands of files in a directory was extremely inefficient as the directory’s blockmap was too large to be cached effectively by the appliance. Consequently it was common to incur more than 10 disk operations to retrieve a single image. After reducing directory sizes to hundreds of images per directory, the resulting system would still generally incur 3 disk operations to fetch an image: one to read the directory metadata into memory, a second to load the inode into memory, and a third to read the file contents."

I had assumed the filesystem directory metadata & inode would always be cached in RAM by the OS and a file read would usually require just 1 disk IO.

Is this "multiple disk IOs to read a single file" problem outlined in that paper unique to NAS appliances, or does Linux have the same problem too?

I'm planning to run a Linux server for serving images. Is there any way I can minimize the number of disk IOs - ideally making sure the OS caches all the directory and inode data in RAM, so that each file read requires no more than 1 disk IO?


Linux has the same "problem". Here is a paper a student of mine published two years ago, where the effect is shown on Linux. The multiple IOs can come from several sources:

  • A directory lookup at each level of the file path. It may be necessary to read the directory's inode and one or more directory entry blocks (the sketch after this list walks a path one component at a time to illustrate this)
  • The inode of the file itself
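For illustration, here is a minimal C sketch that resolves a path one component at a time with openat(), roughly mimicking what the kernel's path walk does internally. The path /var/www/images/cat.jpg is hypothetical; each openat() is a lookup in one directory, and if that directory's inode and entry blocks are not cached, each lookup costs disk reads:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical path; every component below is one directory lookup. */
    char path[] = "/var/www/images/cat.jpg";

    int dirfd = open("/", O_RDONLY | O_DIRECTORY);
    if (dirfd < 0) { perror("open /"); return 1; }

    /* Resolve one component at a time, as the kernel's path walk does:
     * each openat() reads the current directory (inode + entry blocks)
     * if it is not already cached. */
    for (char *comp = strtok(path, "/"); comp; comp = strtok(NULL, "/")) {
        int next = openat(dirfd, comp, O_RDONLY);
        if (next < 0) { perror(comp); close(dirfd); return 1; }
        printf("resolved: %s\n", comp);
        close(dirfd);
        dirfd = next;
    }

    /* dirfd is now the file itself; reading its contents is yet another
     * disk access if the data blocks are cold. */
    close(dirfd);
    return 0;
}
```

A plain open() of the full path makes the kernel do the equivalent per-component work in a single syscall; the point is that a deep path means more lookups that can miss the cache.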

Under normal IO patterns, caching is really effective, and inodes, directories, and data blocks are allocated in ways that reduce seeks. However, the normal lookup method, which is essentially shared by all file systems, is bad for highly randomized traffic.

Here are a few ideas:

1) The filesystem-related caches help. A large cache will absorb most of the reads (the sketch below shows one way to pre-warm it). However, if you want to put several disks in a machine, the disk-to-RAM ratio limits how much can be cached.
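As a sketch of pre-warming those caches, assuming the images live under a hypothetical directory /srv/images: walking the whole tree once forces the kernel to read every directory block and inode, so later lookups can be served from the dentry and inode caches (at least until memory pressure evicts them):

```c
#define _XOPEN_SOURCE 500   /* for nftw() */
#include <ftw.h>
#include <stdio.h>

static long visited;

/* nftw() stat()s every entry it visits, which is exactly the access that
 * pulls directory blocks and inodes into the kernel caches. */
static int visit(const char *path, const struct stat *st, int type,
                 struct FTW *ftw)
{
    (void)path; (void)st; (void)type; (void)ftw;
    visited++;
    return 0;   /* 0 means keep walking */
}

int main(void)
{
    /* Hypothetical image root; adjust to the real tree. */
    if (nftw("/srv/images", visit, 64, FTW_PHYS) < 0) {
        perror("nftw");
        return 1;
    }
    printf("warmed %ld entries\n", visited);
    return 0;
}
```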

2) Don't use millions of small files. Aggregate them into larger files and store the filename and the offset within the file (see the sketch below).
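A minimal sketch of this aggregation idea, with hypothetical file names (images.store for the big file, cat.jpg for an image): append each small file to one large store file and keep its offset and size in a RAM index, so serving a request needs only a single read at a known offset:

```c
#include <stdio.h>

/* One index record: where a small file lives inside the big store file.
 * In a real server this index would be a hash table kept in RAM. */
struct entry { char name[64]; long offset; long size; };

/* Append the contents of 'src' to 'store' and record offset and size. */
static int store_file(FILE *store, const char *src, struct entry *e)
{
    FILE *in = fopen(src, "rb");
    if (!in) return -1;

    fseek(store, 0, SEEK_END);
    e->offset = ftell(store);
    snprintf(e->name, sizeof e->name, "%s", src);

    char buf[8192];
    size_t n;
    e->size = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        fwrite(buf, 1, n, store);
        e->size += (long)n;
    }
    fclose(in);
    return 0;
}

int main(void)
{
    FILE *store = fopen("images.store", "ab+");   /* hypothetical store */
    if (!store) { perror("images.store"); return 1; }

    struct entry e;
    if (store_file(store, "cat.jpg", &e) == 0)    /* hypothetical image */
        printf("%s stored at offset %ld, %ld bytes\n",
               e.name, e.offset, e.size);

    /* To serve a request: look up (offset, size) in the RAM index, then a
     * single pread() on the store file fetches the image - one disk IO. */
    fclose(store);
    return 0;
}
```

This is essentially what the Haystack paper describes: the per-file directory and inode lookups disappear because there is only one big file whose metadata stays cached.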

3) Place or cache the metadata on an SSD.

4) And of course, use a filesystem that does not have a totally anarchic on-disk directory format. A readdir should take no more than linear time, and accessing a single file ideally only logarithmic time.

Keeping directories small (fewer than 1000 entries or so) should not help much, because you would then need more directories, which also need to be cached.


This depends on the filesystem you plan to use. Before reading a file's data, the system must (as the sketch at the end of this answer illustrates):

  • Read the directory file.
  • Read the inode of your file.
  • Read the sectors (data blocks) of your file.

If a folder contains a huge number of files, this puts big pressure on the cache.
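A small sketch mapping those three steps to syscalls, assuming a hypothetical file /data/img/0001.jpg; running it under strace shows the kernel doing the directory lookups and inode fetch (for the stat) and then the data read:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/data/img/0001.jpg";   /* hypothetical */
    struct stat st;

    /* Steps 1 and 2: the path lookup reads each directory on the way,
     * then the file's inode; stat() exposes the inode's contents. */
    if (stat(path, &st) < 0) { perror("stat"); return 1; }
    printf("inode %ld, size %ld bytes\n", (long)st.st_ino, (long)st.st_size);

    /* Step 3: read the file's data blocks. */
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[4096];
    ssize_t n = read(fd, buf, sizeof buf);
    printf("read %zd bytes\n", n);
    close(fd);
    return 0;
}
```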