Filesystem for a large number of files in a single directory

OK, not that large, but I need a filesystem that can store around 60,000 files with an average size of 30 kB in a single directory (this is a requirement, so I can't simply break them into sub-directories with smaller numbers of files).

The files will be accessed randomly, but once created there will be no further writes to the filesystem. I'm currently using ext3 but finding it very slow. Any suggestions?


Solution 1:

You should consider XFS. It supports a very large number of files, both per filesystem and per directory, and performance stays relatively consistent even with a large number of directory entries thanks to its B+ tree directory structures.

There's a page on their wiki linking to a large number of papers and publications that detail the design. I recommend you give it a try and benchmark it against your current solution.
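
If you have a spare partition to experiment with, a minimal try-out might look like this (assuming /dev/sdb1 is an empty test partition and /mnt/xfstest an unused mount point; both are placeholders):

sudo mkfs.xfs /dev/sdb1             # create an XFS filesystem on the test partition
sudo mkdir -p /mnt/xfstest
sudo mount /dev/sdb1 /mnt/xfstest   # mount it, then copy the files over and benchmark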

Solution 2:

One billion files on Linux

The author of this article digs into some of the performance issues of file systems with large file counts and presents some nice performance comparisons of ext3, ext4 and XFS. The material is available as a slide deck: https://events.static.linuxfound.org/slides/2010/linuxcon2010_wheeler.pdf

The benchmarks cover:

- time to run mkfs
- time to create 1M 50 kB files
- file system repair time
- removing 1M files
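
To get comparable numbers for the workload in the question (60,000 files of ~30 kB each), a rough creation benchmark along the same lines can be run on each candidate filesystem; /mnt/test below is a placeholder for wherever the test filesystem is mounted:

time bash -c 'for i in $(seq 1 60000); do head -c 30K /dev/zero > /mnt/test/file$i; done'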

Solution 3:

Storing many files in a directory on ext3 has been discussed at length on the sister site stackoverflow.com.

In my opinion, 60,000 files in one directory on ext3 is far from ideal, but depending on your other requirements it might be good enough.

Solution 4:

OK. I did some preliminary testing using ReiserFS, XFS, JFS, ext3 (dir_index enabled) and ext4dev (2.6.26 kernel). My first impression was that all of them were fast enough on my beefy workstation; it turned out the remote production machine has a fairly slow processor.

I experienced some weirdness with ReiserFS even in initial testing, so I ruled it out. JFS seems to need about 33% less CPU than the others, so I'll test it on the remote server. If it performs well enough, I'll use it.
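
For the random-access pattern in the question, a crude read benchmark like the following can be repeated on each candidate filesystem (again, /mnt/test is a placeholder; shuf and xargs are standard GNU tools):

find /mnt/test -type f > /tmp/filelist.txt    # build the list of test files once
echo 3 | sudo tee /proc/sys/vm/drop_caches    # drop caches between runs for a fair comparison
time bash -c 'shuf -n 1000 /tmp/filelist.txt | xargs cat > /dev/null'   # read 1,000 random files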

Solution 5:

Using tune2fs to enable dir_index (hashed b-tree directory indexing) might help. To see whether it is already enabled:

sudo tune2fs -l /dev/sda1 | grep dir_index   # if enabled, dir_index appears in the "Filesystem features" line

If it is not enabled:

sudo umount /dev/sda1                 # the filesystem must be unmounted first
sudo tune2fs -O dir_index /dev/sda1   # enable hashed directory indexes
sudo e2fsck -D /dev/sda1              # rebuild and optimize the directory indexes
sudo mount /dev/sda1                  # remount (assumes an fstab entry exists for /dev/sda1)

But I have a feeling you might be going down the wrong path... why not generate a flat index and use some code to select files randomly based on it? You can then use sub-directories for a more optimized tree structure.
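
As a sketch of that idea (the /data/files path is a placeholder): build the index once after the files are written, then select from it randomly; the loop at the end fans the files out into 256 hashed sub-directories, should the single-directory requirement ever be relaxed:

find /data/files -type f > /data/index.txt    # flat index, built once
shuf -n 1 /data/index.txt                     # pick one file at random from the index

for f in /data/files/*; do
    h=$(basename "$f" | md5sum | cut -c1-2)   # first two hex chars of the name's MD5
    mkdir -p "/data/files/$h"                 # one of 256 buckets
    mv "$f" "/data/files/$h/"
done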