best filesystem for millions of files [closed]
Which Linux filesystem/setup would you choose for the best speed in the following scenario:
- a few million files
- ~3 MB file size on average
- random access to files
- need to get a list of all the files frequently
- constant writing of new files
- constant reading of old files
What really counts is how you organize your files.
If you plan to have a single big directory with ~10M files, any filesystem will suffer, although XFS and ZFS will manage even this worst case quite well.
The recommended approach is to organize your files in multiple, smaller directories with reasonable file counts (~32K each) to avoid different but related issues (e.g. ls was once very slow on big directories).
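One common way to get such a layout is to derive the subdirectory from a hash of the file name. The following is only a minimal sketch of that idea, not something the original answer prescribes: the helper names `sharded_path` and `store`, the choice of SHA-256, and the two-level, two-hex-character layout are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sharded_path(root: Path, filename: str, levels: int = 2, width: int = 2) -> Path:
    """Map a file name to root/<aa>/<bb>/filename using a hash prefix.

    Hypothetical helper: with levels=2 and width=2 there are
    256 * 256 = 65,536 leaf directories.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return root.joinpath(*parts, filename)


def store(root: Path, filename: str, data: bytes) -> Path:
    """Write data under its sharded directory, creating directories as needed."""
    target = sharded_path(root, filename)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

With these assumed defaults, 10M files spread over 65,536 leaf directories average roughly 150 files per directory, far below the ~32K figure mentioned above. Because the path is computed from the name, random access to a known file never requires listing a directory.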
If splitting the files across smaller directories is not possible, I would go with XFS or ZFS, but only after having simulated the intended load on a test setup (note: even EXT4 will be fine performance-wise, but you can easily hit its inode limit).
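Since EXT4 fixes the number of inodes when the filesystem is created, it can be worth checking how many are left before loading millions of files. This is a small sketch using the standard `os.statvfs` call; the function name `inode_usage` and the returned dictionary layout are made up for this example.

```python
import os


def inode_usage(path: str = "/") -> dict:
    """Report inode totals for the filesystem that contains `path` (Unix only)."""
    st = os.statvfs(path)
    total = st.f_files   # total inodes on the filesystem
    free = st.f_ffree    # free inodes
    used = total - free
    # Some filesystems report 0 total inodes; avoid dividing by zero.
    percent = (used / total * 100.0) if total else None
    return {"total": total, "free": free, "used": used, "percent_used": percent}
```

Calling `inode_usage("/data")` on the target mount point (the path here is only a placeholder) gives the same kind of figures as `df -i`.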
Your workload is almost the worst possible for a general-purpose file system: millions of files, frequent enumeration, lots of reads and writes, and enormous metadata I/O. With a large number of files, it is rarely the bandwidth of transferring the files themselves that is the problem, but rather the number of IOPS needed to query directory entries and inodes repeatedly.
Test this workload synthetically, on realistic production-scale storage and IOPS levels, while monitoring the application to be sure it performs acceptably. Be sure to match the folder structure: 300 files per directory is very different from 3,000,000 files per directory. Try a couple of different file systems; on Linux, XFS and EXT4.
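As a starting point for such a synthetic test, the sketch below builds a directory tree with a chosen shape and times a full enumeration of it. It is only an illustration: the function names, the 1 KiB default payload and the use of a temporary directory are assumptions, and a real test should use production-like file sizes, the actual target filesystem, and cold caches.

```python
import os
import tempfile
import time


def make_tree(root: str, n_dirs: int, files_per_dir: int, size: int = 1024) -> None:
    """Create n_dirs directories under root, each holding files_per_dir files."""
    payload = b"\0" * size
    for d in range(n_dirs):
        dpath = os.path.join(root, f"d{d:05d}")
        os.makedirs(dpath, exist_ok=True)
        for f in range(files_per_dir):
            with open(os.path.join(dpath, f"f{f:07d}.bin"), "wb") as fh:
                fh.write(payload)


def count_files(root: str) -> int:
    """Enumerate every file below root with os.scandir and return the count."""
    total = 0
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                else:
                    total += 1
    return total


def time_enumeration(n_dirs: int, files_per_dir: int) -> tuple:
    """Build a throwaway tree, then return (file_count, seconds_to_list_it)."""
    with tempfile.TemporaryDirectory() as root:
        make_tree(root, n_dirs, files_per_dir)
        start = time.perf_counter()
        count = count_files(root)
        return count, time.perf_counter() - start
```

Comparing, say, `time_enumeration(1, 3000)` with `time_enumeration(10, 300)` exercises two layouts holding the same number of files. At this toy scale everything sits in the page cache, so the numbers only become meaningful once the tree is far larger than RAM can cache.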
Possibly you will need very fast SSD storage and lots of RAM to make this perform adequately.
Maybe you have a support contract with your OS vendor where you can have a performance specialist look at it.
If getting acceptable performance demands it, consider application changes. Consider storing and querying the file lists from a database rather than the file system. Many databases might be able to return a few million results faster than a file system constrained by POSIX in general and the Linux VFS in particular.
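A minimal sketch of that idea, assuming SQLite as the database and a hypothetical `files` table (neither is specified in the answer): the application records each file when it writes it, and the frequent "list all files" request becomes a single query instead of a directory walk.

```python
import sqlite3


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the file index; the schema is an illustrative assumption."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT PRIMARY KEY,"
        " size INTEGER NOT NULL,"
        " created REAL NOT NULL)"
    )
    return conn


def record_file(conn: sqlite3.Connection, path: str, size: int, created: float) -> None:
    """Insert or update one entry; call this whenever the application writes a file."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO files (path, size, created) VALUES (?, ?, ?)",
            (path, size, created),
        )


def list_files(conn: sqlite3.Connection) -> list:
    """Return all known paths without touching the file system."""
    return [row[0] for row in conn.execute("SELECT path FROM files ORDER BY path")]
```

The trade-off is that the index and the file system can drift apart if files are created or deleted outside the application, so some periodic reconciliation would be needed.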
From what you describe, XFS is a proper match. It was created to handle billions of files. You'll have to think about the right back-end storage for what you plan, though.