Linux filesystem or CDN for millions of files with replication

Millions of files in one directory is bad design and will be slow. Subdivide them into directories with a smaller number of entries each.
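Sharding by a hash of the filename keeps the directories balanced regardless of the naming scheme. Here is a minimal Python sketch, assuming an MD5-based two-level layout under a made-up `/srv/cdn` root (the depth and width are assumptions, not requirements):

```python
import hashlib
from pathlib import Path

def sharded_path(root: str, filename: str, levels: int = 2, width: int = 2) -> Path:
    """Map a flat filename to root/ab/cd/filename using its MD5 hex digest.

    Two levels of 2 hex chars gives 256 * 256 = 65536 directories, so even a
    million files averages only ~15 entries per directory.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root, *parts, filename)

# Example: store a file through the sharded layout.
p = sharded_path("/srv/cdn", "cat-video-0001.jpg")
p.parent.mkdir(parents=True, exist_ok=True)
p.write_bytes(b"...")
```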

Take a look at https://unix.stackexchange.com/questions/3733/number-of-files-per-directory

Use RAID and/or SSDs. This will not in itself solve the slow access times, but if you introduce multiple directories and reduce the number of files per directory, say by an order of magnitude or two, it will help to prevent hotspots.

Consider XFS; especially when using multiple drives and multiple directories, it may give you nice gains (see e.g. this thread for options to use; it gives some tips for XFS on md RAID).


Personally I would:

  1. Stick with your current FS. Split the files into directories as you suggested; if you want, you can still present them as a single flat namespace, e.g. with mod_rewrite (guessing this is a CDN-type application). A sketch of the path mapping follows this list.
  2. Log the changes that need replicating, e.g. daily or hourly, so that every time you sync, working out which files need to be copied is as simple as running diff on the logs. That is, you always sync the logs, and sync them first, but diff them against the previously synced copy before replacing it, to compute what else needs copying (see the log-diff sketch after this list).
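
I won't write the rewrite rule itself here, but the mapping it has to perform is simple enough to sketch in Python; the `/srv/cdn` root and the two-level hex sharding are assumptions carried over from the sketch above, and the same translation could live in a mod_rewrite rule or a tiny WSGI/FastCGI shim in front of the file store:

```python
import hashlib
from pathlib import Path

CDN_ROOT = Path("/srv/cdn")  # assumed document root for the sharded store

def resolve(url_path: str) -> Path:
    """Translate a flat public URL like /files/cat.jpg to the sharded on-disk
    path /srv/cdn/ab/cd/cat.jpg, hiding the directory layout from clients."""
    name = url_path.rsplit("/", 1)[-1]
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return CDN_ROOT / digest[0:2] / digest[2:4] / name

print(resolve("/files/cat-video-0001.jpg"))
```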
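
For the log-driven sync, a rough Python sketch of the diff step, assuming one file path per line in the change log and hypothetical log names (`changes.log`, `changes.log.synced`):

```python
from pathlib import Path

def files_to_copy(current_log: str, last_synced_log: str) -> list[str]:
    """Return paths that appear in the current change log but not in the copy
    kept from the last successful sync, i.e. files still needing replication."""
    current = set(Path(current_log).read_text().splitlines())
    synced_path = Path(last_synced_log)
    already = set(synced_path.read_text().splitlines()) if synced_path.exists() else set()
    return sorted(current - already)

# Sync flow: fetch the log first, diff it against the previously synced copy,
# transfer only the listed files, then replace the old log copy.
for path in files_to_copy("changes.log", "changes.log.synced"):
    print("needs copy:", path)
```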