How to work around the Linux limit on the number of subdirectories?

Solution 1:

That limit is per directory, not for the whole filesystem, so you can work around it by subdividing further. For instance, instead of keeping all the user subdirectories in the same directory, split them by the first two characters of the name so you have something like:

top_level_dir
|---aa
|   |---aardvark1
|   |---aardvark2
|---da
|   |---dan
|   |---david
|---do
    |---don
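
As a rough illustration of the layout above, here is a minimal Python sketch. The function name `prefix_dir` and the `width` parameter are made up for this example, and lower-casing the prefix is an assumption, not something the layout requires.

```python
from pathlib import Path


def prefix_dir(top_level_dir: str, username: str, width: int = 2) -> Path:
    """Return top_level_dir/<first `width` characters>/<username>."""
    if not username:
        raise ValueError("username must not be empty")
    # Assumption: fold case so "Dan" and "dan" land in the same bucket.
    prefix = username[:width].lower()
    return Path(top_level_dir) / prefix / username


if __name__ == "__main__":
    for name in ("aardvark1", "dan", "david", "don"):
        print(prefix_dir("top_level_dir", name))
```

Calling `mkdir(parents=True, exist_ok=True)` on the returned path would create the bucket directory on first use.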

Even better would be to create some form of hash of the names and use that for the division. This way you'll get a better spread amongst the directories instead of, as with the initial-letters example, "da" being very full and "zz" completely empty. For instance, if you take the CRC or MD5 of the name and use the first 8 bits, you'll get something like:

top_level_dir
|---00
|   |---some_username
|   |---some_username
|---01
|   |---some_username
...
|---FF
|   |---some_username
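
A possible way to compute those bucket names in Python, assuming MD5 and UTF-8 encoded usernames. The helper names `hash_bucket` and `hashed_dir` are invented for this sketch. MD5 is used here only to spread names evenly, not for security.

```python
import hashlib
from pathlib import Path


def hash_bucket(username: str) -> str:
    """Return the first 8 bits of the MD5 of the name as two hex digits (00..FF)."""
    digest = hashlib.md5(username.encode("utf-8")).digest()
    return f"{digest[0]:02X}"


def hashed_dir(top_level_dir: str, username: str) -> Path:
    """Return top_level_dir/<bucket>/<username>."""
    return Path(top_level_dir) / hash_bucket(username) / username


if __name__ == "__main__":
    for name in ("aardvark1", "dan", "david", "don"):
        print(hashed_dir("top_level_dir", name))
```

With 8 bits there are 256 buckets, so each one should end up holding roughly 1/256 of the users.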

This can be extended to further depths as needed. For instance, using the username rather than a hash value:

top_level_dir
|---a
|   |---a
|       |---aardvark1
|       |---aardvark2
|---d
    |---a
    |   |---dan
    |   |---david
    |---o
        |---don
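
The deeper layout can be sketched the same way, with one directory level per leading character. `nested_dir` and `depth` are hypothetical names. Padding short names with `_` is an assumption added here so that a one-letter username cannot collide with a bucket directory of the same name.

```python
from pathlib import Path


def nested_dir(top_level_dir: str, username: str, depth: int = 2) -> Path:
    """Return top_level_dir/<char 1>/<char 2>/.../<username>."""
    if not username:
        raise ValueError("username must not be empty")
    # Assumption: pad names shorter than `depth` so every user directory
    # sits at the same level of the tree.
    levels = list(username[:depth].ljust(depth, "_"))
    return Path(top_level_dir).joinpath(*levels, username)


if __name__ == "__main__":
    for name in ("aardvark1", "dan", "david", "don"):
        print(nested_dir("top_level_dir", name))
```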

This method is used in many places, such as Squid's cache (to borrow Ludwig's example) and the local caches of web browsers.

One important thing to note is that with ext2/3 you will start to hit performance issues before you get close to the 32,000 limit anyway, as directories are searched linearly unless ext3's dir_index feature is enabled. Moving to another filesystem (ext4 or ReiserFS, for instance) removes this inefficiency as well as the fixed limit per directory: ReiserFS searches directories with a binary-split algorithm, so long directories are handled much more efficiently, and ext4 indexes directories with a hashed tree by default.
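
To see how close an existing directory is to the limit, something like the following could be used. This is only a sketch: `count_subdirectories` is a made-up helper name, and symlinks to directories are deliberately not counted.

```python
import os


def count_subdirectories(path: str) -> int:
    """Count the immediate subdirectories of `path`, ignoring files and symlinks."""
    with os.scandir(path) as entries:
        return sum(1 for entry in entries if entry.is_dir(follow_symlinks=False))


if __name__ == "__main__":
    print(count_subdirectories("."))
```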

Solution 2:

If you are bound to ext2/ext3 the only possibility I see is to partition your data. Find a criterion that splits your data into manageable chunks of similar size.

If it's only about the profile images I'd do:

  1. Use a hash (e.g. SHA1) of the image
  2. Use the SHA1 as file and directory name

For example, the Squid cache does it this way:

f/4b/353ac7303854033

The top-level directory is the first hex digit, the second level is the next two hex digits, and the file name is the remaining hex digits.
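
A minimal Python sketch of that 1/2/rest split, assuming the SHA1 is taken over the image bytes. `sharded_path`, `store_image` and the `cache_root` argument are illustrative names, not part of any existing tool.

```python
import hashlib
from pathlib import Path


def sharded_path(cache_root: str, data: bytes) -> Path:
    """Map content to <root>/<1 hex digit>/<2 hex digits>/<remaining hex digits>."""
    digest = hashlib.sha1(data).hexdigest()  # 40 hex digits
    return Path(cache_root) / digest[0] / digest[1:3] / digest[3:]


def store_image(cache_root: str, data: bytes) -> Path:
    """Write the image under its hashed path, creating directories as needed."""
    target = sharded_path(cache_root, data)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

This gives at most 16 top-level directories with at most 256 subdirectories each. Identical images hash to the same path, so they are stored only once.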

Solution 3:

Can't we have a better solution?

You do have a better solution: use a different filesystem. There are plenty available, many of which are optimised for different tasks. As you pointed out, ReiserFS is optimised for handling lots of files in a directory.

See here for a comparison of filesystems.

Just be glad you're not stuck with NTFS, which is truly abysmal for lots of files in a directory. I'd recommend JFS as a replacement if you don't fancy using the relatively new (but apparently stable) ext4 FS.

Solution 4:

Is the profile image small? What about putting it in the database with the rest of the profile data? This might not be the best option for you, but worth considering...

Here is an (older) Microsoft whitepaper on the topic: To BLOB or not to BLOB.
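
To give an idea of what storing the image alongside the profile might look like, here is a small sketch using SQLite from Python's standard library. The choice of SQLite, the `profiles` table and its columns are all assumptions made for illustration. The same idea applies to whichever database already holds the profile data.

```python
import sqlite3
from typing import Optional


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table: one row per user, image stored as a BLOB.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS profiles ("
        "username TEXT PRIMARY KEY, image BLOB)"
    )


def save_profile_image(conn: sqlite3.Connection, username: str, image: bytes) -> None:
    """Insert or overwrite the image stored for `username`."""
    conn.execute(
        "INSERT OR REPLACE INTO profiles (username, image) VALUES (?, ?)",
        (username, image),
    )
    conn.commit()


def load_profile_image(conn: sqlite3.Connection, username: str) -> Optional[bytes]:
    """Return the stored image bytes, or None if the user has no row."""
    row = conn.execute(
        "SELECT image FROM profiles WHERE username = ?", (username,)
    ).fetchone()
    return None if row is None else bytes(row[0])
```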