What is the name for sorting a directory with lots of files into subdirectories?
One way to make a directory containing a huge number of files manageable is to sort the files into subdirectories named after successive characters of the files' names.
E.g.:
- a8debcdcf0d2302ccde5a43bb1fb385e81098342.jpg
- 91ff48de8cfc6468bdc2115cf87cfb6547eee713.jpg
- 99d002e2065cdf02bd6d04bf29a8230564719b76.jpg
- ...
The files above would be sorted into subdirectories like this:
- a/
  - 8/
    - a8debcdcf0d2302ccde5a43bb1fb385e81098342.jpg
- 9/
  - 1/
    - 91ff48de8cfc6468bdc2115cf87cfb6547eee713.jpg
  - 9/
    - 99d002e2065cdf02bd6d04bf29a8230564719b76.jpg
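A minimal Python sketch of the per-character layout above. The function names `shard_parts` and `shard_directory` and the `depth` parameter (how many leading characters become directory levels; the tree above uses 2) are hypothetical, and the sketch assumes no file in the directory is itself named like a one-character shard directory:

```python
import shutil
import tempfile
from pathlib import Path


def shard_parts(filename: str, depth: int = 2) -> list[str]:
    """Path components for `filename`: one directory per leading character.

    With depth=2, 'a8deb...jpg' -> ['a', '8', 'a8deb...jpg'].
    """
    if not 0 <= depth < len(filename):
        raise ValueError("depth must be at least 0 and shorter than the name")
    return [*filename[:depth], filename]


def shard_directory(root: Path, depth: int = 2) -> None:
    """Move every regular file directly under `root` into its shard.

    Assumption: no existing file in `root` is named like a shard
    directory (e.g. a file literally called 'a').
    """
    # Snapshot the listing so the newly created subdirectories are not revisited.
    for entry in list(root.iterdir()):
        if not entry.is_file():
            continue
        target = root.joinpath(*shard_parts(entry.name, depth))
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(entry), str(target))


if __name__ == "__main__":
    # Demo on a throwaway directory containing the three example names.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in [
            "a8debcdcf0d2302ccde5a43bb1fb385e81098342.jpg",
            "91ff48de8cfc6468bdc2115cf87cfb6547eee713.jpg",
            "99d002e2065cdf02bd6d04bf29a8230564719b76.jpg",
        ]:
            (root / name).touch()
        shard_directory(root)
        for p in sorted(root.rglob("*.jpg")):
            print(p.relative_to(root).as_posix())
```

Because the path is computed from the name alone, finding a file later never requires scanning a directory listing: you rebuild its path with the same function.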
There are several variations on this method, such as using a different number of characters to name subdirectories or using a hash or other algorithm to determine the path to each file.
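For the hashing variant, here is a sketch assuming SHA-1 of the file name (via Python's `hashlib`), with hypothetical `levels` and `width` parameters for the number of directory levels and the hex digits used per level. The original name is kept as the leaf, so files still carry readable names on disk:

```python
import hashlib


def hashed_parts(filename: str, levels: int = 2, width: int = 2) -> list[str]:
    """Directory components for `filename` derived from a hash of its name.

    Hypothetical scheme: SHA-1 of the UTF-8 name, then `levels` directories,
    each named after the next `width` hex digits of the digest.
    """
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()  # 40 hex chars
    if levels * width > len(digest):
        raise ValueError("not enough digest characters for this layout")
    dirs = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return [*dirs, filename]


if __name__ == "__main__":
    # Two levels of two hex digits, then one level of three hex digits.
    print("/".join(hashed_parts("IMG_0001.jpg")))
    print("/".join(hashed_parts("IMG_0001.jpg", levels=1, width=3)))
```

Hashing helps when names are not uniformly distributed: files such as `IMG_0001.jpg`, `IMG_0002.jpg`, ... would all land under `I/M/` with the per-character scheme, whereas the digest spreads them evenly.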
Is there a formal name for this method of organizing files?
I've always called it hash-chunking.
There are a couple of things to keep in mind with structures like this:
- Each directory consumes an inode. If you chunk on every character, a single file could cause, say, 33 inodes to be created, so you'll run out of inodes before you run out of space.
- If you chunk on groups of characters (say, the first n characters, followed by the next n characters), keep each group small enough that you aren't forcing directory inodes to extend, which slows lookups.
- If your hash is sufficiently random, chunks at the third level and beyond will practically never have siblings, so you may be able to chunk like 1234/5678/901234567890... and keep your inode count small.
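To weigh these trade-offs for a particular file count, one could count how many directories (and therefore inodes) a layout creates and how crowded its leaf directories get. This sketch uses random 40-hex-digit names as a stand-in for SHA-1 hashes; `chunked_dirs`, `count_directories`, `max_files_per_leaf`, and the chunk widths compared are illustrative assumptions, not a standard tool:

```python
import random
from collections import Counter


def chunked_dirs(name: str, widths: list[int]) -> list[str]:
    """Directory prefixes created for `name` when chunked by `widths`.

    widths=[1, 1] -> ['a/', 'a/8/'] (one level per character);
    widths=[4, 4] -> ['1234/', '1234/5678/'] (the grouped variant).
    `widths` must be non-empty.
    """
    dirs, pos, prefix = [], 0, ""
    for w in widths:
        prefix += name[pos:pos + w] + "/"
        pos += w
        dirs.append(prefix)
    return dirs


def count_directories(names, widths) -> int:
    """Number of distinct directories (one inode each) the layout needs."""
    return len({d for n in names for d in chunked_dirs(n, widths)})


def max_files_per_leaf(names, widths) -> int:
    """Largest number of files that end up in any single leaf directory."""
    counts = Counter(chunked_dirs(n, widths)[-1] for n in names)
    return max(counts.values())


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    names = [f"{rng.getrandbits(160):040x}" for _ in range(10_000)]
    for widths in ([1] * 8, [2, 2], [4, 4]):
        print(widths, count_directories(names, widths),
              max_files_per_leaf(names, widths))
```

Narrow per-character levels keep every directory tiny but create many directories per file; wider groups such as `[4, 4]` create fewer levels per file but allow far more entries in each directory (up to 65,536 subdirectories per level with four hex digits).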
It seems it's just called "hashed directory structure", for example at http://michaelandrews.typepad.com/the_technical_times/2009/10/creating-a-hashed-directory-structure.html:
How can one store a large number of files while maintaining a high level of performance during access? One solution is file name hashing.
It's called a B-tree (not to be confused with a binary tree).