Placing many (10 million) files in one folder
Solution 1:
Would there be any problem with me saving the potential circa 10 million results in separate files in one directory?
Yes. There are probably more reasons, but these are the ones I can list off the top of my head:
- tune2fs has an option called dir_index that tends to be turned on by default (it is on Ubuntu). It lets you store roughly 100k files in a directory before you see a performance hit. That is not even close to the 10m files you are thinking about.
- ext filesystems have a fixed maximum number of inodes. Every file and directory uses 1 inode. Use df -i to see the used and free inodes on each partition. When you run out of inodes you cannot create new files or folders.
- Commands like rm and ls, when used with wildcards, have the wildcard expanded by the shell into one argument per file and will fail with "argument list too long". You will have to use find to delete or list files, and find tends to be slow.
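The inode and wildcard points above can also be handled from a script. What follows is a minimal Python sketch, assuming a POSIX system; the function names inode_usage and remove_matching are made up for illustration and are not part of any tool mentioned here.

```python
import os


def inode_usage(path="."):
    """Return (total, free) inode counts for the filesystem holding path.

    Roughly the information shown by df -i. POSIX only.
    """
    st = os.statvfs(path)
    return st.f_files, st.f_ffree


def remove_matching(directory, suffix):
    """Delete regular files in directory whose names end with suffix.

    os.scandir streams directory entries one at a time, so no long
    argument list is ever built and the shell's "argument list too long"
    limit does not come into play.
    """
    removed = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False) and entry.name.endswith(suffix):
                os.unlink(entry.path)
                removed += 1
    return removed
```

This avoids wildcard expansion, but it still has to walk every entry, so it will not be fast on a directory holding millions of files.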
Or is it advisable to split them down into folders?
Yes. Most definitely. Basically you can not even store 10m files in 1 directory.
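One common way to split files into folders is to derive the subdirectory from a hash of the file name. This is a rough sketch, not a recommendation of specific numbers: the names sharded_path and save_result, the choice of SHA-256, and the two-level layout are all assumptions made for the example.

```python
import hashlib
import os


def sharded_path(root, name, levels=2, width=2):
    """Map a file name to root/ab/cd/name using a hash of the name.

    With levels=2 and width=2 there are 256 * 256 = 65,536 leaf
    directories, so 10 million files average about 150 per directory.
    """
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, name)


def save_result(root, name, data):
    """Write data (bytes) to its sharded location, creating directories as needed."""
    path = sharded_path(root, name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(data)
    return path
```

Because the path is computed from the name alone, a reader can locate a file later with sharded_path without listing any directory.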
I would use the database. If you want to cache it for a website have a look at "solr" ("providing distributed indexing, replication and load-balanced querying").
Solution 2:
I ended up with the same issue, so I ran my own benchmarks to find out whether you can place everything in the same folder instead of using multiple folders. It appears you can, and it's faster!
Ref: https://medium.com/@hartator/benchmark-deep-directory-structure-vs-flat-directory-structure-to-store-millions-of-files-on-ext4-cac1000ca28
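If you want to check this on your own hardware, a comparison along these lines is easy to script. The sketch below is not the benchmark from the linked article. It is only a hypothetical timing helper (write_files is a made-up name) that creates empty files either flat or grouped into numbered subdirectories.

```python
import os
import time


def write_files(root, count, per_dir=None):
    """Create count empty files under root and return the elapsed seconds.

    per_dir=None puts every file directly in root; otherwise files are
    grouped into numbered subdirectories of at most per_dir files each.
    """
    start = time.perf_counter()
    for i in range(count):
        if per_dir is None:
            target = root
        else:
            target = os.path.join(root, str(i // per_dir))
            if i % per_dir == 0:
                # First file of a new group: make its subdirectory.
                os.makedirs(target, exist_ok=True)
        with open(os.path.join(target, f"{i}.txt"), "w"):
            pass
    return time.perf_counter() - start
```

Run write_files(flat_dir, n) and write_files(nested_dir, n, per_dir=1000) against two empty directories on the filesystem you actually plan to use. A temporary directory may live on tmpfs rather than ext4, and the results can differ a lot between the two. Write time is also only part of the picture, so listing and reading the files should be timed as well.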