Placing many (10 million) files in one folder

Solution 1:

Would there be any problem with saving the potentially circa 10 million results as separate files in one directory?

Yes. There are probably more reasons, but these are the ones I can list off the top of my head:

  • tune2fs has an option called dir_index that tends to be turned on by default (on Ubuntu it is); it lets you store roughly 100k files in a directory before you see a performance hit. That is not even close to the 10 million files you are thinking about.

  • ext filesystems have a fixed maximum number of inodes. Every file and directory uses one inode. Use df -i to see your partitions and how many inodes are still free. When you run out of inodes you cannot create new files or folders.

  • commands like rm and ls expand wildcards into a full argument list and, with this many files, will fail with "Argument list too long". You will have to use find to delete or list files, and find tends to be slow. (Illustrative commands for all three points are sketched after this list.)
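
For reference, here are a few illustrative commands for the three points above. The device name /dev/sda1 and the /path/to/results directory are assumptions; substitute your own.

    # check whether the dir_index feature is enabled (assumed device: /dev/sda1)
    sudo tune2fs -l /dev/sda1 | grep -o dir_index

    # show inode usage and how many inodes are still free per filesystem
    df -i

    # count and delete files without hitting "Argument list too long"
    # (the path is an assumption; -delete requires GNU find)
    find /path/to/results -maxdepth 1 -type f | wc -l
    find /path/to/results -maxdepth 1 -type f -delete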

Or is it advisable to split them into folders?

Yes, most definitely. In practice you cannot even work with 10 million files in one directory.
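
As a rough idea of what splitting into folders can look like, here is a minimal sketch of a two-level layout keyed on a hash of the file name. The file name result_12345.txt and the /data/results base directory are made-up examples, and the ${var:offset:length} substring syntax is bash-specific.

    #!/bin/bash
    # hypothetical example file and base directory
    name="result_12345.txt"
    base="/data/results"

    # take the first 4 hex characters of the name's MD5 and use them as two
    # directory levels, e.g. "a3f9" -> /data/results/a3/f9
    hash=$(printf '%s' "$name" | md5sum | cut -c1-4)
    dir="$base/${hash:0:2}/${hash:2:2}"

    mkdir -p "$dir"
    mv "$name" "$dir/$name"

Two hex characters per level give 256 × 256 = 65,536 leaf directories, so 10 million files average out to roughly 150 files per directory, far below the point where ext directories start to hurt.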

I would use the database. If you want to cache the results for a website, have a look at Solr ("providing distributed indexing, replication and load-balanced querying").

Solution 2:

I ended up with the same issue and ran my own benchmarks to find out whether you can place everything in the same folder versus spreading it across multiple folders. It appears you can, and it's faster! (A rough sketch of such a benchmark is included after the reference below.)

Benchmark

Ref: https://medium.com/@hartator/benchmark-deep-directory-structure-vs-flat-directory-structure-to-store-millions-of-files-on-ext4-cac1000ca28
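
For anyone who wants to reproduce a rough version of this locally, here is a minimal sketch, not the author's original benchmark. The /tmp/bench path is an assumption, and 100k files is far fewer than the 10 million discussed above.

    #!/bin/bash
    # create N empty files in a flat layout and in a two-level nested layout,
    # timing each, as a crude proxy for the linked benchmark
    N=100000
    base=/tmp/bench

    # flat layout: everything in one directory
    mkdir -p "$base/flat"
    time for i in $(seq 1 $N); do
        : > "$base/flat/$i"
    done

    # nested layout: files bucketed into subdirectories of ~1000 files each
    mkdir -p "$base/nested"
    time for i in $(seq 1 $N); do
        d="$base/nested/$(( i / 1000 ))"
        mkdir -p "$d"
        : > "$d/$i"
    done

Keep in mind that the results depend heavily on the filesystem, kernel, and disk, so numbers from one machine do not necessarily generalize.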