Maximum number of files/directories on Linux?

ext[234] filesystems have a fixed maximum number of inodes; every file or directory requires one inode. You can see the current count and limits with df -i. For example, on a 15GB ext3 filesystem, created with the default settings:

Filesystem           Inodes  IUsed   IFree IUse% Mounted on
/dev/xvda           1933312 134815 1798497    7% /

There's no limit on directories in particular beyond this; keep in mind that every file or directory requires at least one filesystem block (typically 4KB), though, even if it's a directory with only a single item in it.

As you can see, though, 80,000 inodes is unlikely to be a problem. And with the dir_index option (enablable with tune2fs), lookups in large directories aren't too much of a big deal. However, note that many administrative tools (such as ls or rm) can have a hard time dealing with directories with too many files in them. As such, it's recommended to split your files up so that you don't have more than a few hundred to a thousand items in any given directory. An easy way to do this is to hash whatever ID you're using, and use the first few hex digits as intermediate directories.

For example, say you have item ID 12345, and it hashes to 'DEADBEEF02842.......'. You might store your files under /storage/root/d/e/12345. You've now cut the number of files in each directory by 1/256th.

If your server's filesystem has the dir_index feature turned on (see tune2fs(8) for details on checking and turning on the feature) then you can reasonably store upwards of 100,000 files in a directory before the performance degrades. (dir_index has been the default for new filesystems for most of the distributions for several years now, so it would only be an old filesystem that doesn't have the feature on by default.)

That said, adding another directory level to reduce the number of files in a directory by a factor of 16 or 256 would drastically improve the chances of things like ls * working without over-running the kernel's maximum argv size.

Typically, this is done by something like:

/a/a1111
/a/a1112
...
/b/b1111
...
/c/c6565
...

i.e., prepending a letter or digit to the path, based on some feature you can compute off the name. (The first two characters of md5sum or sha1sum of the file name is one common approach, but if you have unique object ids, then 'a'+ id % 16 is easy enough mechanism to determine which directory to use.)

60000 is nothing, 20000 as well. But you should put group these 20000 by any means in order to speed up access to them. Maybe in groups of 100 or 1000, by taking the number of the directory and dividing it by 100, 500, 1000, whatever.

E.g., I have a project where the files have numbers. I group them in 1000s, so I have

id/1/1332
id/3/3256
id/12/12334
id/350/350934

You actually might have a hard limit - some systems have 32 bit inodes, so you are limited to a number of 2^32 per file system.

In addition of the general answers (basically "don't bother that much", and "tune your filesystem", and "organize your directory with subdirectories containing a few thousand files each"):

If the individual images are small (e.g. less than a few kilobytes), instead of putting them in a folder, you could also put them in a database (e.g. with MySQL as a BLOB) or perhaps inside a GDBM indexed file. Then each small item won't consume an inode (on many filesystems, each inode wants at least some kilobytes). You could also do that for some threshold (e.g. put images bigger than 4kbytes in individual files, and smaller ones in a data base or GDBM file). Of course, don't forget to backup your data (and define a backup stategy).

The year is 2014. I come back in time to add this answer. Lots of big/small files? You can use Amazon S3 and other alternatives based on Ceph like DreamObjects, where there are no directory limits to worry about.

I hope this helps someone decide from all the alternatives.

Maximum number of files/directories on Linux?

Related

Recent Posts