Is there anything faster than `find . | wc -l` to count files in a directory?
I often have to count the number of files in a directory, and sometimes the count runs into the millions.
Is there a better way than just enumerating and counting them with `find . | wc -l`?
Is there some kind of filesystem call you can make on ext3/4 that is less I/O intensive?
Not a fundamental speed-up but at least something :)
find . -printf \\n | wc -l
You do not need to pass the list of file names through the pipe; the newlines alone suffice. This variant is about 15% faster on my Ubuntu 12.04.3 when the directories are cached in RAM. In addition, this variant works correctly with file names containing newlines.
Interestingly this variant seems to be a little bit slower than the one above:
find . -printf x | wc -c
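The newline issue can be reproduced with a small throwaway directory. This is a minimal sketch, assuming GNU find (for `-printf`) and `mktemp -d`; the file names are made up for illustration.

```shell
#!/bin/sh
# Build a scratch directory with one ordinary file and one file whose
# name contains a newline (both names are hypothetical examples).
tmp=$(mktemp -d)
touch "$tmp/plain"
touch "$tmp/$(printf 'with\nnewline')"

# Naive count: the embedded newline splits one name across two lines.
naive=$(( $(find "$tmp" | wc -l) ))

# One newline per entry, independent of what the name contains.
# '\n' in single quotes is the same argument as an unquoted \\n.
safe=$(( $(find "$tmp" -printf '\n' | wc -l) ))

# One byte per entry, counted with wc -c.
bytes=$(( $(find "$tmp" -printf x | wc -c) ))

echo "naive=$naive safe=$safe bytes=$bytes"
rm -rf "$tmp"
```

The directory holds three entries (the directory itself plus two files), so the naive pipeline overcounts by one while both `-printf` variants agree.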
Special case - but really fast
If the directory is on its own file system you can simply count the inodes:
df -i .
If the number of files and directories elsewhere on that file system does not change much, you can simply subtract that known number from the current `df -i` result. This way you can count the files and directories very quickly.
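The subtraction could be scripted roughly as follows. This is only a sketch: the helper name `inodes_used` is made up here, and it assumes a `df` whose `-i -P` output puts the used-inode count in the third column of the second line (as GNU df does). Some file systems do not report meaningful inode counts, so check the raw `df -i .` output first.

```shell
#!/bin/sh
# Hypothetical helper: print the used-inode count of the file system
# that contains the given path. -P keeps each file system on one line.
inodes_used() {
    df -i -P -- "$1" | awk 'NR == 2 { print $3 }'
}

# Record the baseline once, e.g. while the directory of interest is empty.
baseline=$(inodes_used .)

# ... later, after files have been created in the directory ...
current=$(inodes_used .)

echo "$(( current - baseline )) inodes added since the baseline"
```

The result also includes anything created or deleted elsewhere on the same file system in the meantime, which is why this only works when the rest of the file system is mostly static.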
I have written ffcnt for exactly that purpose. It retrieves the physical offsets of the directories themselves with the fiemap ioctl and then schedules the directory traversal in multiple sequential passes to reduce random access. Whether you actually get a speedup compared to `find | wc` depends on several factors:
- filesystem type: filesystems such as ext4 which support the fiemap ioctl will benefit most
- random access speed: HDDs benefit far more than SSDs
- directory layout: the higher the number of nested directories, the more optimization potential
(Re)mounting with relatime or even nodiratime may also improve speed (for all methods) when the accesses would otherwise cause metadata updates.
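Before remounting, it can help to see which access-time policy is already in effect. The sketch below uses a made-up helper, `atime_mode`, that classifies a comma-separated mount option string. The commented lines show how it might be fed from `findmnt` and how a remount could look; `/mnt/data` is a placeholder mount point and the remount needs root.

```shell
#!/bin/sh
# Hypothetical helper: report the access-time option found in a
# comma-separated mount option string (most restrictive match first).
atime_mode() {
    case ",$1," in
        *,noatime,*)     echo noatime ;;
        *,nodiratime,*)  echo nodiratime ;;
        *,relatime,*)    echo relatime ;;
        *,strictatime,*) echo strictatime ;;
        *)               echo unknown ;;
    esac
}

# Example with a literal option string:
atime_mode "rw,nosuid,relatime"

# With util-linux installed, the options of the current directory's
# file system can be queried like this:
#   atime_mode "$(findmnt -no OPTIONS -T .)"
#
# Remounting (as root; /mnt/data is a placeholder):
#   mount -o remount,nodiratime /mnt/data
```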
Actually, on my system (Arch Linux) this command
ls -A | wc -l
is faster than all of the above:
$ time find . | wc -l
1893
real 0m0.027s
user 0m0.004s
sys 0m0.004s
$ time find . -printf \\n | wc -l
1893
real 0m0.009s
user 0m0.000s
sys 0m0.008s
$ time find . -printf x | wc -c
1893
real 0m0.009s
user 0m0.000s
sys 0m0.008s
$ time ls -A | wc -l
1892
real 0m0.007s
user 0m0.000s
sys 0m0.004s
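Note that the counts above differ by one: `find .` also prints `.` itself, and `ls -A` only lists the immediate entries without descending into subdirectories. The following sketch, assuming GNU find and a made-up scratch layout, shows the difference and a `find` invocation restricted to the same scope as `ls -A`.

```shell
#!/bin/sh
# Scratch layout (hypothetical names): three files at the top level,
# one of them hidden, plus one subdirectory containing one file.
tmp=$(mktemp -d)
touch "$tmp/a" "$tmp/b" "$tmp/.hidden"
mkdir "$tmp/sub"
touch "$tmp/sub/c"

# Recursive, includes "." itself: . a b .hidden sub sub/c
find_count=$(( $(cd "$tmp" && find . -printf '\n' | wc -l) ))

# Immediate entries only, "." and ".." excluded: a b .hidden sub
ls_count=$(( $(cd "$tmp" && ls -A | wc -l) ))

# find limited to the same scope as ls -A.
find_flat=$(( $(cd "$tmp" && find . -mindepth 1 -maxdepth 1 -printf '\n' | wc -l) ))

echo "find=$find_count ls=$ls_count find_flat=$find_flat"
rm -rf "$tmp"
```

So `ls -A | wc -l` is only a drop-in replacement when the directory is flat, and like plain `find . | wc -l` it miscounts names that contain newlines.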