Fast way of knowing real directory size on Linux/BSD
What is the fastest way of computing real directory size? I somehow catch myself needing that a lot.
Simply doing:
# du -hs /dir
is too slow. Is there any service I could run which would periodically compute directory sizes and cache them for later reference (something like the locate database)?
Sadly I don't know of any, but it shouldn't be too hard to write one. Once a night, run
# du -a / > /var/lib/filesizes.txt
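Then you just need a small script to look a size up. Note that du -a prints a cumulative line for every directory as well as one line per file, so summing every entry under a path would count nested files more than once. Reading the directory's own line is enough. Something like:
# perl -ne 'print "$1\n" if m{^(\d+)\s+/var/www$}' /var/lib/filesizes.txt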
If you want something a little more in sync, then you're going to have to write something that uses inotify (Linux only) to find out when the filesystem changes and then updates a database, probably something like bdb (Berkeley DB).
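A rough sketch of the inotify idea, assuming inotifywait from the inotify-tools package is installed. Instead of a real database it just keeps a flat list of directories that changed, so a later job only has to re-run du on those. The function names and the DIRTY_LIST path are made up for illustration.

```sh
#!/bin/sh
# Hypothetical location for the list of directories needing a rescan.
DIRTY_LIST=${DIRTY_LIST:-/var/lib/dirsize.dirty}

# Read one changed path per line on stdin and append its parent
# directory to DIRTY_LIST, skipping directories already recorded.
record_dirty() {
    while IFS= read -r path; do
        dir=$(dirname "$path")
        grep -qxF -- "$dir" "$DIRTY_LIST" 2>/dev/null ||
            printf '%s\n' "$dir" >> "$DIRTY_LIST"
    done
}

# Watch a tree recursively and feed every change into record_dirty.
# Needs inotifywait (inotify-tools), so this part is Linux only.
watch_tree() {
    inotifywait -m -r -q \
        -e create -e delete -e modify -e move \
        --format '%w%f' "$1" | record_dirty
}
```

You would start it with something like `watch_tree /var/www &` and have the periodic job rescan only the listed directories, then truncate the list. Recursive watches on very large trees can hit the kernel's per-user watch limit, so treat this as a starting point.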
I used to have a cron job that would redirect the output of 'du -htX --max-depth=3' (or something similar, it's been a few years) to a text file. Then I had munin create an rrdtool graph using the file as input. It was hacky, but it gave me an at-a-glance idea of how much room our backups were using and the storage trends for a somewhat granular directory hierarchy.
If you've got a desktop environment on the box in question, FileLight is awesome. It's fairly quick and allows you to drill down a directory tree and then only rescan that sub-tree when you want to get an updated view. You could very well run a full scan once a day and then just leave the program open all day without ever doing an update.
If this is for a Nagios-style check of a directory's size, you could do something like the following.
You can have this cron entry:
*/5 * * * * root du -s /path/to/some/dir > /tmp/directory_usage.tmp && /bin/mv /tmp/directory_usage.tmp /tmp/directory_usage
Then you can just use a script to get the contents of /tmp/directory_usage instantly.
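For the Nagios side, a check along these lines could read the cached file. It is only a sketch: the function name and thresholds are made up, and sizes are in whatever block units du wrote (1 KiB blocks by default with GNU du, 512-byte blocks by default on the BSDs unless you pass -k). The return codes follow the usual Nagios plugin convention of 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.

```sh
#!/bin/sh
# check_dir_usage CACHE WARN CRIT
# CACHE holds one line of `du -s` output: "<size><TAB><path>".
# WARN and CRIT are thresholds in the same units as <size>.
check_dir_usage() {
    cache=$1
    warn=$2
    crit=$3

    if [ ! -r "$cache" ]; then
        echo "UNKNOWN - cannot read $cache"
        return 3
    fi

    # Keep only the size field from the first line.
    used=$(awk 'NR == 1 { print $1 }' "$cache")
    case $used in
        ''|*[!0-9]*)
            echo "UNKNOWN - unexpected content in $cache"
            return 3
            ;;
    esac

    if [ "$used" -ge "$crit" ]; then
        echo "CRITICAL - $used blocks used"
        return 2
    elif [ "$used" -ge "$warn" ]; then
        echo "WARNING - $used blocks used"
        return 1
    fi
    echo "OK - $used blocks used"
    return 0
}
```

Called as `check_dir_usage /tmp/directory_usage 1000000 2000000`, it returns immediately because it never touches the directory itself.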
Obviously there will be a race condition if the directory gets very large (i.e. once du -s starts taking close to 5 minutes, the next run kicks off before the previous one has finished).
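One way around the overlap, sketched here on the assumption that a lock directory next to the cache file is acceptable, is to skip a run while the previous one is still going. mkdir either succeeds or fails in one step, so it works as a simple lock on both Linux and BSD. The function name and lock path are hypothetical.

```sh
#!/bin/sh
# update_usage DIR CACHE
# Refresh CACHE with the output of `du -s DIR`, skipping the run
# if a previous one still holds the lock.
update_usage() {
    dir=$1
    cache=$2
    lock="$cache.lock"    # hypothetical lock location

    # mkdir fails if the directory already exists, so only one
    # caller at a time gets past this point.
    if ! mkdir "$lock" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 1
    fi

    tmp="$cache.tmp.$$"
    if du -s "$dir" > "$tmp"; then
        # A rename within one filesystem replaces the file atomically,
        # so readers never see a half-written cache.
        mv "$tmp" "$cache"
        status=0
    else
        rm -f "$tmp"
        status=1
    fi

    rmdir "$lock"
    return $status
}
```

The cron line would then call `update_usage /path/to/some/dir /tmp/directory_usage`. If the job is killed mid-run the lock directory is left behind and has to be removed by hand, so a real version would want some stale-lock handling.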
Another route is to use find to build a list of file sizes in a directory and store it in either a flat file or a file-based database (if you plan on tracking lots of directories at once).
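A minimal flat-file version of that might look like the following. It uses `du -k` per file rather than GNU find's -printf so it should behave the same on Linux and BSD, and it records disk usage in KiB. The function names are invented for the example, and paths containing newlines or tabs are not handled.

```sh
#!/bin/sh
# build_size_list DIR OUT
# Write one "<KiB><TAB><path>" line per regular file under DIR.
build_size_list() {
    find "$1" -type f -exec du -k {} + > "$2"
}

# sum_size_list LIST DIR
# Print the total KiB of all listed files that live under DIR.
sum_size_list() {
    # Force a trailing slash so /var/www does not match /var/www2.
    awk -F '\t' -v prefix="${2%/}/" '
        index($2, prefix) == 1 { total += $1 }
        END { print total + 0 }
    ' "$1"
}
```

Since only files are listed, summing is safe here: nothing is counted twice the way it would be with a list that also contains per-directory totals. Keep the list file outside the tree being scanned, or it ends up in its own output.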
And one last way is to periodically use find to get a list of files whose modification time is more recent than the last run, and essentially 'synchronise' the file sizes into a db structure. Obviously this all depends on what you're trying to achieve.
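As a sketch of that last idea, assuming a stamp file marks the previous run and the "db" is just a flat file of "<KiB><TAB><path>" lines: find -newer picks out what changed, and a small awk merge folds the new sizes into the old list. Both function names are hypothetical, and deleted files are not detected, so an occasional full rebuild would still be needed.

```sh
#!/bin/sh
# changed_since STAMP DIR
# Print "<KiB><TAB><path>" for regular files under DIR that were
# modified more recently than the STAMP file.
changed_since() {
    find "$2" -type f -newer "$1" -exec du -k {} +
}

# merge_sizes OLD UPDATES
# Print OLD with every path that also appears in UPDATES replaced
# by its newer entry, plus any paths that are new. Output order is
# not guaranteed.
merge_sizes() {
    awk -F '\t' '
        NR == FNR    { new[$2] = $1; next }   # first file: the updates
        !($2 in new) { print }                # keep unchanged entries
        END { for (p in new) print new[p] "\t" p }
    ' "$2" "$1"
}
```

A run would touch a fresh stamp first, call `changed_since` against the old stamp, merge the result, and only then move the fresh stamp into place, so files modified during the scan are picked up next time.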