How to get the actual directory size (out of du)?

How do I get the actual directory size, using UNIX/Linux standard tools?

Alternative question: How do I get du to show me the actual directory size (not disk usage)?

Since people seem to have different definitions of the term "size": my definition of "directory size" is the sum of the sizes of all regular files within that directory.

I do NOT care about the size of the directory inode or whatever (blocks * block size) the files take up on the respective file system. A directory with 3 files, 1 byte each, has a directory size of 3 bytes (by my definition).

Calculating the directory size using du seems to be unreliable.
For example, mkdir foo && du -b foo reports "4096 foo", that is, 4096 bytes instead of 0 bytes. With very large directories on a compressed file system, the directory size reported by du -hs can be off by 100 GB (!) or more.

So what (tool/option) has to be used to get the actual directory size?


Here is a script that prints a human-readable directory size using only standard Unix (POSIX) tools.

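#!/bin/sh
# Sum the sizes of all regular files below the given directory (default: .).
# ls -n prints numeric IDs so the size is always field 5; -q keeps each name on one line.
find "${1:-.}" -type f -exec ls -lnq {} + | awk '
BEGIN {sum=0} # initialization for clarity and safety
# Print sum scaled to the largest binary unit that keeps the value below 1024.
function pp() {
  u="+Ki+Mi+Gi+Ti+Pi+Ei";
  split(u,unit,"+");   # unit[1]="" (bytes), unit[2]="Ki", ..., unit[7]="Ei"
  v=sum;
  for(i=1;i<7;i++) {
    if(v<1024) break;
    v/=1024;
  }
  printf("%.3f %sB\n", v, unit[i]);
}
{sum+=$5}   # field 5 of the long listing is the size in bytes
END{pp()}'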

Example, with the script saved as ds:

$ ds ~
72.891 GiB
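To sanity-check the summing part of the pipeline, it can be run against a scratch directory whose content is known. This is only a sketch: the file names are made up for illustration, and mktemp -d is assumed to be available (it is widespread but not required by POSIX).

```shell
#!/bin/sh
# Hypothetical scratch directory holding 1 + 2 + 3 = 6 bytes in regular files.
dir=$(mktemp -d)
printf 'a'   > "$dir/one"
printf 'bb'  > "$dir/two"
mkdir "$dir/sub"
printf 'ccc' > "$dir/sub/three"

# Same find/ls front end as the script, but print the raw byte count.
total=$(find "$dir" -type f -exec ls -lnq {} + | awk '{sum += $5} END {print sum + 0}')
echo "$total"   # prints 6

rm -rf "$dir"
```

The size of the sub directory itself never enters the sum, because find passes only regular files to ls.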

Some versions of du (GNU du, for example) support the --apparent-size option, which shows the apparent size instead of disk usage. So your command would be:

du -hs --apparent-size

From the man pages for du included with Ubuntu 12.04 LTS:

--apparent-size
      print apparent sizes,  rather  than  disk  usage;  although  the
      apparent  size is usually smaller, it may be larger due to holes
      in (`sparse') files, internal  fragmentation,  indirect  blocks,
      and the like
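The sparse-file case mentioned in the man page can be reproduced with a small experiment. This sketch assumes GNU coreutils (truncate, du -b, du --block-size) and mktemp -d; the file name is hypothetical.

```shell
#!/bin/sh
dir=$(mktemp -d)

# Create a 1 MiB file that consists only of a hole (no data is written).
truncate -s 1048576 "$dir/sparse"

# Apparent size in bytes (-b is short for --apparent-size --block-size=1).
apparent=$(du -b "$dir/sparse" | cut -f1)
# Disk usage in bytes; the value depends on the file system, often 0 here.
usage=$(du --block-size=1 "$dir/sparse" | cut -f1)

echo "apparent: $apparent bytes, disk usage: $usage bytes"

rm -rf "$dir"
```

The apparent size is 1048576 bytes regardless of how many blocks the file system actually allocates for the file.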

Assuming you have du from GNU coreutils, this command calculates the total apparent size of all regular files inside a directory, without any arbitrary limit on the number of files:

find . -type f -print0 | du -scb --files0-from=- | tail -n 1

Add the -l option to du if there are some hardlinked files inside, and you want to count each hardlink separately (by default du counts multiple hardlinks only once).
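The effect of -l can be seen with a single hardlinked file. A minimal sketch, assuming GNU du, mktemp -d and a file system that supports hard links; the names a and b are made up:

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'hello' > "$dir/a"   # 5 bytes
ln "$dir/a" "$dir/b"        # second name for the same inode

# Default: the inode is counted once.
once=$(find "$dir" -type f -print0 | du -scb --files0-from=- | tail -n 1 | cut -f1)
# With -l: every name is counted.
each=$(find "$dir" -type f -print0 | du -scbl --files0-from=- | tail -n 1 | cut -f1)

echo "$once $each"   # prints 5 10

rm -rf "$dir"
```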

The most important difference with plain du -sb is that recursive du also counts sizes of directories, which are reported differently by different filesystems; to avoid this, the find command is used to pass only regular files to du. Another difference is that symlinks are ignored (if they should be counted, the find command should be adjusted).
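The difference can be checked on a directory with three 1-byte files. This is a sketch assuming GNU du and mktemp -d; the value reported by plain du -sb depends on the file system, so only the files-only total is predictable.

```shell
#!/bin/sh
dir=$(mktemp -d)
for name in a b c; do
    printf 'x' > "$dir/$name"   # three files, 1 byte each
done

# Recursive du: file sizes plus the size of the directory itself.
with_dirs=$(du -sb "$dir" | cut -f1)
# Only regular files are handed to du.
files_only=$(find "$dir" -type f -print0 | du -scb --files0-from=- | tail -n 1 | cut -f1)

echo "du -sb: $with_dirs, regular files only: $files_only"

rm -rf "$dir"
```

On ext4, for instance, the first number is typically 4099 (3 bytes of file data plus 4096 for the directory), while the second is 3.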

This command will also consume more memory than plain du -sb, because the --files0-from=FILE option makes du store the device and inode numbers of all processed files, as opposed to the default behavior of remembering only files with more than one hard link. (This is not an issue if the -l option is used to count hardlinks multiple times, because the only reason to store device and inode numbers is to skip hardlinked files that have already been processed.)

If you want to get a human-readable representation of the total size, just add the -h option (this works because du is invoked only once and calculates the total size itself, unlike some other suggested answers):

find . -type f -print0 | du -scbh --files0-from=- | tail -n 1

or, if you are worried that some effects of -b are then overridden by -h:

find . -type f -print0 | du -sc --apparent-size -h --files0-from=- | tail -n 1