Linux disk usage analyser that treats symlinks as real files

Solution 1:

GNU du has the --dereference (-L) option, which follows symbolic links when computing disk usage. However, du refuses to count the same space twice, which may be a deal-breaker in your situation:

% mkdir foo bar baz
% dd if=/dev/zero of=foo/test bs=1024 count=10000
10000+0 records in
10000+0 records out
10240000 bytes (10 MB) copied, 0.0176239 s, 581 MB/s
% (cd bar; ln -s ../foo/test)
% (cd baz; ln -s ../foo/test)
% du -hc bar baz
4.0K    bar
4.0K    baz
8.0K    total
% du -hc --dereference bar baz
9.8M    bar
4.0K    baz
9.8M    total

If you don't have multiple symlinks to the same target, though, I think --dereference does what you want.
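If you do have several symlinks to the same target and want each one counted as if it were a real file, a rough workaround is to skip du and sum file sizes yourself. The sketch below assumes GNU find and GNU stat (for `stat -c`), and `symlink_du` is a hypothetical helper name. It reports apparent size in bytes rather than allocated blocks, and it counts a file once for every path that reaches it, which is exactly what `du --dereference` avoids.

```shell
# Hypothetical helper: total apparent size (bytes) of every regular file
# reachable under the given paths. find -L follows symlinks, so a file
# reached through N different paths is counted N times.
# Assumes GNU find and GNU stat (`stat -c` is a GNU extension).
symlink_du() {
    find -L "$@" -type f -exec stat -L -c '%s' {} + |
        awk '{ s += $1 } END { printf "%.0f\n", s }'
}
```

On the example tree above, `symlink_du bar baz` should print 20480000 (two paths, each reaching the 10240000-byte file). To approximate allocated space instead, you could print `%b %B` from stat and sum `$1 * $2` in awk.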

Solution 2:

Nowadays, git-annex has its own solutions for this problem. You can use:

git annex info --fast *

...to get the actual disk usage (and more) of the files directly from git-annex. It can also take into account which repositories, including remotes, hold the content, which is very useful:

git annex info --fast --not --in here .

...would, for example, give you the amount of data that is not present in the current repository.
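If you want to cross-check that figure or feed it to a script, one rough approach is to sum the per-file sizes reported by `git annex find` with the same matching options. This is a sketch assuming your git-annex version's `find --format` understands the `${bytesize}` variable and the `\n` escape; `sum_bytes` and `missing_bytes` are hypothetical helper names, and any file whose size git-annex does not know contributes 0.

```shell
# Hypothetical helper: sum the numbers read one per line on stdin.
# Non-numeric lines (e.g. an unknown size) are treated as 0 by awk.
sum_bytes() {
    awk '{ s += $1 } END { printf "%.0f\n", s }'
}

# Hypothetical helper: bytes of annexed content NOT present in this
# repository. Assumes `git annex find --format` supports ${bytesize}.
missing_bytes() {
    git annex find --not --in here --format='${bytesize}\n' "$@" | sum_bytes
}
```

Run `missing_bytes` (optionally with paths) from inside the annex; the result is a plain byte count rather than the human-readable size that `git annex info` prints.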

I have also used ncdu with this small patch, with good results.

The upstream forum thread discussing this is "du" equivalent on an annex?, which has more suggestions, such as du -L, gadu and sizes, which were mentioned in other answers here.

Solution 3:

git-annex has a list of related software, including two git-annex-aware disk usage tools: gadu and sizes.