Why do two directory hierarchies that are in sync have different sizes?
I'm using rsync to sync two folders
rsync -arzv --times --delete-after --relative -e ssh user@host:path/./media/ ~/path/
and it says everything is good, but the destination reports:
$ du -s path/media/
18335196 site_media/media/
and the source reports:
$ du -s path/media/
18473500 site_media/media/
When I dig into the problem, all the files are the same size, but the directories differ in size. Why? Both are VMs running Ubuntu; the source is on 11.04 and the destination is on 12.04 LTS.
I understand why the totals don't add up to the same number; what I'd like to understand is why the folders report different sizes.
Since these are two different VMs running different major versions of Ubuntu, I'd suspect the filesystem block size is the culprit. du reports how much of the disk is being used, not the sum of the file sizes. A subtle, yet important distinction.
If you have a file that is 1 byte in size and your block size is 1KB, then du will report 1KB as used. If the block size is 4KB, it will report 4KB used. If the file is 1025B, it will report 2KB used with the 1KB block size and 4KB with the 4KB block size. And if the file is 4097B, it will report 5KB with the 1KB block size and 8KB with the 4KB block size.
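The rounding above can be reproduced with a little integer arithmetic. This is only a sketch of the simplified model used in this answer (whole blocks per file, ignoring sparse files, inline data, and filesystem metadata); allocated_bytes is a made-up helper name, not a standard tool.

```shell
#!/bin/sh
# allocated_bytes SIZE BLOCK
# Bytes that a SIZE-byte file occupies when space is handed out in
# BLOCK-byte units (hypothetical helper, simplified model).
allocated_bytes() {
    size=$1
    block=$2
    # Round up to the next whole number of blocks.
    echo $(( (size + block - 1) / block * block ))
}

# The three file sizes from the text, on 1K and 4K blocks.
for size in 1 1025 4097; do
    printf '%d B file: %d B on 1K blocks, %d B on 4K blocks\n' \
        "$size" \
        "$(allocated_bytes "$size" 1024)" \
        "$(allocated_bytes "$size" 4096)"
done
```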
This sequence demonstrates this behavior:
$ touch foo ; du -h foo
0B foo
$ echo -n 1 > foo ; du -h foo
4.0K foo
Use this command to show the block size of your filesystems:
tune2fs -l /dev/sda1 | grep -i 'block size'
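(Obviously, replace /dev/sda1 with the appropriate block device. Note that tune2fs only works on ext2/ext3/ext4 filesystems and usually needs root.)
If the block sizes differ between the two machines, there's your discrepancy.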
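If you don't know the block device or don't have root, the block size can also be read from any path on the filesystem. The sketch below assumes GNU coreutils stat (as shipped with Ubuntu); BSD and macOS stat take different flags. fs_block_size is a made-up helper name.

```shell
#!/bin/sh
# fs_block_size PATH
# Print the fundamental block size, in bytes, of the filesystem that
# holds PATH. Assumes GNU stat: -f queries the filesystem, %S is the
# fundamental block size.
fs_block_size() {
    stat -f -c '%S' "$1"
}

# Block size of the filesystem holding the current directory.
fs_block_size .
```

Run it against the media directory on both machines and compare the two numbers.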
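A better way to check that the rsync copy is exact is to hash your files and compare. Here's an example:
find path/media -type f -exec openssl sha1 {} + | sort > ~/hashes
Run it on both machines, from a directory where path/media is the same relative path so that the file names match, then diff the two hashes files. (The -type f keeps openssl from being handed directories, which it cannot hash.)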
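The whole comparison can be wrapped up as below. This is a sketch, not the only way to do it: hash_tree is a made-up helper, it uses sha1sum from GNU coreutils instead of openssl, and the paths in the commented usage are placeholders.

```shell
#!/bin/sh
# hash_tree DIR
# Print one "hash  ./relative/path" line per regular file under DIR,
# sorted by path. Changing into DIR first keeps the paths relative,
# so listings from two machines are directly comparable.
hash_tree() {
    ( cd "$1" && find . -type f -exec sha1sum {} + | LC_ALL=C sort -k 2 )
}

# Usage (placeholder paths):
#   hash_tree path/media > ~/hashes       # run on each machine
#   diff hashes.source hashes.dest        # after copying one file over
```

LC_ALL=C pins the sort order so that two machines with different locales still produce listings in the same order.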
There are many sources of differences when using du. Check man du for reference.
I have faced this problem on AIX too. The manual documents an option, --apparent-size, whose description explains these differences quite well. Also, mind the block size in which du reports sizes (the default is 1024 bytes, but it may vary depending on the system). To compare exactly, use a command that shows the exact size of the files (ls or find), which is how I solved this.
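One way to get an exact byte total with find is sketched below. It assumes GNU find (for -printf), so it fits the Ubuntu machines in the question rather than a stock AIX system; total_bytes is a made-up helper name.

```shell
#!/bin/sh
# total_bytes DIR
# Sum of the exact byte sizes of all regular files under DIR.
# Directories themselves are not counted, so filesystem block size
# and directory overhead do not affect the result.
total_bytes() {
    find "$1" -type f -printf '%s\n' |
        awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}

# Usage (placeholder path):
#   total_bytes path/media
```

If both machines print the same total, the file contents are the same size and any remaining du difference comes from how space is allocated. With GNU du, du -s --apparent-size --block-size=1 gives a similar figure, though it also counts the directories' own sizes.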