Understanding the different directory sizes
When comparing two directories with du,
root@debian:~# du -s /backup/test1/
5605364 /backup/test1/
root@debian:/etc/init.d# du -s /data/test1/
5605360 /data/test1/
du tells me that there is a small difference in the total size of each directory.
Diff, on the other hand, tells me that both directories are identical:
root@debian:/etc/init.d# diff -r /data/test1/ /backup/test1/
What is the reason?
Solution 1:
du reports the "disk usage" of the files, not the exact byte count each file contains. The man page of du even says:
du - estimate file space usage
It is entirely possible for the same set of files to use different amounts of disk space. This is because of the intricacies of the file system. For more information about disk usage on the file system, read the following question and answers: https://superuser.com/questions/218395/about-file-size-and-disk-usage-in-ext3.
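diff compares the contents of the files. diff does not care how many bytes a file actually uses on the file system; it only cares about the bytes in the files.
If you want du to report apparent sizes (the byte counts of the files) rather than disk usage, you can use --apparent-size. With GNU du, add --block-size=1, or use -b as a shorthand for both options, to print the result in bytes.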
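To make the difference between the two numbers concrete, here is a minimal sketch that assumes GNU coreutils (mktemp, and du with -b and -B1). The scratch file name sample.txt is made up for illustration. It writes a 5-byte file and prints both its apparent size and the space allocated for it, which depends on the file system in use.

```shell
# Create a scratch file holding exactly 5 bytes (hypothetical name).
tmpdir=$(mktemp -d)
printf 'hello' > "$tmpdir/sample.txt"

# Apparent size: the number of bytes stored in the file.
# -b is GNU shorthand for --apparent-size --block-size=1.
apparent=$(du -b "$tmpdir/sample.txt" | cut -f1)

# Disk usage: the space allocated for the file, printed in bytes.
# The value depends on the file system and its block size.
allocated=$(du -B1 "$tmpdir/sample.txt" | cut -f1)

echo "apparent size: $apparent bytes"
echo "disk usage:    $allocated bytes"

# Remove the scratch directory again.
rm -rf "$tmpdir"
```

On many file systems the second number is a whole multiple of the block size, so two copies of the same file can be byte-identical and still report different disk usage.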
Solution 2:
Since you didn't give the details, there is room for speculation and guessing.
I would assume that /data and /backup are mounted on different partitions that are formatted using different block sizes. This would result in a slightly different measure of physically used disk space as one (or more) additional (or fewer) blocks are necessary.
You can check this quickly by executing df, copying the /dev/... part for the mount points /data and /backup, and feeding each of them to file -s, e.g.
file -s /dev/sda1
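As an alternative that needs no device name, the block size can be read directly from a mounted path. This is a small sketch assuming GNU stat, whose -f option queries the file system and whose %S format prints the fundamental block size. The helper name show_block_size is made up here, and the example call uses / only so that it runs anywhere.

```shell
# Print the fundamental block size of the file system holding each path.
# show_block_size is a hypothetical helper, not a standard command.
show_block_size() {
    for path in "$@"; do
        # stat -f inspects the file system; %S is its fundamental block size.
        bs=$(stat -f -c %S "$path")
        printf '%s: %s bytes per block\n' "$path" "$bs"
    done
}

# Example call; replace / with /data and /backup from the question.
show_block_size /
```

If the two mount points report different values, the same files will generally round up to different amounts of allocated space.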
Solution 3:
This is probably due to differences in the size of the individual directories within the two directory trees. Each directory is represented on disk by some disk blocks which store the names of the files in the directory. These blocks are sometimes called the "directory file". When you run "ls -l" or "ls -s", the size field for a directory is the size of this directory file.
When you create a file within a directory, the file's name is added to the directory file. If the directory file doesn't contain enough space, it might have to be enlarged by adding another disk block. When you delete a file from a directory, the file's name is removed from the directory file, leaving some unused space. The unused space can be used to store another filename, but it isn't returned to the OS, so the directory file can grow in size but it never shrinks.
If you were to compare the directory trees side by side, you'd probably find that some of the matching directories have different sizes. That generally means that one directory used to have more files in it, or files with longer names, or had a lot of file creation/deletion "churn" in the past.
When diff compares two directories, it only checks whether both directories have the same names in them (and, with -r, whether the matching files have the same contents). It doesn't check the unused space within the respective directory files, and it doesn't care whether the two directory files are actually the same size.
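One way to look for such differences is to list the size of every directory file in each tree and compare the two listings. The following is only a sketch assuming GNU find (for -printf); the helper name dir_sizes and the scratch tree are made up for illustration.

```shell
# List "size<TAB>relative path" for every directory below the given root.
# dir_sizes is a hypothetical helper, not a standard command.
dir_sizes() {
    # The subshell keeps the cd from affecting the caller.
    (cd "$1" && find . -type d -printf '%s\t%p\n' | sort -k2)
}

# Small scratch tree so the example runs anywhere.
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
dir_sizes "$demo"
```

In bash, the two trees from the question could then be compared with diff <(dir_sizes /data/test1) <(dir_sizes /backup/test1); any line that differs points at a directory whose directory file has a different size in the two copies.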