Directory backed up with rsync is much bigger than source
We back up a set of virtual machines to an external USB drive using rsync -a. The source directory is 145G as reported by du -sh, but the target is reporting 181G. Both file systems are ext3 and the block size is the same, so can someone explain the discrepancy?
As Dennis mentioned, it seems to be a sparse file issue. Here is an example:
$ dd if=/dev/zero of=sparse.txt count=0 seek=1000
0+0 records in
$ du sparse.txt
0 sparse.txt
$ ls -l sparse.txt
-rw-r--r-- 1 user user 512000 2010-03-22 11:54 sparse.txt
As you can see, du reports how many blocks are actually used, while ls shows how big the file is supposed to be (its apparent size).
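The same comparison can be made programmatically. Below is a small Python sketch, assuming Linux (where st_blocks is counted in 512-byte units). The helper names make_sparse_file and apparent_and_allocated are hypothetical. It creates a 512000-byte file the way the dd command above does, by setting the length without writing any data, then compares st_size (what ls -l shows) with st_blocks * 512 (what du counts).

```python
import os
import tempfile


def make_sparse_file(path: str, size: int) -> None:
    """Set the file's length to `size` without writing data, similar to
    `dd if=/dev/zero of=path count=0 seek=...`. The unwritten range is a hole."""
    with open(path, "wb") as f:
        f.truncate(size)


def apparent_and_allocated(path: str) -> tuple[int, int]:
    """Return (apparent size as ls -l reports it, bytes actually allocated as du counts them).
    Assumes Linux, where st_blocks is in 512-byte units."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "sparse.txt")
        make_sparse_file(p, 512000)
        apparent, allocated = apparent_and_allocated(p)
        # On a filesystem with hole support, allocated is typically far below apparent.
        print(f"apparent={apparent} allocated={allocated}")
```

On the command line, GNU du's --apparent-size option reports the ls-style figure, so comparing du -sh with du -sh --apparent-size on the source gives a rough idea of how much of it is holes that a non-sparse copy would fill in.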
Others have already mentioned sparse files, but there is another thing: hard links. Hard links (multiple names for the same file, sharing the same space on disk) are often used on system partitions (e.g. for multiple shell commands implemented in the same binary), and rsync does not preserve them when given only the -a option. So, for example, a file with four hard links will be stored as four separate files on the destination.
Try using rsync -aH.
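A rough Python sketch of why hard links inflate the copy. link_usage is a hypothetical helper, and it sums apparent sizes (st_size) to keep the hard-link effect separate from sparseness. It walks a tree and totals sizes twice: once per inode, keyed on (st_dev, st_ino), which is how du avoids counting a hard-linked file more than once, and once per name, which approximates what the destination holds when every link becomes a separate file.

```python
import os


def link_usage(root: str) -> tuple[int, int]:
    """Return (bytes counting each inode once, bytes counting every file name).
    The first approximates the source's footprint; the second approximates a copy
    made without preserving hard links (e.g. rsync -a without -H)."""
    seen = set()
    unique = total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))  # lstat: don't follow symlinks
            total += st.st_size
            key = (st.st_dev, st.st_ino)  # same inode => same data on disk
            if key not in seen:
                seen.add(key)
                unique += st.st_size
    return unique, total
```

A single 4 KiB binary with three extra hard links would count once in the first figure and four times in the second.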
For the sparse files, rsync has the -S option. From the rsync man page:
-S, --sparse Try to handle sparse files efficiently so they take up less space on the destination. Conflicts with --inplace because it's not possible to overwrite data in a sparse fashion. NOTE: Don't use this option when the destination is a Solaris “tmpfs” filesystem. It doesn't seem to handle seeks over null regions correctly and ends up corrupting the files.
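Below is a simplified Python sketch of the general idea behind sparse copying, assuming a Linux filesystem that supports holes. Instead of writing runs of zeros, the copy seeks past all-zero chunks so the filesystem can leave them unallocated. This is not rsync's actual implementation. sparse_copy and the 4096-byte chunk size are illustrative assumptions.

```python
import os


def sparse_copy(src: str, dst: str, chunk_size: int = 4096) -> None:
    """Copy src to dst, seeking over all-zero chunks instead of writing them,
    so the destination filesystem can store those ranges as holes."""
    zero = bytes(chunk_size)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            if chunk == zero[: len(chunk)]:
                # Leave a hole: advance the write position without writing data.
                fout.seek(len(chunk), os.SEEK_CUR)
            else:
                fout.write(chunk)
        # Seeking past the end does not change the file length, so if the file
        # ends in a hole, truncate() at the current position sets the final size.
        fout.truncate()
```

The final truncate() matters. Without it, a source file that ends in a run of zeros would produce a shorter copy.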
With the rsync script, are you deleting the items that exist on the destination but are no longer on the source? If you want an identical copy on both sides, you need the --delete flag in your rsync command:
rsync -a --delete /source/ /destination/
You can also add -n and -P for a dry run and a progress indicator, respectively, to see what the --delete option would do before running it for real.
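This Python sketch gives a rough picture of what --delete cleans up. It lists paths that exist under the destination but not under the source. Such stale files, for example images of VMs that were removed from the source, still count toward the destination's du total. The sketch ignores rsync's exclude and filter rules and is only an approximation. relative_paths and extraneous are hypothetical helper names.

```python
import os


def relative_paths(root: str) -> set[str]:
    """All files and directories under root, as paths relative to root."""
    paths = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.add(os.path.relpath(os.path.join(dirpath, name), root))
    return paths


def extraneous(source: str, destination: str) -> list[str]:
    """Paths present under destination but missing from source: roughly what
    `rsync -a --delete source/ destination/` would remove (filters ignored)."""
    return sorted(relative_paths(destination) - relative_paths(source))
```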