Rsync size differs between source and destination
I'm using rsync with the options
-r for recursive
-l copy symlinks as symlinks
-t preserve modification time
-D preserve devices and specials
-v verbose
--prune-empty-dirs
The source FS is ext4 and the destination is XFS. I've copied a few hundred folders, ranging from a few hundred GB to a few TB each, and every one ended up within 1GB of its source size. However, this particular folder is 264GB on the source and 286GB once I rsync it across. That is a huge difference, and I don't know what is causing it.
If the source ext4 FS has some corruption, is it possible that it isn't reporting the correct disk usage? I'm using 'du -skh'.
I've deleted the whole thing and restarted it 3 times and it yields the same results.
The most likely cause is hard links. By default, rsync copies two hard-linked files as two separate files on the target, taking up twice the disk space. If you want to preserve hard links, add the -H/--hard-links option.
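To check whether hard links are involved, you can look for files whose link count is greater than one. A minimal sketch, assuming GNU find; the demo directory and file names are made up for illustration:

```shell
# Build a throwaway tree containing one hard-linked pair and one ordinary file.
demo=$(mktemp -d)
printf 'payload' > "$demo/original"
ln "$demo/original" "$demo/hardlink"   # second name for the same inode
printf 'alone' > "$demo/single"

# -links +1 matches files whose inode has more than one name.
linked=$(find "$demo" -type f -links +1 | wc -l)
echo "names that share an inode with another name: $linked"

# On the real data the same test would be run against the source tree,
# and the copy repeated with -H added (placeholder paths):
#   find /path/to/source -type f -links +1
#   rsync -rltDvH --prune-empty-dirs /path/to/source/ /path/to/destination/
```

If the find command prints nothing for the source folder, hard links are not the explanation and the other causes are worth checking.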
The next most likely issue is sparse files. By default, rsync does not write any files as sparse files, even if they are sparse on the source (it can't actually tell). If you have sparse files (most commonly virtual machine images and incomplete P2P downloads), then you will want to use the --sparse option.
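A sparse file shows up as a gap between its apparent size and the space actually allocated to it. A rough sketch of that comparison, assuming GNU coreutils (truncate, stat -c); the file name is hypothetical:

```shell
# Create a 10 MiB file that consists entirely of a hole.
demo=$(mktemp -d)
truncate -s 10M "$demo/sparse.img"

# Apparent size: what ls -l and rsync see.
apparent=$(stat -c %s "$demo/sparse.img")
# Allocated size: blocks actually in use, times the unit stat reports them in.
allocated=$(( $(stat -c %b "$demo/sparse.img") * $(stat -c %B "$demo/sparse.img") ))

echo "apparent: $apparent bytes, allocated: $allocated bytes"

# On a whole tree, the same gap shows up when comparing (placeholder path):
#   du -sh /path/to/source
#   du -sh --apparent-size /path/to/source
```

If the apparent size of the source folder is much larger than its plain du figure, the folder contains sparse files and a copy made without --sparse will occupy more space on the destination.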
I ran into this "problem" when using 'du -b -d0 source destination', as I had a huge list of things that did not match as I drilled down.
The problem seemed to be that du insists on reporting the disk usage of directories as well as files, and I wanted only the size of the files.
So, since creating a few directories will use more bytes on some filesystems, and less on others, you get a difference.
The solution is only to compare the sizes of actual files, and not directories.
The following command line uses find to output only the files in the music directory, then uses du to total the byte count:
find music -type f -print0 |du --files0-from=- -cb
If someone would post a sed script to do the same thing, please do.
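The find and du pipeline above can be wrapped in a small function so that both sides of a copy are measured the same way. This is only a sketch, assuming GNU find and du; file_bytes and the demo trees are names invented for the example. Note that du counts hard-linked files only once unless -l (--count-links) is added, so a source with hard links can still total less than a copy made without -H.

```shell
# Print the total apparent size, in bytes, of the regular files under a directory.
file_bytes() {
    find "$1" -type f -print0 | du --files0-from=- -cb | tail -n 1 | cut -f1
}

# Demo on two throwaway trees; the second is a plain copy of the first.
src=$(mktemp -d)
dst=$(mktemp -d)
printf 'hello' > "$src/a.txt"
mkdir -p "$src/sub/empty"              # directories are not counted
printf 'world!' > "$src/sub/b.txt"
cp -r "$src/." "$dst/"

src_total=$(file_bytes "$src")
dst_total=$(file_bytes "$dst")
echo "source: $src_total bytes, destination: $dst_total bytes"
```

Because only regular files are counted, the two totals should match regardless of how each filesystem accounts for directory entries.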
The rsync FAQ page lists these reasons: https://sanitarium.net/rsyncfaq/#differentsizes
However the only way to know is to compare the files.
For a small number of files you could do diff -r /mnt/data /mnt/data-BACKUP. However, if that stops midway, it can't be restarted from where it left off. Older diff programs don't handle binary files well.
For a large number of files, I recommend calculating the hashes of all the files and looking for differences. This way, if the process stops or breaks, you can continue without much difficulty.
See this script as an example:
https://github.com/TomOnTime/tomutils/blob/master/bin/md5tree
md5tree /mnt/data >/var/tmp/list.orig
md5tree /mnt/data-BACKUP >/var/tmp/list.backup
# NOTE: For these next 2 lines TAB means press the TAB key.
sort -t'TAB' -k6 </var/tmp/list.backup >/var/tmp/list.backup.sorted
sort -t'TAB' -k6 </var/tmp/list.orig >/var/tmp/list.orig.sorted
diff /var/tmp/list.orig.sorted /var/tmp/list.backup.sorted
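If you would rather not download a script, the same idea can be approximated with standard tools. This is a rough sketch, not a replacement for md5tree: it assumes GNU find, sort, xargs and sha256sum, and hash_tree plus the demo directories are hypothetical names. Hashing relative paths in sorted order makes the two lists directly comparable.

```shell
# Print "hash  ./relative/path" for every regular file under a directory.
hash_tree() {
    ( cd "$1" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum )
}

# Demo: two small trees that differ in one file.
work=$(mktemp -d)
mkdir "$work/orig" "$work/backup"
printf 'same' > "$work/orig/a.txt"
printf 'same' > "$work/backup/a.txt"
printf 'one' > "$work/orig/b.txt"
printf 'two' > "$work/backup/b.txt"

hash_tree "$work/orig" > "$work/orig.list"
hash_tree "$work/backup" > "$work/backup.list"

if cmp -s "$work/orig.list" "$work/backup.list"; then
    result="identical"
else
    result="different"
    # diff exits non-zero when the lists differ, which is expected here.
    diff "$work/orig.list" "$work/backup.list" || true
fi
echo "trees are $result"
```

On the real data the two hash_tree calls would point at /mnt/data and /mnt/data-BACKUP, and the lists would be kept so the comparison can be repeated without rehashing.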