rsync: transferred file twice as large as original file
I have transferred a file from a client to a remote host with rsync through an SSH bridge, and now the resulting file on the remote host is more than twice as large as the original file on the client.
9.1GB -> 20GB
I have checked the size with du -sh. This looks wrong. Given that these files sit in two different locations, and they are quite big, how can I check what went wrong?
The client and the remote host run two distinct flavours of Linux.
The output of ls -ls <file> in the two locations is:
client
9528947 -rw-r--r-- 1 user1 group1 20420948104 Nov 2 13:45 filename.hdf5
remote host
19942340 -rw-r--r--. 1 user2 group2 20420948104 Nov 2 14:45 filename.hdf5
EDIT
It looks like this may be caused by thin provisioning:
https://fedoramagazine.org/copying-large-files-with-rsync-and-some-misconceptions/
EDIT 2
Running rsync -avz -S <origin> <dest> does not solve the problem.
EDIT 3
The filesystem on the remote host, as reported by df -Th, is nfs4.
The most common cause for such a problem is that the source file is a sparse file:
a sparse file is a type of computer file that attempts to use file system space more efficiently when the file itself is partially empty. This is achieved by writing brief information (metadata) representing the empty blocks to disk instead of the actual "empty" space which makes up the block, using less disk space. The full block size is written to disk as the actual size only when the block contains "real" (non-empty) data.
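To check whether a file is sparse, compare its apparent size with the space actually allocated for it. The following is a minimal sketch, assuming GNU stat and truncate from coreutils; the helper name allocation_report and the demo file are hypothetical and only serve as illustration.

```shell
# Print the apparent size and the allocated size of one file, in bytes.
# Assumes GNU stat (coreutils), as found on typical Linux systems.
allocation_report() {
    file=$1
    apparent=$(stat -c '%s' "$file")      # size in bytes, as shown by ls -l
    blocks=$(stat -c '%b' "$file")        # number of allocated blocks
    block_unit=$(stat -c '%B' "$file")    # size in bytes of each block counted by %b
    allocated=$((blocks * block_unit))
    echo "apparent=$apparent allocated=$allocated"
}

# Demo: a 100 MiB file that consists only of a hole (hypothetical test file).
demo_dir=$(mktemp -d)
truncate -s 100M "$demo_dir/sparse_demo.bin"
report=$(allocation_report "$demo_dir/sparse_demo.bin")
echo "$report"
```

An allocated size well below the apparent size indicates a sparse file. The same reasoning applies to the ls -ls output in the question: assuming the GNU ls default of 1 KiB units for the first column, the client copy occupies 9528947 KiB (about 9.1 GiB) for an apparent size of 20420948104 bytes (about 19 GiB), while the remote copy occupies 19942340 KiB, that is, the whole file.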
You need the parameter -S or --sparse to tell rsync to handle sparse files efficiently. Without it, the non-existing blocks are still allocated on the target, thus inflating the file.
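The inflation can be reproduced locally without rsync. The sketch below uses cp --sparse=never from GNU coreutils as a stand-in for a transfer that writes every zero block explicitly; it is not what rsync does internally, and the file names are hypothetical. It also shows how to confirm that the two copies differ only in disk usage and not in content.

```shell
work_dir=$(mktemp -d)

# A 50 MiB file consisting entirely of a hole.
truncate -s 50M "$work_dir/sparse.bin"

# Copy it while forcing all zero blocks to be written out.
cp --sparse=never "$work_dir/sparse.bin" "$work_dir/inflated.bin"
sync "$work_dir/inflated.bin"   # flush so the allocated block count is up to date

# The content is byte-for-byte identical ...
if cmp -s "$work_dir/sparse.bin" "$work_dir/inflated.bin"; then
    same_content=yes
else
    same_content=no
fi

# ... but the disk usage (in KiB) differs.
sparse_kib=$(du -k "$work_dir/sparse.bin" | cut -f1)
inflated_kib=$(du -k "$work_dir/inflated.bin" | cut -f1)

echo "same content: $same_content"
echo "sparse copy:   $sparse_kib KiB on disk"
echo "inflated copy: $inflated_kib KiB on disk"
```

For two files on different hosts, running a checksum tool such as sha256sum on each side and comparing the results gives the same assurance that the transfer itself was correct.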
According to Wikipedia, support for sparse files was only added in NFS version 4.2 (RFC 7862).
As your NFS version is 4.1, it most likely does not support sparse files, which would explain why -S made no difference. You will need to upgrade the NFS server to the required level.
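To see which NFS version a mount actually uses, the mount options can be inspected. This is a rough sketch, assuming a Linux client where /proc/mounts lists a vers= option for NFS mounts; the helper name nfs_versions is hypothetical.

```shell
# Print "<mount point> <NFS version>" for every NFS mount in a mounts table.
# Reads /proc/mounts by default; another file in the same format can be passed.
nfs_versions() {
    awk '
        $3 ~ /^nfs4?$/ {
            ver = "unknown"
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^vers=/)
                    ver = substr(opts[i], 6)   # text after "vers="
            print $2, ver
        }
    ' "${1:-/proc/mounts}"
}

nfs_versions
```

If the reported version is lower than 4.2, both the server and the client mount options would need to allow 4.2 before sparse files can be preserved on that mount.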