rsync: transferred file twice as large as original file

I transferred a file from a client to a remote host with rsync over an SSH bridge, and the resulting file on the remote host is more than twice as large as the original on the client.

9.1GB -> 20GB

I have checked the size with du -sh. This looks wrong. Given that these files sit in two different locations, and they are quite big, how can I check what went wrong?

The client and the remote host run two different flavours of Linux.

The output of ls -ls <file> in the two locations is:

client

 9528947 -rw-r--r-- 1 user1 group1 20420948104 Nov  2 13:45 filename.hdf5

remote host

 19942340 -rw-r--r--. 1 user2 group2 20420948104 Nov  2 14:45 filename.hdf5

EDIT

It looks like it may be thin provisioning:

https://fedoramagazine.org/copying-large-files-with-rsync-and-some-misconceptions/


EDIT 2

Running

 rsync -avz -S <origin> <dest>

does not solve the problem.


EDIT 3

The filesystem on the remote host is:

df -Th -> nfs4


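Your ls -ls output shows that both copies have the same length (20420948104 bytes), but the client copy occupies fewer than half as many blocks as the remote one (9528947 versus 19942340). The most common cause of this is that the source file is a sparse file: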

a sparse file is a type of computer file that attempts to use file system space more efficiently when the file itself is partially empty. This is achieved by writing brief information (metadata) representing the empty blocks to disk instead of the actual "empty" space which makes up the block, using less disk space. The full block size is written to disk as the actual size only when the block contains "real" (non-empty) data.
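A minimal sketch of how to check for sparseness, comparing a file's length with the space actually allocated for it. The function names are made up for illustration, and the arithmetic assumes that ls -s reported 1 KiB blocks (the GNU ls default); compressed or deduplicating filesystems can also make the allocated size smaller than the length, so treat the result as a hint rather than proof.

```python
import os

def allocated_bytes(path):
    """Bytes actually allocated on disk; st_blocks counts 512-byte units."""
    return os.stat(path).st_blocks * 512

def looks_sparse(apparent_size, allocated):
    """True when less space is allocated than the file's length."""
    return allocated < apparent_size

# Figures taken from the ls -ls output in the question.
# Assumption: the first column is in 1 KiB blocks.
SIZE = 20420948104
client_allocated = 9528947 * 1024
remote_allocated = 19942340 * 1024

print(looks_sparse(SIZE, client_allocated))  # True: the client copy has holes
print(looks_sparse(SIZE, remote_allocated))  # False: the remote copy is fully allocated
```

On a real file, looks_sparse(os.path.getsize(path), allocated_bytes(path)) performs the same comparison.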

You need the option -S or --sparse to tell rsync to handle sparse files efficiently. Without it, the holes in the source file are written out as real blocks of zeros on the target, which inflates the space the file occupies.
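The idea behind sparse-aware copying can be sketched as follows. This is not rsync's actual implementation, only an illustration of the technique: blocks that contain nothing but zeros are skipped with a seek instead of being written. The function name and the 4096-byte block size are assumptions, and whether the skipped ranges really end up as unallocated holes depends on the destination filesystem.

```python
import os

def sparse_copy(src, dst, block_size=4096):
    """Copy src to dst, seeking over all-zero blocks instead of writing them."""
    zero_block = bytes(block_size)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(block_size)
            if not chunk:
                break
            if chunk == zero_block[:len(chunk)]:
                # Leave a hole: advance the write position without writing data.
                fout.seek(len(chunk), os.SEEK_CUR)
            else:
                fout.write(chunk)
        # If the file ends in a hole, truncate() extends it to its full length.
        fout.truncate(fout.tell())
```

The copy has the same length and content as the source either way; only the amount of allocated space differs.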


According to Wikipedia, support for sparse files was only added in NFS version 4.2 (RFC 7862).

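If your mount is NFS 4.1 or older, it most likely does not support sparse files, which would explain why -S did not help. Note that df -Th reports nfs4 for every 4.x minor version; the vers= option in the mount options shows the exact one. In that case you will need to upgrade the NFS server to version 4.2.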
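A small sketch for reading the NFS minor version from the mount options, assuming a Linux client where /proc/mounts lists a vers= option for NFS mounts. The function name is made up, and the sample line (server name and paths) is hypothetical.

```python
def nfs_version(mount_line):
    """Return the value of the vers= option in a /proc/mounts-style line, or None."""
    fields = mount_line.split()
    # Fields: device, mount point, filesystem type, options, dump, pass.
    if len(fields) < 4 or not fields[2].startswith("nfs"):
        return None
    for option in fields[3].split(","):
        if option.startswith("vers="):
            return option[len("vers="):]
    return None

# Hypothetical example line, not taken from the question.
line = "server:/export /mnt/data nfs4 rw,relatime,vers=4.1,proto=tcp 0 0"
print(nfs_version(line))  # 4.1
```

Applying nfs_version to each line of /proc/mounts on the remote host would show which minor version each NFS mount uses.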