Clone only space in use from hard disk
Can I use dd, rsync, clonezilla or any other tool to clone only the space in use on my hard disk in Linux? I need to back up a 1 TB HD (with only 2 GB in use) onto a 500 GB HD.
Solution 1:
You can, but you should prepare your disk first. The trick is to use a sparse file or compression. This method is time-consuming and generates heavy I/O. In your case (2 GB in use on a 1 TB HDD) a file copy (as suggested in sawdust's comment) will probably be a far better solution. If, on the other hand, you had e.g. 850 GB in use out of 1 TB with many small files, and you wanted to back up the MBR, partition table and metadata all at once, then my method would be a reasonable way to save at least 150 GB on the image file (which still wouldn't fit on a 500 GB HDD unless the data compressed well enough).
I'm writing this for users with higher disk usage. Also note that the source drive should be healthy and its empty space must be safe to overwrite. I'm giving this solution mainly for backup, not for recovery or forensics. The time and I/O cost will be paid not only during image creation but also when (if) the image is written back to disk. Think twice about whether the method is right for you.
Let's say you need to clone /dev/sdb and there are several partitions: /dev/sdb1, /dev/sdb2, …
Preparation
To take full advantage of sparse files or compression you should overwrite the empty space with zeros. In the case of a Windows partition there may be some trouble due to Windows hibernation, read this.
## Most commands need sudo.
mount -o rw /dev/sdb1 /mnt
dd if=/dev/zero of=/mnt/zero_file bs=32M
## Long wait here. Expect the following outcome: (which means that all empty space was zeroed)
### dd: error writing '/mnt/zero_file': No space left on device
sync
rm /mnt/zero_file
umount /dev/sdb1
## Repeat this with /dev/sdb2, /dev/sdb3 etc.
If there are major gaps in the partition layout then you should also fill them up with zeros. Swap partitions (if any) need special treatment in order to make the resulting image as small as possible. The Windows files like hiberfil.sys, pagefile.sys and swapfile.sys may be removed before zero_file creation. I won't cover these cases in detail here.
Sparse file method
This method may be used if the target filesystem (where the image file will be saved) supports sparse files. To generate a sparse image file, invoke:
## dd probably needs sudo here.
dd if=/dev/sdb of=/foo/bar/my_image.dd bs=512 conv=sparse
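(EDIT: originally there was bs=32M but it's not a good choice with conv=sparse, because dd only skips output blocks that consist entirely of zeros, so a large block size finds fewer holes. Compare this question.)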
To write the image back:
## dd probably needs sudo here.
dd if=/foo/bar/my_image.dd of=/dev/sdb bs=32M
Advantages:
- The image may be mounted (mount -o offset=… or use kpartx) to access the files within.
Disadvantages:
- Target filesystem must support sparse files.
- You should remember to keep it sparse while copying (cp --sparse=always).
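The effect of conv=sparse and cp --sparse=always can be tried safely on a small regular file before touching a real disk. The following is only a sketch: it assumes GNU coreutils (for conv=sparse, cp --sparse and du --apparent-size), and demo_dir and the file names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: demo_dir and the *.img names are hypothetical stand-ins.
demo_dir=$(mktemp -d)

# Build an 8 MiB "disk": all zeros except 4 KiB of random data at offset 2 MiB.
dd if=/dev/zero of="$demo_dir/full.img" bs=1M count=8 2>/dev/null
dd if=/dev/urandom of="$demo_dir/full.img" bs=4k count=1 seek=512 conv=notrunc 2>/dev/null

# Same idea as the image-creation command above, but reading a regular file.
dd if="$demo_dir/full.img" of="$demo_dir/sparse.img" bs=512 conv=sparse 2>/dev/null

# Copy the image while asking cp to keep (or recreate) the holes.
cp --sparse=always "$demo_dir/sparse.img" "$demo_dir/copy.img"

# Apparent size is the file length; plain du reports the blocks actually allocated.
apparent_kib=$(du -k --apparent-size "$demo_dir/copy.img" | cut -f1)
allocated_kib=$(du -k "$demo_dir/copy.img" | cut -f1)
echo "apparent: ${apparent_kib} KiB, allocated: ${allocated_kib} KiB"

# Remove "$demo_dir" by hand when you are done experimenting.
```

On a filesystem that supports sparse files the allocated figure should come out far smaller than the apparent one. If both are about the same, the target filesystem probably does not support holes and this method will not save space there.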
Compressed file method
To generate the image:
## dd probably needs sudo here.
dd if=/dev/sdb bs=32M | gzip -c > /foo/bar/my_image.dd.gz
To write the image back:
## dd probably needs sudo here.
gzip -cd < /foo/bar/my_image.dd.gz | dd of=/dev/sdb bs=32M
These commands might be built without dd, with gzip only. I used dd to ensure a 32 MiB buffer.
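A gzip-only variant might look like the sketch below. It runs the round trip on a small temporary file instead of /dev/sdb so it can be tried without root. demo_dir and the file names are assumptions made for the demonstration, and for a real disk you would redirect from and to the device instead.

```shell
#!/bin/sh
# Sketch only: disk.img stands in for /dev/sdb; all names here are hypothetical.
demo_dir=$(mktemp -d)

# A 4 MiB mostly-zero "disk" with a little data in it.
dd if=/dev/zero of="$demo_dir/disk.img" bs=1M count=4 2>/dev/null
printf 'some data' | dd of="$demo_dir/disk.img" bs=1 seek=1000 conv=notrunc 2>/dev/null

# Create the compressed image with gzip alone (no dd in the pipeline).
gzip -c < "$demo_dir/disk.img" > "$demo_dir/disk.img.gz"

# Write it back with gzip alone.
gzip -cd < "$demo_dir/disk.img.gz" > "$demo_dir/restored.img"

# The restored copy should be byte-for-byte identical to the source.
cmp "$demo_dir/disk.img" "$demo_dir/restored.img" && echo "round trip OK"
```

Without dd in the pipeline, the buffer size is whatever gzip and the shell redirection use, which is the reason given above for keeping dd.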
Advantages:
- The resulting file is non-sparse, it needs no special treatment.
- The image size will be reduced even more if the files on your source disk compress well.
Disadvantages:
- It is hard to access the files within the compressed image without full decompression (some FUSE may be useful, although I'm not sure, never tried; consider a squashfs approach).
Hints
- Long after I wrote the first version of this answer I learnt there is a virt-sparsify tool. It looks useful.
- To compress fast use gzip --fast, to compress best use gzip --best. Refer to man gzip for more options.
- Use pigz instead of gzip if you can. This should speed things up, because pigz can utilize more than one processor core. You can use another compressor if you like.
- To monitor the progress invoke dd with the status=progress operand. If dd is already running without it (e.g. your dd doesn't support status=progress or you forgot to use it), send the USR1 signal to the tool (this doesn't kill the running dd command):
kill -s USR1 $(pidof dd)
and repeat as needed.
- As an alternative to dd you may use pv to read. Examples:
pv -B 32m /dev/sdb | dd of=/foo/bar/my_image.dd bs=512 conv=sparse
pv -B 32m /dev/sdb | gzip -c > /foo/bar/my_image.dd.gz
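If the same script has to run on machines with and without pigz, the compressor can be chosen at run time. This is a minimal sketch, not part of the original method: pick_compressor is a hypothetical helper name, and the demo compresses a temporary file rather than a real disk.

```shell
#!/bin/sh
# Sketch only: pick_compressor and the file names are hypothetical.
pick_compressor() {
    # Prefer pigz (multi-core) when it is installed, otherwise fall back to gzip.
    if command -v pigz >/dev/null 2>&1; then
        echo pigz
    else
        echo gzip
    fi
}

compressor=$(pick_compressor)
demo_dir=$(mktemp -d)

# A 4 MiB stand-in for the disk to be imaged.
dd if=/dev/zero of="$demo_dir/disk.img" bs=1M count=4 2>/dev/null

# Both tools accept --fast and -c and both write gzip-format output.
"$compressor" --fast -c < "$demo_dir/disk.img" > "$demo_dir/disk.img.gz"

# Check the integrity of the compressed image.
"$compressor" -t "$demo_dir/disk.img.gz" && echo "compressed with $compressor"
```

Because pigz writes ordinary gzip files, an image made with it can still be restored with plain gzip -cd on a machine where pigz is missing.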
Solution 2:
If the target disk is already formatted, plugged into the same machine as the source disk and mounted, and you're running Linux or Mac:
rsync -avP --exclude=/media/disk2 / /media/disk2
If the target disk is already formatted and mounted on another PC, and you're running Linux or Mac:
rsync -avP / user@ip_of_disk2_host:/media/disk2
This assumes you just want a backup of the files without regard to the underlying drive. This does a per-file backup and will run rather quickly on only 2 GB of data.