Moving a Logical Volume directly from one server to another over the network?

I have a KVM host machine with several VMs on it. Each VM uses a Logical Volume on the host. I need to copy the LVs to another host machine.

Normally, I would use something like:

dd if=/the/logical-volume of=/some/path/machine.dd

to turn the LV into an image file, then use SCP to move it, and finally use dd again to copy the file into a new LV on the new host.

The problem with this method is that on both machines you need twice as much disk space as the VM takes up. For example, a 5GB LV uses 5GB of space for the LV itself, and the dd copy uses an additional 5GB for the image. This is fine for small LVs, but what if (as in my case) you have a 500GB LV for a big VM? The new host machine has a 1TB hard drive, so it can't hold a 500GB dd image file plus a 500GB logical volume to copy it into, and still have room for the host OS and the other, smaller guests.

What I would like to do is something like:

dd if=/dev/mygroup-mylv of=

In other words, copy the data directly from one logical volume to the other over the network and skip the intermediate image file.

Is this possible?

Solution 1:

Sure, of course it's possible.

dd if=/dev/mygroup-mylv | ssh remotehost dd of=/dev/newvgroup-newlv

(remotehost is a placeholder for the address of the new host machine.)

Do yourself a favor, though, and use something larger than dd's default block size of 512 bytes. Maybe add bs=4M (read and write in chunks of 4 MB). Opinions on the ideal block size differ, so if this is something you find yourself doing fairly often, take a little time to try the transfer with a few different block sizes and see for yourself which gives you the best transfer rate.
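As a minimal sketch, here is the command above wrapped in a shell function with the block size made explicit. The function name copy_block_device, the host and device names, and the SSH override variable are illustrative assumptions, not part of the answer; size suffixes such as 4M assume GNU dd.

```shell
# copy_block_device SRC HOST DEST [BS]
#   SRC  - local source, e.g. the LV device node
#   HOST - ssh destination (placeholder, e.g. root@newhost)
#   DEST - path of the target LV on HOST
#   BS   - dd block size, defaults to 4M as suggested above
# SSH is a hypothetical override; it defaults to plain ssh and is left
# unquoted on purpose so that e.g. SSH="ssh -p 2222" also works.
copy_block_device() {
    src=$1 host=$2 dest=$3 bs=${4:-4M}
    # Read locally in BS-sized chunks and stream the bytes over ssh;
    # the remote dd writes them to DEST, also in BS-sized chunks.
    dd if="$src" bs="$bs" | ${SSH:-ssh} "$host" "dd of='$dest' bs=$bs"
}

# Example (hypothetical names, run as root on the source host):
# copy_block_device /dev/mygroup/mylv root@newhost /dev/newvgroup/newlv 4M
```

To compare block sizes as suggested, you can run the same copy a few times with different values for the fourth argument and note the rate dd reports at the end.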

Answering one of the questions from the comments:

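You can pipe the transfer through pv to get statistics about it, such as a running byte count and the transfer rate. That's a lot nicer than the output you get from sending signals to dd (GNU dd prints its I/O statistics when it receives SIGUSR1).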
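A sketch of the pv variant, assuming pv and util-linux's blockdev are installed on the source host. Passing the source size to pv -s lets it show a percentage and an ETA instead of just a byte count. The function names and the SSH override variable are hypothetical; the wc -c fallback exists only so the same function also works on a regular file.

```shell
# Print the size of SRC in bytes.
# blockdev --getsize64 reports a block device's size (typically needs root);
# for a regular file, fall back to wc -c.
source_size() {
    if [ -b "$1" ]; then
        blockdev --getsize64 "$1"
    else
        wc -c < "$1" | tr -d ' '
    fi
}

# copy_with_progress SRC HOST DEST
# Same pipeline as the plain dd | ssh dd copy, with pv in the middle.
# SSH is a hypothetical override that defaults to ssh.
copy_with_progress() {
    src=$1 host=$2 dest=$3
    size=$(source_size "$src")
    dd if="$src" bs=4M | pv -s "$size" | ${SSH:-ssh} "$host" "dd of='$dest' bs=4M"
}

# Example (placeholder names):
# copy_with_progress /dev/mygroup/mylv root@newhost /dev/newvgroup/newlv
```

If pv isn't available, recent GNU dd (coreutils 8.24 and later) also accepts status=progress, which prints a periodically updated transfer line without a percentage.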

I will also say that while using netcat (or anything else that does not impose the overhead of encryption) is of course going to be more efficient, I usually find that the additional speed comes at some loss of convenience. Unless I'm moving really large datasets, I usually stick with ssh despite the overhead, because in most cases everything is already set up to Just Work.

Solution 2:

Here's an optimized version, which shows progress using pv, uses the bs option to read and write in bigger chunks, and compresses the stream with gzip to reduce network traffic.

That's perfect when moving data over slow connections, such as between servers across the internet. I recommend running the command inside a screen or tmux session. That way, if the ssh connection to the machine you run the command from drops, the transfer keeps going.

$ dd if=/dev/volumegroupname/logicalvolume bs=4096 | pv | gzip | \
    ssh user@remotehost 'gzip -d | dd of=/dev/volumegroupname/logicalvolume bs=4096'

(user@remotehost is a placeholder for the login and address of the destination server.)
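The same compressed pipeline can be sketched as a reusable function with a tunable gzip level. The function name, the level parameter, and the SSH override variable are assumptions for illustration. The level defaults to 6, which is gzip's own default, so the default behavior matches the command above apart from the larger block size. On a fast link, a lower level such as 1 may keep gzip from becoming the bottleneck. Unused, zero-filled areas of an LV compress very well, while already-compressed data barely shrinks.

```shell
# copy_compressed SRC HOST DEST [LEVEL]
# Read with dd, show progress with pv, compress with gzip at LEVEL (1-9),
# then decompress and write on the far side. SSH is a hypothetical
# override that defaults to ssh; all names here are placeholders.
copy_compressed() {
    src=$1 host=$2 dest=$3 level=${4:-6}
    dd if="$src" bs=4M | pv | gzip -"$level" | \
        ${SSH:-ssh} "$host" "gzip -d | dd of='$dest' bs=4M"
}

# Example (placeholder names, fastest compression):
# copy_compressed /dev/volumegroupname/logicalvolume user@remotehost \
#     /dev/volumegroupname/logicalvolume 1
```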