Good block size for disk-cloning with dd
I use dd in its simplest form to clone a hard drive:
dd if=INPUT of=OUTPUT
However, I read in the man page that dd accepts a block size parameter. Is there an optimal value for the block size parameter that will speed up the cloning procedure?
Solution 1:
64k seems to be a good pick:
Results (elapsed time and dd record count for each block size):

no bs=    78s    144584+0 records
bs=512    78s    144584+0 records
bs=1k     38s    72292+0 records
bs=2k     38s    36146+0 records
bs=4k     38s    18073+0 records
bs=5k     39s    14458+1 records
bs=50k    38s    1445+1 records
bs=500k   39s    144+1 records
bs=512k   39s    144+1 records
bs=1M     39s    72+1 records
bs=5M     39s    14+1 records
bs=10M    39s    7+1 records
(taken from here).
This matches my own findings on read/write buffering from when I was speeding up an I/O-heavy converter program at work.
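If you want to verify this on your own hardware, a benchmark along these lines is easy to script. The sketch below is only a rough illustration, assuming you can spare roughly 1 GiB for a scratch file; the file path and the block-size list are placeholders, and dropping the Linux page cache between runs needs root:

    TESTFILE=/tmp/dd-bs-test.img                        # placeholder scratch file
    dd if=/dev/urandom of="$TESTFILE" bs=1M count=1024  # create ~1 GiB of test data

    for bs in 512 4K 64K 1M 16M; do
        sync && echo 3 > /proc/sys/vm/drop_caches       # flush the page cache (root only)
        echo "bs=$bs"
        dd if="$TESTFILE" of=/dev/null bs="$bs" 2>&1 | tail -n 1
    done

    rm "$TESTFILE"

dd prints its statistics on stderr, so the 2>&1 | tail -n 1 simply keeps the summary line with the elapsed time and throughput for each block size.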
Solution 2:
dd will happily copy using whatever BS you want, and will copy a partial block (at the end).
Basically, the block size (bs) parameter seems to set the amount of memory that's used to read in a lump from one disk before trying to write that lump to the other.
If you have lots of RAM, then making the BS large (but entirely contained in RAM) means that the I/O subsystem is utilised as much as possible by doing massively large reads and writes, exploiting the RAM. Making the BS small means that the I/O overhead as a proportion of total activity goes up.
Of course, in this there is a law of diminishing returns. My rough approximation is that a block size in the range of about 128K to 32M is probably going to give performance where the overheads are small compared to the plain I/O, and going larger won't make a lot of difference. The reason the range is as wide as 128K to 32M is that it depends on your OS, hardware, and so on.
If it were me, I'd do a few experiments timing a copy/clone using a BS of 128K and again using (say) 16M. If one is appreciably faster, use it. If not, then use the smaller BS of the two.
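As a concrete (hypothetical) way to run that comparison without cloning the whole disk twice, you can time a bounded read from the source drive at each block size; /dev/sdX stands in for your source device, and both commands read about 1 GiB:

    time dd if=/dev/sdX of=/dev/null bs=128K count=8192   # 8192 x 128K = 1 GiB
    time dd if=/dev/sdX of=/dev/null bs=16M count=64      # 64 x 16M = 1 GiB

Bear in mind that the second run can be flattered by the kernel's page cache; drop the caches in between, or add a skip= so each run reads a different region of the disk.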
Solution 3:
For those that end up here via Google, even if this discussion is a bit old...
Keep in mind that dd is dumb for a reason: the simpler it is, the fewer ways it can screw up.
Complex partitioning schemes (consider a dual-boot hard drive that additionally uses LVM for its Linux system) will start pulling bugs out of the woodwork in programs like Clonezilla. Badly-unmounted filesystems can blow ntfsclone sky-high.
A corrupt filesystem cloned sector-by-sector is no worse than the original. A corrupt filesystem after a failed "smart copy" may be in REALLY sorry shape.
When in doubt, use dd and go forensic. Forensic imaging requires sector-by-sector copies (in fact, it can require more sectors than you're going to be able to pull off with dd, but that's a long story). It is slow and tedious but it will get the job done correctly.
Also, get to know the conv=noerror,sync options, so that you can clone drives that are starting to fail - or make ISOs from scratched (cough) CDs without it taking months.
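For reference, a typical invocation with those options might look like this (the device name and image path are placeholders, and status=progress needs a reasonably recent GNU dd):

    dd if=/dev/sdX of=rescued-drive.img bs=64K conv=noerror,sync status=progress

conv=noerror keeps dd going past read errors, and sync pads short or failed reads with zeros so the image keeps its size and offsets; a smaller bs limits how much data each bad block takes with it.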