Good block size for disk cloning with dd

I use dd in its simplest form to clone a hard drive:

dd if=INPUT of=OUTPUT

However, I read in the man page that dd has a block size parameter. Is there an optimal value for this parameter that will speed up the cloning procedure?
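
For reference, here is roughly what such an invocation looks like with the block size set (the device names and the 64K value are just placeholders, not a recommendation):

  dd if=/dev/sdX of=/dev/sdY bs=64K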


Solution 1:

64k seems to be a good pick:

Results:

  block size    time    records copied
  no bs=        78s     144584+0
  bs=512        78s     144584+0
  bs=1k         38s     72292+0
  bs=2k         38s     36146+0
  bs=4k         38s     18073+0
  bs=5k         39s     14458+1
  bs=50k        38s     1445+1
  bs=500k       39s     144+1
  bs=512k       39s     144+1
  bs=1M         39s     72+1
  bs=5M         39s     14+1
  bs=10M        39s     7+1

(taken from here).

This matches my own findings on read/write buffering from when I was speeding up an I/O-heavy converter program at work.
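
If you want to reproduce this kind of measurement on your own hardware, a sketch along these lines should do it. It assumes GNU dd (for iflag=count_bytes), Linux (for the page-cache drop), root privileges, and /dev/sdX as a placeholder source device:

  # time a 1 GiB read from the source at several block sizes;
  # flush the page cache before each run so every pass hits the disk
  for bs in 512 4K 64K 1M 16M; do
    sync; echo 3 > /proc/sys/vm/drop_caches
    echo "bs=$bs"
    dd if=/dev/sdX of=/dev/null bs=$bs count=1G iflag=count_bytes 2>&1 | tail -n 1
  done

Reading into /dev/null isolates the read side; a real clone behaves similarly but is capped by the slower of the two disks.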

Solution 2:

dd will happily copy using whatever block size (bs) you want, and will copy a partial block at the end if needed.

Basically, the block size (bs) parameter sets the amount of memory that's used to read a chunk from one disk before writing that chunk to the other.

If you have lots of RAM, then making bs large (but still entirely contained in RAM) means the I/O subsystem is utilised as much as possible by doing massively large reads and writes. Making bs small means the per-call I/O overhead goes up as a proportion of total activity.

Of course, there is a law of diminishing returns here. My rough approximation is that a block size somewhere in the range of 128K to 32M will give performance where the overheads are small compared to the plain I/O, and going larger won't make much difference. The reason the range is so wide is that the sweet spot depends on your OS, hardware, and so on.

If it were me, I'd do a few experiments timing a copy/clone using a bs of 128K and again using, say, 16M. If one is appreciably faster, use it; if not, use the smaller of the two.
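
A sketch of that experiment, assuming /dev/sdX is the source and /dev/sdY is a blank, expendable destination (the writes will destroy whatever is on it); the counts are chosen so both runs copy the same 4 GiB:

  time dd if=/dev/sdX of=/dev/sdY bs=128K count=32768   # 128K * 32768 = 4 GiB
  time dd if=/dev/sdX of=/dev/sdY bs=16M count=256      # 16M * 256 = 4 GiB

Whichever finishes appreciably faster is the block size I'd use for the full clone; if they tie, I'd take the 128K run.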

Solution 3:

For those who end up here via Google, even if this discussion is a bit old...

Keep in mind that dd is dumb for a reason: the simpler it is, the fewer ways it can screw up.

Complex partitioning schemes (consider a dual-boot hard drive that additionally uses LVM for its Linux system) will start pulling bugs out of the woodwork in programs like Clonezilla. Badly-unmounted filesystems can blow ntfsclone sky-high.

A corrupt filesystem cloned sector-by-sector is no worse than the original. A corrupt filesystem after a failed "smart copy" may be in REALLY sorry shape.

When in doubt, use dd and go forensic. Forensic imaging requires sector-by-sector copies (in fact, it can require more sectors than you're going to be able to pull off with dd, but that's a long story). It is slow and tedious but it will get the job done correctly.

Also, get to know the conv=noerror,sync options, so that you can clone drives that are starting to fail - or make ISOs from scratched (cough) CDs without it taking months.
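
As an illustration (a sketch only; /dev/sdX and rescue.img are placeholders), a rescue-style image of a failing drive might look like this:

  # noerror: keep going past read errors instead of aborting
  # sync: pad the unreadable block with zeros so the image keeps the
  #       same size and layout as the source
  dd if=/dev/sdX of=rescue.img bs=4K conv=noerror,sync

Note that with conv=noerror,sync a failed read gets padded out to a full block, so a smallish bs (4K here rather than 64K or more) limits how much readable data is thrown away around each bad sector.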