How can I verify that a 1TB file transferred correctly?

You can use tee to do the sum on the fly with something like this (adapt the netcat commands for your needs):

Server:

netcat -l -w 2 1111 | tee >( md5sum > /dev/stderr ) > foo.box

Client:

tee >( md5sum > /dev/stderr ) < foo.box | netcat 127.0.0.1 1111

Nerdwaller's answer about using tee to simultaneously transfer and calculate a checksum is a good approach if you're primarily worried about corruption over the network. It won't protect you against corruption on the way to disk, etc., though, as it's taking the checksum before the data hits the disk.
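To cover the path to disk as well, one option is to re-read the received file once the transfer finishes and compare its checksum with the one the sender printed. A minimal sketch, assuming md5sum from GNU coreutils is installed; the function name verify_md5 and its arguments are made up for illustration:

```shell
# verify_md5 EXPECTED_HEX FILE
# Hypothetical helper: re-read FILE from storage and compare its MD5
# with the checksum reported by the sending side.
# Returns 0 on a match, 1 on a mismatch.
verify_md5() (
    expected=$1
    file=$2
    # md5sum prints "<hex>  <name>"; keep only the hex digest.
    actual=$(md5sum < "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (expected $expected, got $actual)" >&2
        exit 1
    fi
)
```

Keep in mind that a re-read shortly after the transfer may be served from the page cache rather than from the disk itself, so this is a weaker check than it looks on a machine with a lot of RAM.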

But I'd like to add something:

1 TiB / 40 minutes ≈ 437 MiB/sec [1].

That's pretty fast, actually. Remember that unless you have a lot of RAM, that's got to come back from storage. So the first thing to check is to watch iostat -kx 10 as you run your checksums; in particular you want to pay attention to the %util column. If you're pegging the disks (near 100%), then the answer is to buy faster storage.

Otherwise, as other posters mentioned, you can try different checksum algorithms. MD4, MD5, and SHA-1 are all designed to be cryptographic hashes (though none of those should be used for that purpose anymore; all are considered too weak). Speed-wise, you can compare them with openssl speed md4 md5 sha1 sha256. I've thrown in SHA256 to have at least one hash that is still strong enough.

The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md4              61716.74k   195224.79k   455472.73k   695089.49k   820035.58k
md5              46317.99k   140508.39k   320853.42k   473215.66k   539563.35k
sha1             43397.21k   126598.91k   283775.15k   392279.04k   473153.54k
sha256           33677.99k    75638.81k   128904.87k   155874.91k   167774.89k

Of the above, you can see that MD4 is the fastest, and SHA256 the slowest. This result is typical on PC-like hardware, at least.
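openssl speed hashes in-memory buffers, so it says nothing about your storage. To compare tools on your own data, you could time each one over a sample file. A rough sketch, assuming the coreutils tools md5sum, sha1sum and sha256sum are installed and that your date supports +%s (GNU and BSD do); time_digests is a made-up name:

```shell
# time_digests FILE
# Hypothetical helper: run each digest tool over FILE and print the
# elapsed wall-clock time in whole seconds.
time_digests() (
    file=$1
    for cmd in md5sum sha1sum sha256sum; do
        start=$(date +%s)
        "$cmd" < "$file" > /dev/null || exit 1
        end=$(date +%s)
        echo "$cmd: $((end - start)) s"
    done
)
```

Use a sample of several GB, since the resolution is one second. The first tool also warms the page cache for the others if the sample fits in RAM, so either run the loop twice or pick a sample larger than memory.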

If you want even more performance (at the cost of being trivial to tamper with, and also less likely to detect corruption), you want to look at a CRC or Adler hash. Of the two, Adler is typically faster, but weaker. Unfortunately, I'm not aware of any really fast command line implementations; the programs on my system are all slower than OpenSSL's md4.

So, your best bet speed-wise is openssl md4 -r (the -r makes it look like md5sum output).

If you're willing to do some compiling and/or minimal programming, see Mark Adler's code over on Stack Overflow and also xxhash. If you have SSE 4.2, you will not be able to beat the speed of the hardware CRC instruction.


[1] 1 TiB = 1024⁴ bytes; 1 MiB = 1024² bytes. With powers-of-1000 units (1 TB = 1000⁴ bytes), it comes to ≈417 MB/sec.
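Both figures can be reproduced with a quick awk calculation. The function name throughput is just for illustration:

```shell
# Throughput needed to get through 1 TiB (or 1 TB) in 40 minutes.
throughput() {
    awk 'BEGIN {
        seconds = 40 * 60
        # Binary units: 1024^4 bytes, reported in MiB (1024^2 bytes) per second.
        printf "%.0f MiB/s\n", (1024 ^ 4) / seconds / (1024 ^ 2)
        # Decimal units: 1000^4 bytes, reported in MB (1000^2 bytes) per second.
        printf "%.0f MB/s\n", (1000 ^ 4) / seconds / (1000 ^ 2)
    }'
}
throughput
# prints:
# 437 MiB/s
# 417 MB/s
```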


The openssl command supports several message digests. Of the ones I was able to try, md4 seems to run in about 65% of the time of md5, and about 54% of the time of sha1 (for the one file I tested with).

There's also an md2 in the documentation, but it seems to give the same results as md5.

Very roughly, speed seems to be inversely related to quality, but since you're (probably) not concerned about an adversary creating a deliberate collision, that shouldn't be much of an issue.

You might look around for older and simpler message digests (was there an md1, for example?).

A minor point: You've got a Useless Use of cat. Rather than:

cat foo.box | nc <archive IP> 1234

you can use:

nc <archive IP> 1234 < foo.box

or even:

< foo.box nc <archive IP> 1234

Doing so saves a process, but probably won't have any significant effect on performance.


Two options:

Use sha1sum

sha1sum foo.box

In some circumstances sha1sum is faster.


Use rsync

It will take longer to transfer, but rsync verifies that the file arrived intact.

From the rsync man page:

Note that rsync always verifies that each transferred file was correctly reconstructed on the receiving side by checking a whole-file checksum that is generated as the file is transferred...
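If you want an explicit confirmation on top of rsync's built-in check, you can follow the copy with a checksum-only dry run. A sketch, assuming rsync is installed on both ends; copy_and_check is a hypothetical wrapper, and for a remote copy DEST_DIR would be something like host:/path:

```shell
# copy_and_check SRC DEST_DIR
# Hypothetical helper: copy SRC with rsync, then compare checksums
# in a dry run that changes nothing.
copy_and_check() (
    src=$1
    dest=$2
    # -a preserves attributes; rsync checks a whole-file checksum as it transfers.
    rsync -a "$src" "$dest/" || exit 1
    # -c compares full-file checksums, -n makes no changes,
    # -i itemizes anything that differs. Empty output means a match.
    diffs=$(rsync -a -c -n -i "$src" "$dest/") || exit 1
    if [ -z "$diffs" ]; then
        echo "verified: $src"
    else
        echo "differences found:" >&2
        echo "$diffs" >&2
        exit 1
    fi
)
```

The second pass reads the whole file again on both sides, so for a 1 TB file it costs roughly another full checksum run on each machine.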


Science is progressing. It appears that the new BLAKE2 hash function is faster than MD5 (and cryptographically much stronger to boot).

Reference: https://leastauthority.com/blog/BLAKE2-harder-better-faster-stronger-than-MD5.html

From Zooko's slides:

Cycles per byte on Intel Core i5-3210M (Ivy Bridge):

function   long msg   4096 B   64 B
MD5             5.0      5.2   13.1
SHA1            4.7      4.8   13.7
SHA256         12.8     13.0   30.0
Keccak          8.2      8.5   26.0
BLAKE1          5.8      6.0   14.9
BLAKE2          3.5      3.5    9.3
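To try BLAKE2 from the command line, recent versions of GNU coreutils include a b2sum utility (BLAKE2b); whether your system has it is an assumption. A small sketch, with b2_digest as a made-up helper name:

```shell
# b2_digest FILE
# Hypothetical helper: print only the hex BLAKE2b digest of FILE.
# b2sum prints "<hex>  <name>", like md5sum, so cut keeps the first field.
b2_digest() {
    b2sum < "$1" | cut -d ' ' -f 1
}
```

Run b2_digest foo.box on both machines and compare the two strings, exactly as you would with md5sum output.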