Possible to calculate MD5 (or other) hash with buffered reads?

I need to calculate checksums of quite large files (gigabytes). This can be accomplished using the following method:

    private byte[] calcHash(string file)
    {
        System.Security.Cryptography.HashAlgorithm ha = System.Security.Cryptography.MD5.Create();
        FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read);
        byte[] hash = ha.ComputeHash(fs);
        fs.Close();
        return hash;
    }

However, the files are normally written just beforehand in a buffered manner (say writing 32mb's at a time). I am so convinced that I saw an override of a hash function that allowed me to calculate a MD5 (or other) hash at the same time as writing, ie: calculating the hash of one buffer, then feeding that resulting hash into the next iteration.

Something like this: (pseudocode-ish)

byte [] hash = new byte [] { 0,0,0,0,0,0,0,0 };
while(!eof)
{
   buffer = readFromSourceFile();
   writefile(buffer);
   hash = calchash(buffer, hash);
}

hash is now sililar to what would be accomplished by running the calcHash function on the entire file.

Now, I can't find any overrides like that in the.Net 3.5 Framework, am I dreaming ? Has it never existed, or am I just lousy at searching ? The reason for doing both writing and checksum calculation at once is because it makes sense due to the large files.


Solution 1:

You use the TransformBlock and TransformFinalBlock methods to process the data in chunks.

// Init
MD5 md5 = MD5.Create();
int offset = 0;

// For each block:
offset += md5.TransformBlock(block, 0, block.Length, block, 0);

// For last block:
md5.TransformFinalBlock(block, 0, block.Length);

// Get the has code
byte[] hash = md5.Hash;

Note: It works (at least with the MD5 provider) to send all blocks to TransformBlock and then send an empty block to TransformFinalBlock to finalise the process.

Solution 2:

I like the answer above but for the sake of completeness, and being a more general solution, refer to the CryptoStream class. If you are already handling streams, it is easy to wrap your stream in a CryptoStream, passing a HashAlgorithm as the ICryptoTransform parameter.

var file = new FileStream("foo.txt", FileMode.Open, FileAccess.Write);
var md5 = MD5.Create();
var cs = new CryptoStream(file, md5, CryptoStreamMode.Write);
while (notDoneYet)
{
    buffer = Get32MB();
    cs.Write(buffer, 0, buffer.Length);
}
System.Console.WriteLine(BitConverter.ToString(md5.Hash));

You might have to close the stream before getting the hash (so the HashAlgorithm knows it's done).