MD5 vs CRC32: Which one's better for common use?

MD5 is a one-way hash algorithm. One-way hash algorithms are often used in cryptography because they are designed so that it's hard to find an input that produces a specific hash value. It's also hard to find two different inputs that give the same hash. Thus they are often used to show that a block of data has not been altered intentionally since the hash was produced. As MD5 is a one-way hash algorithm, the emphasis is on security over speed. Unfortunately, MD5 is now considered insecure: practical collision attacks against it are known.
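A minimal Python sketch using the standard library's hashlib shows two properties mentioned above: MD5 always yields a 128-bit digest, and changing a single input byte produces an unrelated-looking digest. The inputs are arbitrary examples, and this only illustrates the output format; the one-way property itself can't be demonstrated by running code.

```python
import hashlib

original = b"hello"
tampered = b"hellp"  # one byte changed

d1 = hashlib.md5(original).hexdigest()
d2 = hashlib.md5(tampered).hexdigest()

print(d1)                    # 5d41402abc4b2a76b9719d911017c592
print(len(d1) * 4, "bits")   # 128 bits (32 hex characters, 4 bits each)

# Count how many of the 32 hex characters differ between the two digests.
print(sum(a != b for a, b in zip(d1, d2)), "of 32 hex characters differ")
```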

CRC32 is designed to detect accidental changes to data and is commonly used in networks and storage devices. The purpose of this algorithm is not to protect against intentional changes, but rather to catch accidents like network transmission errors and disk write errors. Thus its emphasis is more on speed than on security.
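A sketch of how CRC32 catches accidental corruption, using Python's zlib.crc32. The frame layout (payload followed by a 4-byte big-endian CRC) and the function names are hypothetical, chosen for illustration rather than taken from any particular protocol.

```python
import struct
import zlib


def add_crc(payload: bytes) -> bytes:
    """Append the payload's CRC32 as 4 big-endian bytes (hypothetical frame format)."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def check_crc(frame: bytes) -> bool:
    """Recompute the CRC32 of the payload and compare it with the stored one."""
    payload, (stored,) = frame[:-4], struct.unpack(">I", frame[-4:])
    return zlib.crc32(payload) == stored


frame = add_crc(b"some data sent over the wire")
print(check_crc(frame))  # True

corrupted = bytearray(frame)
corrupted[3] ^= 0x01  # simulate a single-bit transmission error in the payload
print(check_crc(bytes(corrupted)))  # False
```

CRC32 is guaranteed to detect every single-bit error, which is why the second check fails. It offers no such guarantee against someone who changes the data on purpose and recomputes the CRC.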


From Wikipedia's article on MD5:

MD5 is a widely used cryptographic hash function

Now CRC32:

CRC is an error-detecting code

So, as you can see, CRC32 is not a cryptographic hash algorithm. That means you should not use it where a cryptographic hash is needed, because it was not built for that.

And I don't think it makes much sense to talk about "common use", because similar algorithms are used for different purposes, each with significantly different requirements. There is no single algorithm that's best for common use; instead, you should choose the algorithm best suited to your specific use.


It depends on your goals. Here are some examples of what can be done with CRC32 versus MD5:

Detecting duplicate files

If you want to check whether two files are the same, a CRC32 checksum is the way to go because it's faster to compute than MD5. But be careful: CRC only reliably tells you that two binaries are different; it doesn't tell you that they're identical. If you get different checksums for two files, they cannot be the same file, so you can very quickly reject them as duplicates. If the checksums match, you still need to compare the contents to confirm.
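A possible shape for this in Python, assuming you already have a list of file paths: group the files by CRC32 (streamed through zlib.crc32 so large files aren't loaded whole), then confirm each candidate group with filecmp.cmp, since matching checksums alone don't prove the files are identical. The function names are illustrative.

```python
import filecmp
import zlib
from collections import defaultdict
from pathlib import Path


def crc32_of_file(path: Path, chunk_size: int = 1 << 16) -> int:
    """CRC32 of a file, computed incrementally chunk by chunk."""
    crc = 0
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc


def find_duplicates(paths):
    """Return groups of paths whose contents are byte-for-byte identical."""
    by_crc = defaultdict(list)
    for p in paths:
        by_crc[crc32_of_file(Path(p))].append(Path(p))

    groups = []
    for candidates in by_crc.values():
        # A unique CRC proves the file has no duplicate; a shared CRC only
        # means "possibly equal", so confirm with a full content comparison.
        while len(candidates) > 1:
            first, rest = candidates[0], candidates[1:]
            same = [p for p in rest if filecmp.cmp(first, p, shallow=False)]
            if same:
                groups.append([first, *same])
            candidates = [p for p in rest if p not in same]
    return groups
```

Most files are eliminated by the cheap checksum; only files that share a CRC pay for the full comparison.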

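No matter what your input is, the CRC32 checksum will be one of 2^32 different values. Assuming the checksums of your files are uniformly distributed, the probability that two given files collide is 1 / 2^32. The probability that one given file collides with any of the other N - 1 files is at most (N - 1) / 2^32. The probability that any two of the N files collide is much higher, roughly 1 - e^(-N(N - 1) / 2^33) (the birthday bound), which already reaches about 50% at N ≈ 77,000 files.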
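To put numbers on this, a small sketch that evaluates both probabilities under the same uniform-checksum assumption. The "any pair" figure uses the standard birthday approximation 1 - exp(-N(N - 1) / 2^(bits + 1)), not an exact count; the function names are made up for this example.

```python
import math


def pair_collision_prob(n: int, bits: int = 32) -> float:
    """Approximate chance that at least two of n files share a checksum (birthday bound)."""
    # 1 - exp(-n(n-1) / 2^(bits+1)); expm1 keeps precision when the result is tiny.
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))


def one_vs_rest_prob(n: int, bits: int = 32) -> float:
    """Chance that one given file collides with at least one of the other n - 1."""
    # 1 - (1 - 2^-bits)^(n-1), computed via log1p/expm1 for precision.
    return -math.expm1((n - 1) * math.log1p(-(2.0 ** -bits)))


for n in (2, 1_000, 77_000, 1_000_000):
    print(f"N={n:>9,}: any pair {pair_collision_prob(n):.3g}, "
          f"one file vs rest {one_vs_rest_prob(n):.3g}")
```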

Detecting malicious software

If security is an issue, like when you download a file and compare the source's published hash against your own to make sure the binary hasn't been tampered with, then CRC is a poor option. This is because attackers can craft malware that has the same CRC checksum as the legitimate file. In this case, an MD5 digest is more secure; CRC was not made for security. Two different binaries are far more likely to have the same CRC checksum than the same MD5 digest.
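A sketch of checking a download against a published digest with Python's hashlib. The algorithm is a parameter that defaults to MD5 to match this answer; hashlib.new also accepts names such as "sha256". The function names and the file path in the usage comment are hypothetical.

```python
import hashlib


def file_digest(path: str, algorithm: str = "md5", chunk_size: int = 1 << 16) -> str:
    """Hex digest of a file, read in chunks so large downloads need little memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path: str, published_hex: str, algorithm: str = "md5") -> bool:
    """True if the file's digest equals the one published by the source."""
    # Published digests are often upper-case or carry trailing whitespace.
    return file_digest(path, algorithm) == published_hex.strip().lower()


# Hypothetical usage:
# matches_published("installer.bin", "<digest copied from the download page>")
```

This only helps if the published digest itself comes from a channel the attacker can't modify.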

Securing passwords for user authentication

One-way hashing is usually easier, faster, and more secure for storing passwords than two-way (reversible) encryption, because nothing stored on the server can be decrypted back into the password. Basically, the password is combined with other data (a salt), and the hash is computed over all of this combined data. A random salt per user ensures that two users with the same password end up with different stored hashes. By default, the same password produces the same hash with most algorithms, so you must add this randomness yourself. Of course, the salt must be stored as well, typically next to the hash in the user's database record.
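A sketch of why the salt matters. For the salted hash it uses hashlib.pbkdf2_hmac (PBKDF2 with SHA-256) rather than a single MD5 or SHA pass: that is a deliberately slow key-derivation function, swapped in here because it is the usual recommendation for passwords. The iteration count and the helper name hash_password are assumptions.

```python
import hashlib
import os

ITERATIONS = 100_000  # assumed work factor; tune for your hardware


def hash_password(password: str, salt: bytes) -> bytes:
    """Salted, deliberately slow password hash (PBKDF2-HMAC-SHA256, 32 bytes)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


# Without a salt, two users with the same password store the same hash.
unsalted_a = hashlib.sha256(b"hunter2").hexdigest()
unsalted_b = hashlib.sha256(b"hunter2").hexdigest()
print(unsalted_a == unsalted_b)  # True

# With a random 16-byte salt per user, the stored hashes differ.
salt_a, salt_b = os.urandom(16), os.urandom(16)
print(hash_password("hunter2", salt_a) == hash_password("hunter2", salt_b))  # False
```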

To log a user in, you take the credentials they submit. You use their username to look up their salt in the database. You then combine this salt with the submitted password and compute a new hash. If it matches the one in the database, the login is successful. Since you're storing these password hashes, they must be VERY secure, which means a CRC checksum is out of the question.
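The login flow described above could be sketched like this. USERS is a hypothetical in-memory dict standing in for the database table, PBKDF2 is again swapped in as the salted hash, and hmac.compare_digest compares the hashes in constant time so response timing doesn't leak how many bytes matched.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor
USERS = {}  # hypothetical stand-in for a database table: username -> (salt, hash)


def _derive(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def register(username: str, password: str) -> None:
    salt = os.urandom(16)  # fresh random salt, stored alongside the hash
    USERS[username] = (salt, _derive(password, salt))


def login(username: str, password: str) -> bool:
    record = USERS.get(username)
    if record is None:
        return False
    salt, stored = record  # look up this user's salt ...
    # ... recompute the hash from the submitted password and compare.
    return hmac.compare_digest(_derive(password, salt), stored)


register("alice", "correct horse battery staple")
print(login("alice", "correct horse battery staple"))  # True
print(login("alice", "wrong password"))                 # False
print(login("bob", "anything"))                         # False
```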

Cryptographic digests are more expensive to compute than CRC checksums. Also, stronger hashes like SHA-256 are more secure than MD5, but generally slower and take up more database space (their digests are longer: 32 bytes versus 16 for MD5).
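A rough way to see the size and speed trade-off on your own machine. The sizes are fixed by the algorithms; the timings vary with hardware and Python build (some CPUs accelerate SHA-256 in hardware), so no particular numbers are implied.

```python
import hashlib
import timeit
import zlib

data = b"\x00" * (1 << 20)  # 1 MiB of sample input

# Stored size of each checksum/digest in bytes (hex strings are twice as long).
sizes = {
    "crc32": 4,  # CRC32 is a 32-bit value
    "md5": hashlib.md5().digest_size,
    "sha256": hashlib.sha256().digest_size,
}
print(sizes)  # {'crc32': 4, 'md5': 16, 'sha256': 32}

# Rough speed comparison over 20 runs each.
runs = 20
for name, fn in [
    ("crc32", lambda: zlib.crc32(data)),
    ("md5", lambda: hashlib.md5(data).digest()),
    ("sha256", lambda: hashlib.sha256(data).digest()),
]:
    seconds = timeit.timeit(fn, number=runs)
    print(f"{name:>6}: {seconds / runs * 1000:.2f} ms per MiB")
```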


One big difference between CRC32 and MD5 is that it is usually easy to pick a CRC32 checksum and then construct a message that produces that checksum, even if constraints are imposed on the message, whereas MD5 is specifically designed to make this sort of thing difficult. MD5 is showing its age, though: finding two messages with the same MD5 digest (a collision) is now practical, even though constructing a message for an arbitrary given digest still is not.
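To show how easy forging a CRC32 is, here is a sketch of the well-known trick of appending four chosen bytes to any message so that zlib.crc32 of the result equals any target value. It relies on CRC32's linear structure and on the fact that the top bytes of the 256 entries in the standard CRC-32 lookup table are all distinct, which lets the four table indices be recovered by walking backward from the target. The function name forge_crc32 is hypothetical.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial, as used by zlib


def _make_table():
    """Standard byte-wise lookup table for reflected CRC-32."""
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = _make_table()
# Map each entry's top byte back to its index (the top bytes are all distinct).
INV_TOP = {entry >> 24: i for i, entry in enumerate(TABLE)}
assert len(INV_TOP) == 256


def forge_crc32(message: bytes, target: int) -> bytes:
    """Return message + 4 bytes whose zlib.crc32 equals target."""
    start = zlib.crc32(message) ^ 0xFFFFFFFF  # internal register after `message`
    want = target ^ 0xFFFFFFFF                # register value that yields `target`

    # Walk backward from the wanted register: each step's top byte identifies
    # which table entry must have been XORed in.
    indices = []
    reg = want
    for _ in range(4):
        idx = INV_TOP[reg >> 24]
        indices.append(idx)
        reg = ((reg ^ TABLE[idx]) << 8) & 0xFFFFFFFF
    indices.reverse()

    # Walk forward from the real register, choosing each byte so that the
    # table lookup hits the index found above.
    patch = bytearray()
    reg = start
    for idx in indices:
        patch.append((reg ^ idx) & 0xFF)
        reg = TABLE[idx] ^ (reg >> 8)
    return message + bytes(patch)


forged = forge_crc32(b"Pay Mallory $1000", 0xCAFEBABE)
print(hex(zlib.crc32(forged)))  # 0xcafebabe
```

No search is involved: the four bytes are computed directly, which is exactly why CRC32 offers no protection against a deliberate attacker.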

If an adversary might deliberately create many messages with chosen CRC32 values, whether to mimic other messages or just to make a hash table perform very badly because everything hashes to the same value, then MD5 would be a better option. (Even better, IMHO, would be HMAC-MD5 with a secret key that is unique to the module using it and unknown outside it.)
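A sketch of the keyed variant suggested above, using Python's hmac module with MD5 to pick hash-table buckets. _BUCKET_KEY is a hypothetical per-process secret and bucket_for an illustrative name; because attackers don't know the key, they can't precompute inputs that all land in the same bucket.

```python
import hashlib
import hmac
import os

# Hypothetical secret, generated once per process and never exposed outside this module.
_BUCKET_KEY = os.urandom(16)


def bucket_for(key: bytes, num_buckets: int) -> int:
    """Choose a hash-table bucket for `key` using HMAC-MD5 keyed with a secret."""
    mac = hmac.new(_BUCKET_KEY, key, hashlib.md5).digest()  # 16-byte tag
    return int.from_bytes(mac[:8], "big") % num_buckets


print(bucket_for(b"some user-supplied key", 1024))
```

The bucket assignment is stable within one process but changes whenever the key is regenerated, so it should not be persisted.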