Why is MD5 still used heavily?

Solution 1:

It's quick to generate, and the fact that collisions are theoretically possible often isn't a serious problem, e.g. when checking whether a cached file has changed in order to avoid downloading a new copy.
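As a rough sketch of that cache check in Python: the helper names `md5_hex` and `needs_refresh` are made up for illustration, and it assumes the server publishes the MD5 of the current file as a hex string.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as 32 lowercase hex digits."""
    return hashlib.md5(data).hexdigest()


def needs_refresh(cached: bytes, remote_md5: str) -> bool:
    """True if the cached copy no longer matches the published digest.

    remote_md5 is assumed to be a hex string supplied by the server;
    case is normalised before comparing.
    """
    return md5_hex(cached) != remote_md5.strip().lower()
```

If the digests match, the download can be skipped; any difference means the cached copy is stale.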

A quick benchmark done in 1996 shows the following:

            Digest Performance in MegaBytes per Second

        Pentium P5     Power Mac       SPARC 4     DEC Alpha
            90 MHz        80 MHz       110 MHz       200 MHz

MD5           13.1           3.1           5.1           8.5
SHA1           2.5           1.2           2.0           3.3

As for modern use: on embedded chips, MD5 can be 2-3x faster to compute than SHA1 over the same data.

Solution 2:

An MD5 hash is "good enough" for most menial tasks. Recall that it's still incredibly difficult to produce a meaningful file of the same size that matches a given hash.

For instance, say you download the new Ubuntu 9.10 next week from a trusted mirror. You want to verify that the file was downloaded correctly and completely. Simply run an MD5 tool over the ISO and compare the result against the published hash. If the hashes match, you can be confident that the ISO was copied correctly and completely.
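A minimal Python sketch of that check follows. The function names and the 1 MiB chunk size are arbitrary choices for illustration; reading in chunks simply avoids loading a whole ISO into memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks and return the hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Compare the file's digest with the hash listed by the mirror."""
    return md5_of_file(path) == published_hex.strip().lower()
```

Usage would look something like `matches_published("ubuntu.iso", "<hash from the download page>")`, where both arguments are placeholders.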

Solution 3:

  1. It is short - easier to read.
  2. It is widespread - great interoperability with other systems
  3. It is usual - everyone is just used to it.

Security can also be improved by salting the input.
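To show what salting means here, a small Python sketch: a random salt is prepended to the input before hashing and stored next to the digest. The names `salted_md5` and `verify` and the 16-byte salt length are assumptions for illustration, not a recommended scheme.

```python
import hashlib
import os
from typing import Optional, Tuple


def salted_md5(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, str]:
    """Return (salt, hex digest of salt + password).

    A fresh random 16-byte salt is generated when none is supplied.
    """
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.md5(salt + password.encode("utf-8")).hexdigest()
    return salt, digest


def verify(password: str, salt: bytes, expected_digest: str) -> bool:
    """Recompute the digest with the stored salt and compare."""
    return salted_md5(password, salt)[1] == expected_digest
```

Because each entry gets its own salt, two identical inputs no longer produce the same stored digest.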

Solution 4:

MD5 is widely used as a checksum hash function because it's fast and has an extremely low collision rate. An MD5 checksum is composed of 32 hexadecimal digits (128 bits), giving roughly 3.4e38 possible values, so two arbitrary inputs have about a 1 in 3.4e38 chance of colliding. You could theoretically hash all the files in all computers in a country the size of the USA and not produce a collision(*).
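The arithmetic can be checked directly. The Python sketch below counts the values that 32 hex digits can encode and uses the standard birthday approximation to estimate the chance of any collision among n inputs. It assumes digests behave like uniformly random values, i.e. nobody is deliberately constructing collisions; the function name is made up for illustration.

```python
import math

HEX_DIGITS = 32
SPACE = 16 ** HEX_DIGITS  # number of distinct MD5 values, equal to 2**128


def collision_probability(n_items: int, space: int = SPACE) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / (2 * space))."""
    exponent = n_items * (n_items - 1) / (2 * space)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    print(f"possible digests: {SPACE:.3e}")
    print(f"P(collision) among 10**12 files: {collision_probability(10**12):.3e}")
```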

For cryptography, MD5 is a valid alternative if security is only a moderate concern. It's a viable option for hashing database passwords or other fields requiring internal security, mostly for its speed, but also because MD5 does offer a reasonable level of security where strong cryptographic guarantees are not a concern.


(*) For most checksum purposes, a collision is only meaningful if it happens between two objects of similar origin and the same size. Despite MD5's high probability of uniqueness, collisions could eventually occur between two very different files, say a 1.5 MB database file and a 35 KB GIF file. For most purposes, this is a meaningless collision, even more so because MD5 is just one element of file indexing, with file size being another important one.
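One way to picture indexing on both size and hash is the Python sketch below, where two files count as the same only if both values agree. The names `file_key` and `find_duplicates` are hypothetical, and the files are passed as in-memory bytes to keep the example short.

```python
import hashlib


def file_key(data: bytes):
    """Index key combining file size and MD5 digest."""
    return (len(data), hashlib.md5(data).hexdigest())


def find_duplicates(blobs):
    """Return (first_name, later_name) pairs whose size and digest both match.

    blobs maps a name to the file's contents as bytes.
    """
    seen = {}
    duplicates = []
    for name, data in blobs.items():
        key = file_key(data)
        if key in seen:
            duplicates.append((seen[key], name))
        else:
            seen[key] = name
    return duplicates
```

With this key, a collision between files of different sizes would never be treated as a match.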

Solution 5:

MD5 is widely used because it has been widely used, and the breaks are not yet significant enough to be an obvious problem in existing systems.