Can someone explain how BCrypt verifies a hash?

A BCrypt hash string looks like:

$2a$10$Ro0CUfOqk6cXEKf3dyaM7OhSCvnwM9s4wIX9JeLapehKK5YdLxKcm
\__/\/ \____________________/\_____________________________/
 |   |        Salt                     Hash
 |  Cost
Version

Where

  • 2a: Algorithm Identifier (BCrypt, UTF8 encoded password, null terminated)
  • 10: Cost Factor (2^10 = 1,024 rounds)
  • Ro0CUfOqk6cXEKf3dyaM7O: OpenBSD-Base64 encoded salt (22 characters, 16 bytes)
  • hSCvnwM9s4wIX9JeLapehKK5YdLxKcm: OpenBSD-Base64 encoded hash (31 characters, encoding 23 of the hash's 24 bytes)
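
To make the layout concrete, here is a minimal Python sketch that splits a hash string into those four fields. The names `BcryptParts` and `split_bcrypt_hash` are made up for this illustration; they are not part of any bcrypt library.

```python
from typing import NamedTuple


class BcryptParts(NamedTuple):
    version: str  # algorithm identifier, e.g. "2a"
    cost: int     # cost factor; the round count is 2 ** cost
    salt: str     # 22 characters of OpenBSD-Base64
    digest: str   # 31 characters of OpenBSD-Base64


def split_bcrypt_hash(hash_string: str) -> BcryptParts:
    """Split "$2a$10$<salt><digest>" into its fields (illustrative helper)."""
    # Splitting on "$" yields ["", "2a", "10", "<salt><digest>"].
    empty, version, cost, payload = hash_string.split("$")
    if empty != "" or len(payload) != 22 + 31:
        raise ValueError("not a bcrypt hash string")
    return BcryptParts(version, int(cost), payload[:22], payload[22:])


parts = split_bcrypt_hash(
    "$2a$10$Ro0CUfOqk6cXEKf3dyaM7OhSCvnwM9s4wIX9JeLapehKK5YdLxKcm"
)
print(parts.version, parts.cost, 2 ** parts.cost)  # 2a 10 1024
print(parts.salt)
print(parts.digest)
```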

Edit: I just noticed these words fit exactly. I had to share:

$2a$10$TwentytwocharactersaltThirtyonecharacterspasswordhash
$==$==$======================-------------------------------

BCrypt does create a 24-byte binary hash, using a 16-byte salt. You're free to store the binary hash and the salt however you like; nothing says you have to Base64-encode them into a string.

But BCrypt was created by guys who were working on OpenBSD. OpenBSD already defines a format for their password file:

$[HashAlgorithmIdentifier]$[AlgorithmSpecificData]

This means that the "bcrypt specification" is inextricably linked to the OpenBSD password file format. And whenever anyone creates a "bcrypt hash" they always convert it to an ISO-8859-1 string of the format:

$2a$[Cost]$[Base64Salt][Base64Hash]

A few important points:

  • 2a is the algorithm identifier

    • 1: MD5
    • 2: early bcrypt, which had confusion over which encoding passwords are in (obsolete)
    • 2a: current bcrypt, which specifies passwords as UTF-8 encoded
  • Cost is a cost factor used when computing the hash. The "current" value is 10, meaning the internal key setup goes through 1,024 rounds

    • 10: 2^10 = 1,024 iterations
    • 11: 2^11 = 2,048 iterations
    • 12: 2^12 = 4,096 iterations
  • the base64 algorithm used by the OpenBSD password file is not the same Base64 encoding that everybody else uses; they have their own:

      Regular Base64 Alphabet: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
          BSD Base64 Alphabet: ./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
    

    So an implementation of bcrypt cannot use a built-in or standard Base64 library as-is.
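
A rough Python sketch of decoding with the BSD alphabet follows. It assumes the only differences from regular Base64 are the alphabet order and the missing "=" padding (the bit packing being the same), so it remaps the characters first and only then hands them to the standard decoder. `bsd_base64_decode` is a name invented for this example.

```python
import base64

STD_ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
BSD_ALPHABET = b"./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

# Maps each BSD character to the standard character with the same 6-bit value.
BSD_TO_STD = bytes.maketrans(BSD_ALPHABET, STD_ALPHABET)


def bsd_base64_decode(text: str) -> bytes:
    """Decode OpenBSD-Base64 text (assumes standard bit packing, no padding)."""
    remapped = text.encode("ascii").translate(BSD_TO_STD)
    # The BSD form omits "=" padding, so restore it for the standard decoder.
    remapped += b"=" * (-len(remapped) % 4)
    return base64.b64decode(remapped)


salt = bsd_base64_decode("Ro0CUfOqk6cXEKf3dyaM7O")
digest = bsd_base64_decode("hSCvnwM9s4wIX9JeLapehKK5YdLxKcm")
print(len(salt))    # 16
print(len(digest))  # 23
```

The 31-character hash field carries 186 bits, which is why it decodes to 23 whole bytes.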


Armed with this knowledge, you can now verify a password correctbatteryhorsestapler against the saved hash:

$2a$12$mACnM5lzNigHMaf7O1py1O3vlf6.BA8k8x3IoJ.Tq3IB/2e7g61Km
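
The verification flow can be sketched in Python like this. The actual bcrypt computation is deliberately left out: `raw_bcrypt` is a hypothetical callable that you would have to supply, taking the key bytes, the salt text and the cost and returning the 31-character hash text. The point is only to show that everything needed to recompute the hash comes out of the stored string itself.

```python
import hmac
from typing import Callable

# Hypothetical signature: (key_bytes, salt_text, cost) -> 31-character hash text.
RawBcrypt = Callable[[bytes, str, int], str]


def verify(password: str, stored: str, raw_bcrypt: RawBcrypt) -> bool:
    """Recompute the hash from the stored salt and cost, then compare."""
    # "$2a$12$<salt><hash>" splits into ["", "2a", "12", "<salt><hash>"].
    _, _version, cost, payload = stored.split("$")
    salt, expected = payload[:22], payload[22:]

    # 2a semantics: UTF-8 encode the password and include the null terminator.
    key = password.encode("utf-8") + b"\x00"

    actual = raw_bcrypt(key, salt, int(cost))
    # Constant-time comparison, so timing does not reveal where the values differ.
    return hmac.compare_digest(actual.encode("ascii"), expected.encode("ascii"))
```

In practice a bcrypt library wraps all of this in a single check function, so you would rarely write it yourself.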

BCrypt variants

There is a lot of confusion around the bcrypt versions.

$2$

BCrypt was designed by the OpenBSD people. It was designed to hash passwords for storage in the OpenBSD password file. Hashed passwords are stored with a prefix to identify the algorithm used. BCrypt got the prefix $2$.

This was in contrast to the other algorithm prefixes:

  • $1$: MD5
  • $5$: SHA-256
  • $6$: SHA-512
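
As a small illustration of how that prefix gets used, here is a Python lookup limited to the identifiers mentioned in this answer. The table and the `crypt_algorithm` helper are invented for the example and are not an exhaustive list of crypt schemes.

```python
# Only the identifiers discussed in this answer; not an exhaustive registry.
CRYPT_PREFIXES = {
    "1": "MD5",
    "2": "BCrypt",
    "2a": "BCrypt",
    "2x": "BCrypt",
    "2y": "BCrypt",
    "2b": "BCrypt",
    "5": "SHA-256",
    "6": "SHA-512",
}


def crypt_algorithm(hash_string: str) -> str:
    """Return the algorithm named by the "$id$" prefix, or "unknown"."""
    fields = hash_string.split("$")
    # A well-formed entry starts with "$", so the first field is empty.
    if len(fields) < 3 or fields[0] != "":
        return "unknown"
    return CRYPT_PREFIXES.get(fields[1], "unknown")


print(crypt_algorithm("$2a$10$Ro0CUfOqk6cXEKf3dyaM7OhSCvnwM9s4wIX9JeLapehKK5YdLxKcm"))  # BCrypt
```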

$2a$

The original BCrypt specification did not define how to handle non-ASCII characters, or how to handle a null terminator. The specification was revised to specify that when hashing strings:

  • the string must be UTF-8 encoded
  • the null terminator must be included
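
In Python terms, those two rules amount to something like the following. `bcrypt_key_bytes` is just an illustrative name.

```python
def bcrypt_key_bytes(password: str) -> bytes:
    """Bytes fed to the hash under the 2a rules: UTF-8 plus a null terminator."""
    return password.encode("utf-8") + b"\x00"


# "caf\u00e9" is "cafe" with an acute accent on the e; that character takes two bytes in UTF-8.
print(bcrypt_key_bytes("caf\u00e9"))  # b'caf\xc3\xa9\x00'
```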

$2x$, $2y$ (June 2011)

A bug was discovered in crypt_blowfish, the BCrypt implementation used by PHP. It was mishandling characters with the 8th bit set.

They suggested that system administrators update their existing password database, replacing $2a$ with $2x$, to indicate that those hashes are bad (and need to use the old broken algorithm). They also suggested the idea of having crypt_blowfish emit $2y$ for hashes generated by the fixed algorithm. Nobody else, including canonical OpenBSD, adopted the idea of 2x/2y. This version marker was limited to crypt_blowfish.

The versions $2x$ and $2y$ are not "better" or "stronger" than $2a$. They are remnants of one particular buggy implementation of BCrypt.

$2b$ (February 2014)

A bug was discovered in the OpenBSD implementation of BCrypt. They wrote their implementation in a language that doesn't support strings, so they were faking it with a length prefix, a pointer to a character, and then indexing that pointer with []. Unfortunately, they were storing the length of their strings in an unsigned char. If a password was longer than 255 characters, the length would overflow and wrap around. BCrypt was created for OpenBSD, so when they had a bug in their library, they decided it was OK to bump the version. This means that everyone else needs to follow suit to remain current with "their" specification.

  • http://undeadly.org/cgi?action=article&sid=20140224132743
  • http://marc.info/?l=openbsd-misc&m=139320023202696
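
To see what an 8-bit length does to a long password, here is a tiny Python illustration of the wraparound. It only demonstrates the arithmetic of an unsigned char; it is not the OpenBSD code, and `as_unsigned_char` is a made-up helper.

```python
def as_unsigned_char(value: int) -> int:
    """Keep only the low 8 bits, as storing into an unsigned char would."""
    return value & 0xFF


for length in (72, 255, 256, 260):
    print(length, "->", as_unsigned_char(length))
# 72 -> 72
# 255 -> 255
# 256 -> 0
# 260 -> 4
```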

There is no difference between 2a, 2x, 2y, and 2b. If you wrote your implementation correctly, they all output the same result.

  • If you were doing the right thing from the beginning (storing strings in UTF-8 and also hashing the null terminator), then there is no difference between 2, 2a, 2x, 2y, and 2b.
  • The version $2b$ is not "better" or "stronger" than $2a$. It is a remnant of one particular buggy implementation of BCrypt. But since BCrypt canonically belongs to OpenBSD, they get to change the version marker to whatever they want.
  • The versions $2x$ and $2y$ are not better than, or even preferable to, anything. They are remnants of a buggy implementation and should be summarily forgotten.

The only people who need to care about 2x and 2y are those who may have been using crypt_blowfish back in 2011. And the only people who need to care about 2b are those who may have been running OpenBSD.

All other implementations, if correct, produce identical output.


How is BCrypt verifying the password with the hash if it's not saving the salt anywhere?

Clearly it is not doing any such thing. The salt has to be saved somewhere.

Let's look up password encryption schemes on Wikipedia. From http://en.wikipedia.org/wiki/Crypt_(Unix) :

The output of the function is not merely the hash: it is a text string which also encodes the salt and identifies the hash algorithm used.

Alternatively, an answer to your previous question on this subject included a link to the source code. The relevant section of the source code is:

    StringBuilder rs = new StringBuilder();
    rs.Append("$2");
    if (minor >= 'a') {
        rs.Append(minor);
    }
    rs.Append('$');
    if (rounds < 10) {
        rs.Append('0');
    }
    rs.Append(rounds);
    rs.Append('$');
    rs.Append(EncodeBase64(saltBytes, saltBytes.Length));
    rs.Append(EncodeBase64(hashed,(bf_crypt_ciphertext.Length * 4) - 1));
    return rs.ToString();

Clearly the returned string is version information, followed by the number of rounds used, followed by the salt encoded as base64, followed by the hash encoded as base64.
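
For comparison, a rough Python equivalent of that assembly step might look like this. It is a sketch rather than a port of the library: `format_bcrypt_hash` is an invented name, and the salt and hash are assumed to be already encoded in the BSD Base64 alphabet.

```python
def format_bcrypt_hash(minor: str, rounds: int, salt_b64: str, hash_b64: str) -> str:
    """Join version, zero-padded cost, salt and hash into one string."""
    # The minor revision letter ("a", "b", ...) is only appended when present.
    version = "2" + (minor if minor >= "a" else "")
    # ":02d" plays the role of the explicit "0" padding for rounds below 10.
    return f"${version}${rounds:02d}${salt_b64}{hash_b64}"


print(format_bcrypt_hash("a", 10, "Ro0CUfOqk6cXEKf3dyaM7O",
                         "hSCvnwM9s4wIX9JeLapehKK5YdLxKcm"))
# $2a$10$Ro0CUfOqk6cXEKf3dyaM7OhSCvnwM9s4wIX9JeLapehKK5YdLxKcm
```

Since the salt sits in the returned string, the verifier can read it straight back out; nothing else has to be stored.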