Hash function that produces short hashes?

Is there a way of encryption that can take a string of any length and produce a sub-10-character hash? I want to produce reasonably unique ID's but based on message contents, rather than randomly.

I can live with constraining the messages to integer values, though, if arbitrary-length strings are impossible. However, the hash must not be similar for two consecutive integers, in that case.


You can use any commonly available hash algorithm (eg. SHA-1), which will give you a slightly longer result than what you need. Simply truncate the result to the desired length, which may be good enough.

For example, in Python:

>>> import hashlib
>>> hash = hashlib.sha1("my message".encode("UTF-8")).hexdigest()
>>> hash
'104ab42f1193c336aa2cf08a2c946d5c6fd0fcdb'
>>> hash[:10]
'104ab42f11'

If you don't need an algorithm that's strong against intentional modification, I've found an algorithm called adler32 that produces pretty short (~8 character) results. Choose it from the dropdown here to try it out:

http://www.sha1-online.com/


You need to hash the contents to come up with a digest. There are many hashes available but 10-characters is pretty small for the result set. Way back, people used CRC-32, which produces a 33-bit hash (basically 4 characters plus one bit). There is also CRC-64 which produces a 65-bit hash. MD5, which produces a 128-bit hash (16 bytes/characters) is considered broken for cryptographic purposes because two messages can be found which have the same hash. It should go without saying that any time you create a 16-byte digest out of an arbitrary length message you're going to end up with duplicates. The shorter the digest, the greater the risk of collisions.

However, your concern that the hash not be similar for two consecutive messages (whether integers or not) should be true with all hashes. Even a single bit change in the original message should produce a vastly different resulting digest.

So, using something like CRC-64 (and base-64'ing the result) should get you in the neighborhood you're looking for.


You can use the hashlib library for Python. The shake_128 and shake_256 algorithms provide variable length hashes. Here's some working code (Python3):

import hashlib
>>> my_string = 'hello shake'
>>> hashlib.shake_256(my_string.encode()).hexdigest(5)
'34177f6a0a'

Notice that with a length parameter x (5 in example) the function returns a hash value of length 2x.