Reason for the number 5381 in the DJB hash function?
Can anyone tell me why the number 5381 is used in the DJB hash function?
The DJB hash function is defined as:
-
h 0 = 5381
-
h i = 33h i - 1 + s i
Here's a C implementation:
unsigned int DJBHash(char* str, unsigned int len)
{
unsigned int hash = 5381;
unsigned int i = 0;
for(i = 0; i < len; str++, i++)
{
hash = ((hash << 5) + hash) + (*str);
}
return hash;
}
Solution 1:
I stumbled across a comment that sheds some light on what DJB is up to:
/*
* DJBX33A (Daniel J. Bernstein, Times 33 with Addition)
*
* This is Daniel J. Bernstein's popular `times 33' hash function as
* posted by him years ago on comp.lang.c. It basically uses a function
* like ``hash(i) = hash(i-1) * 33 + str[i]''. This is one of the best
* known hash functions for strings. Because it is both computed very
* fast and distributes very well.
*
* The magic of number 33, i.e. why it works better than many other
* constants, prime or not, has never been adequately explained by
* anyone. So I try an explanation: if one experimentally tests all
* multipliers between 1 and 256 (as RSE did now) one detects that even
* numbers are not useable at all. The remaining 128 odd numbers
* (except for the number 1) work more or less all equally well. They
* all distribute in an acceptable way and this way fill a hash table
* with an average percent of approx. 86%.
*
* If one compares the Chi^2 values of the variants, the number 33 not
* even has the best value. But the number 33 and a few other equally
* good numbers like 17, 31, 63, 127 and 129 have nevertheless a great
* advantage to the remaining numbers in the large set of possible
* multipliers: their multiply operation can be replaced by a faster
* operation based on just one shift plus either a single addition
* or subtraction operation. And because a hash function has to both
* distribute good _and_ has to be very fast to compute, those few
* numbers should be preferred and seems to be the reason why Daniel J.
* Bernstein also preferred it.
*
*
* -- Ralf S. Engelschall <[email protected]>
*/
That's a slightly different hash function than the one you're looking at, though it does use the 5381 magic number. The code below that comment at the link target has been unrolled.
Then I found this:
Magic Constant 5381: 1. odd number 2. prime number 3. deficient number 4. 001/010/100/000/101 b
There is also this answer to Can anybody explain the logic behind djb2 hash function? It references a post by DJB himself to a mailing list that mentions 5381 (excerpt from that answer excerpted here):
[...] practically any good multiplier works. I think you're worrying about the fact that 31c + d doesn't cover any reasonable range of hash values if c and d are between 0 and 255. That's why, when I discovered the 33 hash function and started using it in my compressors, I started with a hash value of 5381. I think you'll find that this does just as well as a 261 multiplier.
Solution 2:
5381 is just a number that, in testing, resulted in fewer collisions and better avalanching. You'll find "magic constants" in just about every hash algo.
Solution 3:
I found a very interesting property of this number may be that can be a reason for that.
5381 is 709th prime.
709 is 127th prime.
127 is 31st prime.
31 is 11th prime.
11 is 5th prime.
5 is 3rd prime.
3 is 2nd prime.
2 is 1st prime.
5381 is the first number for which this happens for 8 times. 5381st prime may exceed the limit of signed int so it is a good point to stop the chain.