What is a good 64bit hash function in Java for textual strings?
Why don't you use a long
variant of the default String.hashCode()
(where some really smart guys certainly put effort into making it efficient - not mentioning the thousands of developer eyes that already looked at this code)?
// adapted from String.hashCode()
public static long hash(String string) {
long h = 1125899906842597L; // prime
int len = string.length();
for (int i = 0; i < len; i++) {
h = 31*h + string.charAt(i);
}
return h;
}
If you're looking for even more bits, you could probably use a
Edit: BigInteger
As I mentioned in a comment to the answer of @brianegge, there are not much usecases for hashes with more than 32 bits and most likely not a single one for hashes with more than 64 bits:
I could imagine a huge hashtable distributed across dozens of servers, maybe storing tens of billions of mappings. For such a scenario, @brianegge still has a valid point here: 32 bit allow for 2^32 (ca. 4.3 billion) different hash keys. Assuming a strong algorithm, you should still have quite few collisions. With 64 bit (18,446,744,073 billion different keys) your certainly save, regardless of whatever crazy scenario you need it for. Thinking of usecases for 128 bit keys (340,282,366,920,938,463,463,374,607,431 billion possible keys) is pretty much impossible though.
To combine the hash for several fields, simply do an XOR multiply one with a prime and add them:
long hash = MyHash.hash(string1) * 31 + MyHash.hash(string2);
The small prime is in there to avoid equal hash code for switched values, i.e. {'foo','bar'} and {'bar','foo'} aren't equal and should have a different hash code. XOR is bad as it returns 0 if both values are equal. Therefore, {'foo','foo'} and {'bar','bar'} would have the same hash code.
An answer for today (2018). SipHash.
It will be much faster than most of the answers here, and significantly higher quality than all of them.
The Guava library has one: https://google.github.io/guava/releases/23.0/api/docs/com/google/common/hash/Hashing.html#sipHash24--
Create an SHA-1 hash and then mask out the lowest 64bits.
long hash = string.hashCode();
Yes, the top 32 bits will be 0, but you will probably run out of hardware resources before you run into problems with hash collisions. The hashCode in String is quite efficient and well tested.
Update I think the above satisfies the simplest thing which could possibly work, however, I agree with @sfussenegger idea of extending the existing String hashCode.
In addition to having a good hashCode for your String, you may want to consider rehashing the hash code in your implementation. If your storage is used by other developers, or used with other types, this can help distributed your keys. For example, Java's HashMap is based on power-of-two length hash tables, so it adds this function to ensure the lower bits are sufficiently distributed.
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
Why not use a CRC64 polynomial. These are reasonably efficient and optimized to make sure all bits are counted and spread over the result space.
There are plenty of implementations available on the net if you google "CRC64 Java"