How should I map long to int in hashCode()?

I have a range of objects that have a long field whose value uniquely identifies a particular object across my entire system, much like a GUID. I have overridden Object.equals() to use this id for comparison, because I want it to work with copies of the object. Now I want to override Object.hashCode(), too, which basically means mapping my long to some int return value.

If I understood the purpose of hashCode correctly, it is mainly used in hash tables, so a uniform distribution would be desirable. That would mean simply returning id % 2^32 (the id modulo 2 to the power of 32, i.e. its low 32 bits) would suffice. Is that all, or should I be aware of something else?


Since Java 8 you can use:

Long.hashCode(guid);

For older versions of Java you can use the following:

Long.valueOf(guid).hashCode();

Note that this second solution may allocate a new Long object on the heap, while the first doesn't (although it is likely that the JIT compiler optimizes the object creation away).

Looking at the docs, both ways just use the following algorithm:

(int)(this.longValue()^(this.longValue()>>>32))

These are sound solutions because they rely on the Java standard library; it is generally better to build on something that has already been tested.
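
To show how this fits together with equals(), here is a minimal sketch. The class name Entity and its main method are hypothetical and used only for illustration; the sketch assumes Java 8 or later and that the id alone defines equality.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class: two instances are equal when their ids match.
final class Entity {
    private final long id;

    Entity(long id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Entity)) return false;
        return id == ((Entity) other).id;
    }

    @Override
    public int hashCode() {
        // Static helper (Java 8+), so no Long object is created.
        return Long.hashCode(id);
    }

    public static void main(String[] args) {
        Entity original = new Entity(1234567890123L);
        Entity copy = new Entity(1234567890123L);

        Set<Entity> seen = new HashSet<>();
        seen.add(original);

        // The copy is found because equals() and hashCode() agree.
        System.out.println(seen.contains(copy)); // prints "true"
    }
}
```

Because equals() and hashCode() are derived from the same field, copies of an object land in the same hash bucket and compare equal.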


It's a bit of a minor thing if you're not using Guava already, but Guava can do this for you nicely:

public int hashCode() {
  return Longs.hashCode(id);
}

That gives you the equivalent of Long.valueOf(id).hashCode():

return (int) (value ^ (value >>> 32));

Additionally, if you were to have other values or objects that were part of the hashcode, you could use Guava's Objects class (com.google.common.base.Objects) and just write

return Objects.hashCode(longValue, somethingElse, ...);

The long would be autoboxed into a Long so you'd get the correct hashcode for it as part of the overall hashcode.
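
If Guava is not on the classpath, the JDK offers an analogous varargs helper, java.util.Objects.hash (Java 7+). The sketch below uses that JDK method rather than Guava, and the class NamedEntity with its name field is a hypothetical example, not something from the question.

```java
import java.util.Objects;

// Hypothetical class whose identity is the pair (id, name).
final class NamedEntity {
    private final long id;
    private final String name;

    NamedEntity(long id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof NamedEntity)) return false;
        NamedEntity that = (NamedEntity) other;
        return id == that.id && Objects.equals(name, that.name);
    }

    @Override
    public int hashCode() {
        // Varargs call: id is autoboxed to a Long, whose hashCode()
        // is then combined with the hash of name.
        return Objects.hash(id, name);
    }

    public static void main(String[] args) {
        NamedEntity a = new NamedEntity(7L, "seven");
        NamedEntity b = new NamedEntity(7L, "seven");
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // prints "true"
    }
}
```

The varargs call allocates an array and boxes the long on each invocation, so for a hot path with a single long field the direct Long-based version may be preferable.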


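You have understood the purpose of hashCode correctly. Yes, a uniform distribution is desirable (although not an actual requirement).

I would suggest (int) ((id >> 32) ^ id). The cast is needed because the expression itself is a long, while hashCode() must return an int.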

The above expression:

  • Uses all bits of the original value, so no information is discarded up front. This matters because, depending on how you are generating the IDs, the upper bits could change more frequently than the lower ones (or the opposite).
  • Does not introduce any bias towards values with more ones (zeros), as would be the case if the two halves were combined with an OR (AND) operation.
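
To make the first point concrete, here is a small sketch comparing plain truncation (keeping only the low 32 bits) with XOR-folding the two halves. The class and method names are hypothetical, chosen for illustration only.

```java
final class HashFold {
    // Truncation: keeps only the low 32 bits of the id.
    static int truncate(long id) {
        return (int) id;
    }

    // XOR-fold with an unsigned shift, as used by Long.hashCode.
    static int fold(long id) {
        return (int) (id ^ (id >>> 32));
    }

    // XOR-fold with a signed shift. After the cast to int the result is the
    // same as fold(), because only the low 32 bits of the shifted value survive.
    static int foldSigned(long id) {
        return (int) ((id >> 32) ^ id);
    }

    public static void main(String[] args) {
        long a = 1L;
        long b = (1L << 32) | 1L; // same low 32 bits as a, different high bits

        System.out.println(truncate(a) == truncate(b)); // prints "true": collision
        System.out.println(fold(a) == fold(b));         // prints "false"
        System.out.println(fold(-5L) == foldSigned(-5L)); // prints "true"
    }
}
```

With truncation, every pair of IDs that differs only in the upper 32 bits collides, whereas the folded version lets those bits influence the result.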