Can Java's hashCode produce same value for different strings?

Solution 1:

A Java hash code is 32bits. The number of possible strings it hashes is infinite.

So yes, there will be collisions. The percentage is meaningless - there is an infinite number of items (strings) and a finite number of possible hashes.

Solution 2:

YES. A lot.

Look at following pair

  • "FB" and "Ea"

can return same hash code even though the characters in it are not same.

Basically it is the sum of characters in a string multiplied by an integer.

Solution 3:

Yes, it is entirely possible. The probability of a string (or some other object type -- just assuming you'll be using strings in this example) having the same hashcode as some other string in a collection, depends on the size of that collection (assuming that all strings in that collection are unique). The probabilities are distributed as follows:

  • With a set of size ~9,000, you'll have a 1% chance of two strings colliding with a hash in the set
  • With a set of size ~30,000, you'll have a 10% chance of two strings colliding with a hash in the set
  • With a set of size ~77,000, you'll have a 50% chance of two strings colliding with a hash in the set

The assumptions made, are:

  • The hashCode function has no bias
  • Each string in the aforementioned set is unique

This site explains it clearly: http://eclipsesource.com/blogs/2012/09/04/the-3-things-you-should-know-about-hashcode/ (Look at "the second thing you should know")