Can Java's hashCode produce same value for different strings?
Solution 1:
A Java hash code is 32bits. The number of possible strings it hashes is infinite.
So yes, there will be collisions. The percentage is meaningless - there is an infinite number of items (strings) and a finite number of possible hashes.
Solution 2:
YES. A lot.
Look at following pair
- "FB" and "Ea"
can return same hash code even though the characters in it are not same.
Basically it is the sum of characters in a string multiplied by an integer.
Solution 3:
Yes, it is entirely possible. The probability of a string (or some other object type -- just assuming you'll be using strings in this example) having the same hashcode as some other string in a collection, depends on the size of that collection (assuming that all strings in that collection are unique). The probabilities are distributed as follows:
- With a set of size ~9,000, you'll have a 1% chance of two strings colliding with a hash in the set
- With a set of size ~30,000, you'll have a 10% chance of two strings colliding with a hash in the set
- With a set of size ~77,000, you'll have a 50% chance of two strings colliding with a hash in the set
The assumptions made, are:
- The hashCode function has no bias
- Each string in the aforementioned set is unique
This site explains it clearly: http://eclipsesource.com/blogs/2012/09/04/the-3-things-you-should-know-about-hashcode/ (Look at "the second thing you should know")