How does hashing in Java work?

I am trying to figure something out about hashing in Java. If I want to store some data in a HashMap, for example, will it have some kind of underlying hash table with the hash values? Or if someone could give a good and simple explanation of how hashing works, I would really appreciate it.


Solution 1:

HashMap is implemented internally as an array of entries (Entry[] in older JDKs, Node[] since Java 8). If you know what a linked list is, each entry is simply a linked-list node: it stores the key, the value, and a reference to the next entry in the same bucket.

To insert an element into the array, you need an index. How is the index calculated? This is where the hash function comes into the picture. The hash function takes an integer, and to get that integer Java calls the hashCode method of the object being added as a key. This step is sometimes called pre-hashing. The map then reduces the resulting integer to a valid index into the array.

Once the index is known, the element is placed at that index. Each array slot is called a bucket, so if an element is inserted at Entry[0], you say that it falls into bucket 0.
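To make the index calculation concrete, here is a minimal sketch. The bit-mixing step and the mask are modelled on what OpenJDK 8 and later do inside HashMap, as far as I know; the class name BucketIndexDemo and the method names spread and indexFor are made up for this example.

```java
// Sketch only: shows how a hashCode() can be reduced to a bucket index.
// BucketIndexDemo, spread and indexFor are hypothetical names, not JDK API.
final class BucketIndexDemo {

    // Fold the high 16 bits into the low 16 bits so that they also
    // influence the bucket index of small tables. A null key maps to 0.
    static int spread(Object key) {
        if (key == null) {
            return 0;
        }
        int h = key.hashCode();
        return h ^ (h >>> 16);
    }

    // tableLength is assumed to be a power of two, so masking with
    // (tableLength - 1) always yields a value in [0, tableLength).
    static int indexFor(Object key, int tableLength) {
        return spread(key) & (tableLength - 1);
    }

    public static void main(String[] args) {
        System.out.println(indexFor(42, 16));      // Integer 42 hashes to 42, prints 10
        System.out.println(indexFor("apple", 16)); // some index between 0 and 15
    }
}
```

The mask is used instead of the % operator because it is cheap and never produces a negative index, but it only works when the table length is a power of two.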

Now assume the hash function returns the same index, say 0, for another object that you want to insert as a key. This is where the equals method is called. If equals returns true, the key is already in the map and its value is simply replaced. If equals returns false, two different keys have landed in the same bucket, which is a hash collision. In that case, since Entry is a linked-list node, one more node (Entry) is added to the list that already starts at this index. Bottom line: on a hash collision, there is more than one element at a particular index, chained through a linked list.

The same applies when looking up a key. The hash function gives the index, and the entries in that bucket are compared with the key using equals until a match is found. If no entry matches, get returns null.
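The insert and lookup steps above can be sketched as a toy map. This is not the real java.util.HashMap: TinyHashMap is a hypothetical class with a fixed table of 16 buckets, no resizing and no tree bins, and it only exists to show where hashCode and equals come in.

```java
import java.util.Objects;

// Toy separate-chaining map for illustration; TinyHashMap is a made-up name.
final class TinyHashMap<K, V> {

    // One linked-list node: key, value, cached hash, and the next node in the bucket.
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    private static int hash(Object key) {
        int h = Objects.hashCode(key); // 0 for a null key
        return h ^ (h >>> 16);
    }

    // Returns the previous value for the key, or null if there was none.
    V put(K key, V value) {
        int hash = hash(key);
        int index = hash & (table.length - 1);
        for (Node<K, V> n = table[index]; n != null; n = n.next) {
            if (n.hash == hash && Objects.equals(n.key, key)) {
                V old = n.value; // same key: replace the value
                n.value = value;
                return old;
            }
        }
        // Different key in the same bucket (or empty bucket): chain a new node.
        table[index] = new Node<>(hash, key, value, table[index]);
        return null;
    }

    V get(Object key) {
        int hash = hash(key);
        for (Node<K, V> n = table[hash & (table.length - 1)]; n != null; n = n.next) {
            if (n.hash == hash && Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null; // no entry in the bucket matched
    }

    public static void main(String[] args) {
        TinyHashMap<String, Integer> map = new TinyHashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2); // "Aa" and "BB" both have hash code 2112, so they collide
        System.out.println(map.get("Aa") + " " + map.get("BB")); // prints "1 2"
    }
}
```

Both colliding keys stay retrievable because equals tells them apart while walking the chain.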

Hope this helps with the internals of how it works :)

Solution 2:

Hash values in Java are provided by objects through the implementation of public int hashCode(), which is declared in the Object class and overridden by the standard value types such as String and the primitive wrapper classes (Integer, Long, and so on). Once you implement that method in your custom data class, you don't need to worry about how it is used by the various data structures provided by Java.

A note: implementing that method also requires implementing public boolean equals(Object o) in a consistent manner.
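As a small illustration of implementing both methods consistently, here is one possible way to do it. GridPoint is a hypothetical class invented for this example; java.util.Objects.hash is a standard helper, but using it is just one option.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical immutable key class with consistent equals and hashCode.
final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof GridPoint)) {
            return false;
        }
        GridPoint other = (GridPoint) o;
        return x == other.x && y == other.y;
    }

    // Uses exactly the fields that equals compares, so equal points
    // always produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    public static void main(String[] args) {
        Map<GridPoint, String> labels = new HashMap<>();
        labels.put(new GridPoint(1, 2), "start");
        // A different but equal instance finds the same entry.
        System.out.println(labels.get(new GridPoint(1, 2))); // prints "start"
    }
}
```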

Solution 3:

If I want to store some data in a HashMap, for example, will it have some kind of underlying hash table with the hash values?

A HashMap is a form of hash table (and Hashtable is another). They work by using the hashCode() and equals(Object) methods provided by the HashMap's key type. Depending on how you want your keys to behave, you can use the hashCode / equals methods implemented by java.lang.Object ... or you can override them.

Or if someone could give a good and simple explanation of how hashing works, I would really appreciate it.

I suggest you read the Wikipedia page on Hash Tables to understand how they work. (FWIW, the HashMap and Hashtable classes use "separate chaining with linked lists", and some other tweaks to optimize average performance.)

A hash function works by turning an object (i.e. a "key") into an integer. How it does this is up to the implementor. But a common approach is to combine hashcodes of the object's fields something like this:

  hashCode = (..((field1.hashCode() * prime) + field2.hashCode()) * prime + ...)

where prime is a smallish prime number like 31. The key is that you get a good spread of hashcode values for different keys. What you DON'T want is lots of keys all hashing to the same value. That causes "collisions" and is bad for performance.
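For a class with two fields, the formula above could be written out by hand roughly like this. PersonName and its fields are hypothetical, and the sketch assumes neither field is ever null.

```java
// Hypothetical two-field key class using the "multiply by 31 and add" pattern.
final class PersonName {
    private final String first; // assumed non-null
    private final String last;  // assumed non-null

    PersonName(String first, String last) {
        this.first = first;
        this.last = last;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof PersonName)) {
            return false;
        }
        PersonName other = (PersonName) o;
        return first.equals(other.first) && last.equals(other.last);
    }

    // (first.hashCode() * 31) + last.hashCode(); int overflow simply wraps.
    @Override
    public int hashCode() {
        int result = first.hashCode();
        result = 31 * result + last.hashCode();
        return result;
    }
}
```

Because each field is multiplied by a different power of the prime, swapping the two field values usually gives a different hash code, which helps spread keys across buckets.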

When you implement the hashCode and equals methods, you need to do it in a way that satisfies the following constraints for the hash table to work correctly:

 1. o1.equals(o2) => o1.hashCode() == o2.hashCode()
 2. o1.equals(o2) == o2.equals(o1)
 3. The hashCode of an object doesn't change while it is a key in a hash table.
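To see why the third constraint matters, here is a small sketch that breaks it on purpose by mutating a list after using it as a key. MutableKeyDemo is a made-up name, and the result described in the comments is what I would expect from the OpenJDK HashMap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Demonstrates a key whose hash code changes while it is in the map.
final class MutableKeyDemo {

    // Returns whether the map can still find the key after it was mutated.
    static boolean foundAfterMutation() {
        Map<List<String>, String> map = new HashMap<>();
        List<String> key = new ArrayList<>();
        key.add("a");
        map.put(key, "value"); // stored under the hash of ["a"]

        key.add("b");          // the key's hashCode() is now that of ["a", "b"]

        // The lookup uses the new hash, which no longer matches the stored one.
        return map.containsKey(key);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // prints "false" on OpenJDK
    }
}
```

The entry is still in the map (its size is still 1), but it can no longer be reached through the key, which is why immutable keys are the safer choice.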

It is also worth noting that the default hashCode and equals methods provided by Object are based on the target object's identity.
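A short sketch of what identity-based behaviour means for map keys: with the default methods, only the very same instance finds the entry. IdentityKeyDemo and Token are hypothetical names for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Shows lookups with a key class that does NOT override equals/hashCode.
final class IdentityKeyDemo {

    // Inherits equals and hashCode from Object, so two tokens are equal
    // only if they are the same instance.
    static final class Token {
        final String text;

        Token(String text) {
            this.text = text;
        }
    }

    // Looks up with a new instance that has the same field value.
    static String lookupWithFreshInstance() {
        Map<Token, String> map = new HashMap<>();
        map.put(new Token("abc"), "found");
        return map.get(new Token("abc")); // null: different instance
    }

    // Looks up with the exact instance that was used as the key.
    static String lookupWithSameInstance() {
        Map<Token, String> map = new HashMap<>();
        Token key = new Token("abc");
        map.put(key, "found");
        return map.get(key); // "found"
    }

    public static void main(String[] args) {
        System.out.println(lookupWithFreshInstance()); // prints "null"
        System.out.println(lookupWithSameInstance());  // prints "found"
    }
}
```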


"But where are the hash values stored then? They are not part of the HashMap, so is there an array associated with the HashMap?"

The hash values are typically not stored. Rather they are calculated as required.

In the case of the HashMap class, the hash code for each key is actually cached in the entry's Node.hash field. But that is a performance optimization: it makes hash chain searching faster and avoids recalculating hashes if / when the hash table is resized. If you want this level of understanding, you really need to read the source code rather than asking questions.