How to prove that HashMap in java is not thread-safe

I'm working on an application, that has uses a HashMap to share state. I need to prove via unit tests that it will have problems in a multi-threaded environment.

I tried to check the state of the application in a single thread environment and in a multi-threaded environment via checking the size and elements of the HashMap in both of them. But seems this doesn't help, the state is always the same.

Are there any other ways to prove it or prove that an application that performs operations on the map works well with concurrent requests?


Solution 1:

This is quite easy to prove.

Shortly

A hash map is based on an array, where each item represents a bucket. As more keys are added, the buckets grow and at a certain threshold the array is recreated with a bigger size so that its buckets are spread more evenly (performance considerations). During the array recreation, the array becomes empty, which results in empty result for the caller, until the recreation completes.

Details and Proof

It means that sometimes HashMap#put() will internally call HashMap#resize() to make the underlying array bigger.

HashMap#resize() assigns the table field a new empty array with a bigger capacity and populates it with the old items. While this population happens, the underlying array doesn't contain all of the old items and calling HashMap#get() with an existing key may return null.

The following code demonstrates that. You are very likely to get the exception that will mean the HashMap is not thread safe. I chose the target key as 65 535 - this way it will be the last element in the array, thus being the last element during re-population which increases the possibility of getting null on HashMap#get() (to see why, see HashMap#put() implementation).

final Map<Integer, String> map = new HashMap<>();

final Integer targetKey = 0b1111_1111_1111_1111; // 65 535
final String targetValue = "v";
map.put(targetKey, targetValue);

new Thread(() -> {
    IntStream.range(0, targetKey).forEach(key -> map.put(key, "someValue"));
}).start();


while (true) {
    if (!targetValue.equals(map.get(targetKey))) {
        throw new RuntimeException("HashMap is not thread safe.");
    }
}

One thread adds new keys to the map. The other thread constantly checks the targetKey is present.

If count those exceptions, I get around 200 000.

Solution 2:

It is hard to simulate Race but looking at the OpenJDK source for put() method of HashMap:

public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);

    //Operation 1       
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    } 

    //Operation 2
    modCount++;

    //Operation 3
    addEntry(hash, key, value, i);
    return null;
}

As you can see put() involves 3 operations which are not synchronized. And compound operations are non thread safe. So theoretically it is proven that HashMap is not thread safe.

Solution 3:

I need to prove via unit tests that it will have problems in multithread environment.

This is going to be tremendously hard to do. Race conditions are very hard to demonstrate. You could certainly write a program which does puts and gets into a HashMap in a large number of threads but logging, volatile fields, other locks, and other timing details of your application may make it extremely hard to force your particular code to fail.


Here's a stupid little HashMap failure test case. It fails because it times out when the threads go into an infinite loop because of memory corruption of HashMap. However, it may not fail for you depending on number of cores and other architecture details.

@Test(timeout = 10000)
public void runTest() throws Exception {
    final Map<Integer, String> map = new HashMap<Integer, String>();
    ExecutorService pool = Executors.newFixedThreadPool(10);
    for (int i = 0; i < 10; i++) {
        pool.submit(new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i < 10000; i++) {
                    map.put(i, "wow");
                }
            }
        });
    }
    pool.shutdown();
    pool.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
}

Solution 4:

Is reading the API docs enough? There is a statement in there:

Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:

The problem with thread safety is that it's hard to prove through a test. It could be fine most of the times. Your best bet would be to just run a bunch of threads that are getting/putting and you'll probably get some concurrency errors.

I suggest using a ConcurrentHashMap and trust that the Java team saying that HashMap is not synchronized is enough.

Solution 5:

Are there any other ways to prove it?

How about reading the documentation (and paying attention to the emphasized "must"):

If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally

If you are going to attempt to write a unit test that demonstrates incorrect behavior, I recommend the following:

  • Create a bunch of keys that all have the same hashcode (say 30 or 40)
  • Add values to the map for each key
  • Spawn a separate thread for the key, which has an infinite loop that (1) asserts that the key is present int the map, (2) removes the mapping for that key, and (3) adds the mapping back.

If you're lucky, the assertion will fail at some point, because the linked list behind the hash bucket will be corrupted. If you're unlucky, it will appear that HashMap is indeed threadsafe despite the documentation.