Given that HashMaps in jdk1.6 and above cause problems with multi=threading, how should I fix my code
I am the original author of the patch which appeared in 7u6, CR#7118743 : Alternative Hashing for String with Hash-based Maps.
I'll acknowledge right up front that the initialization of hashSeed is a bottleneck but it is not one we expected to be a problem since it only happens once per Hash Map instance. For this code to be a bottleneck you would have to be creating hundreds or thousands of hash maps per second. This is certainly not typical. Is there really a valid reason for your application to be doing this? How long do these hash maps live?
Regardless, we will probably investigate switching to ThreadLocalRandom rather than Random and possibly some variant of lazy initialization as suggested by cambecc.
EDIT 3
A fix for the bottleneck was been pushed to the JDK7 update mercurial repo:
http://hg.openjdk.java.net/jdk7u/jdk7u-dev/jdk/rev/b03bbdef3a88
The fix will be part of the upcoming 7u40 release and is already available in IcedTea 2.4 releases.
Near final test builds of 7u40 are available here:
https://jdk7.java.net/download.html
Feedback is still welcomed. Send it to http://mail.openjdk.java.net/mailman/listinfo/core-libs-dev to be sure it gets seen by the openJDK devs.
This looks like a "bug" you can work around. There is a property that disables the new "alternative hashing" feature:
jdk.map.althashing.threshold = -1
However, disabling alternative hashing is not sufficient because it does not turn off the generation of a random hash seed (though it really should). So even if you turn off alt hashing, you still have thread contention during hash map instantiation.
One particularly nasty way of working around this is to forcefully replace the instance of Random
used for hash seed generation with your own non-synchronized version:
// Create an instance of "Random" having no thread synchronization.
Random alwaysOne = new Random() {
@Override
protected int next(int bits) {
return 1;
}
};
// Get a handle to the static final field sun.misc.Hashing.Holder.SEED_MAKER
Class<?> clazz = Class.forName("sun.misc.Hashing$Holder");
Field field = clazz.getDeclaredField("SEED_MAKER");
field.setAccessible(true);
// Convince Java the field is not final.
Field modifiers = Field.class.getDeclaredField("modifiers");
modifiers.setAccessible(true);
modifiers.setInt(field, field.getModifiers() & ~Modifier.FINAL);
// Set our custom instance of Random into the field.
field.set(null, alwaysOne);
Why is it (probably) safe to do this? Because alt hashing has been disabled, causing the random hash seeds to be ignored. So it doesn't matter that our instance of Random
isn't in fact random. As always with nasty hacks like this, please use with caution.
(Thanks to https://stackoverflow.com/a/3301720/1899721 for the code that sets static final fields).
--- Edit ---
FWIW, the following change to HashMap
would eliminate the thread contention when alt hashing is disabled:
- transient final int hashSeed = sun.misc.Hashing.randomHashSeed(this);
+ transient final int hashSeed;
...
useAltHashing = sun.misc.VM.isBooted() &&
(capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
+ hashSeed = useAltHashing ? sun.misc.Hashing.randomHashSeed(this) : 0;
init();
A similar approach can be used for ConcurrentHashMap
, etc.
There are lots of apps out there that create a transient HashMap per record in big data applications. This parsers and serializers, for example. Putting any synchronization into unsynchronized collections classes is a real gotcha. In my opinion, this is unacceptable and needs to be fixed ASAP. The change that was apparently introduced in 7u6, CR#7118743 should be reverted or fixed without requiring any synchronization or atomic operation.
Somehow this is reminds me of the colossal mistake of making StringBuffer and Vector and HashTable synchronized in JDK 1.1/1.2. People paid dearly for years for that mistake. No need to repeat that experience.
Assuming your usage pattern is reasonable, you'll want to use your own version of Hashmap.
That piece of code is there to make hash collisions lot harder to cause, preventing attackers to create performance problems (details) - assuming this problem is already dealt with in some other way, I don't think you'd need synchronization at all. However, irrelevant of if you use synchronization or not, it seems you would want to use your own version of Hashmap so you wouldn't depent that much on what JDK happens to provide.
So either you just normally write something similar and point to that, or override a class in JDK. To do the latter, you can override bootstrap classpath with -Xbootclasspath/p:
parameter. Doing so will however "contravene the Java 2 Runtime Environment binary code license" (source).