Support for Compressed Strings being Dropped in HotSpot JVM?
On the Oracle page Java HotSpot VM Options, -XX:+UseCompressedStrings is listed as being available and on by default. However, in Java 6 update 29 it is off by default, and in Java 7 update 2 it reports a warning:
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option UseCompressedStrings; support was removed in 7.0
Does anyone know the thinking behind removing this option?
Using the example from sorting lines of an enormous file.txt in java with -mx2g, it took 4.541 seconds with the option on and 5.206 seconds with it off in Java 6 update 29, so it is hard to see that the option hurts performance.
Note: Java 7 update 2 requires 2.0 GB, whereas Java 6 update 29 requires 1.8 GB without compressed strings and only 1.0 GB with compressed strings.
Solution 1:
Originally, this option was added to improve SPECjBB performance. The gains are due to reduced memory bandwidth requirements between the processor and DRAM. Loading and storing bytes in a byte[] consumes half the bandwidth of loading and storing chars in a char[].
However, this comes at a price. The code has to determine if the internal array is a byte[] or char[]. This takes CPU time and if the workload is not memory bandwidth constrained, it can cause a performance regression. There is also a code maintenance price due to the added complexity.
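To make that cost concrete, here is a minimal sketch of a string that keeps either a byte[] (when every character is 7-bit ASCII) or a char[]. The class CompressedStringSketch and its methods are hypothetical, for illustration only; this is not how HotSpot actually implemented the option.

```java
// Hypothetical sketch of the trade-off; not HotSpot's real String implementation.
final class CompressedStringSketch {
    // Either a byte[] (all chars are 7-bit ASCII) or a char[] (anything else).
    private final Object value;

    CompressedStringSketch(String s) {
        char[] chars = s.toCharArray();
        boolean ascii = true;
        for (char c : chars) {
            if (c > 0x7F) {
                ascii = false;
                break;
            }
        }
        if (ascii) {
            byte[] bytes = new byte[chars.length];
            for (int i = 0; i < chars.length; i++) {
                bytes[i] = (byte) chars[i]; // lossless: every char is below 0x80
            }
            value = bytes; // 1 byte per char
        } else {
            value = chars; // 2 bytes per char, like a plain String
        }
    }

    // Every access must first check which array type it holds: this is the extra CPU work.
    char charAt(int index) {
        if (value instanceof byte[]) {
            return (char) ((byte[]) value)[index];
        }
        return ((char[]) value)[index];
    }

    int length() {
        return value instanceof byte[] ? ((byte[]) value).length : ((char[]) value).length;
    }

    boolean isCompressed() {
        return value instanceof byte[];
    }

    public static void main(String[] args) {
        CompressedStringSketch ascii = new CompressedStringSketch("hello");
        CompressedStringSketch accented = new CompressedStringSketch("h\u00e9llo");
        System.out.println(ascii.isCompressed() + " " + ascii.charAt(1));             // true e
        System.out.println(accented.isCompressed() + " " + (int) accented.charAt(1)); // false 233
    }
}
```

The instanceof branch in charAt() is cheap, but it runs on every access, which is why a workload that is not bandwidth bound can end up slower.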
Because there weren't enough production-like workloads that showed significant gains (except perhaps SPECjBB), the option was removed.
There is another angle to this. The option reduces heap usage: for applicable Strings, it reduces the memory usage of those Strings by half. This angle wasn't considered when the option was removed. For workloads that are memory-capacity constrained (i.e. that have to run with limited heap space, so GC takes a lot of time), this option can prove useful.
If enough memory capacity constrained production-like workloads can be found to justify the option's inclusion, then maybe the option will be brought back.
Edit 3/20/2013: In an average server heap dump, 25% of the space is used by Strings, and most Strings are compressible. If the option is reintroduced, it could save half of this space (i.e. ~12% of the heap)!
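As a back-of-the-envelope check of those numbers, here is a small sketch. It counts only character data (2 bytes per char in a char[], 1 byte per char in a byte[]) and ignores object headers, the String's other fields, and array padding, so the real saving per String is somewhat less than half. The class and method names are made up for illustration.

```java
// Hypothetical estimate; counts character payload only, not headers or fields.
final class HeapSavingEstimate {
    // Bytes of character data in a char[] of the given length (UTF-16: 2 bytes per char).
    static long charArrayBytes(int length) {
        return 2L * length;
    }

    // Bytes of character data in a byte[] of the given length (1 byte per char).
    static long byteArrayBytes(int length) {
        return length;
    }

    // Fraction of the whole heap saved when `compressibleFraction` of the string
    // data moves from char[] to byte[], halving its size.
    static double heapSaving(double stringShareOfHeap, double compressibleFraction) {
        return stringShareOfHeap * compressibleFraction * 0.5;
    }

    public static void main(String[] args) {
        System.out.println(charArrayBytes(1000) + " vs " + byteArrayBytes(1000)); // 2000 vs 1000
        // 25% of the heap in Strings, all of them compressible: 0.25 * 1.0 * 0.5
        System.out.println(heapSaving(0.25, 1.0)); // 0.125
    }
}
```

Since "most" rather than all Strings are compressible, ~12% is an upper bound under these assumptions.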
Edit 3/10/2016: A feature similar to compressed strings is coming back in JDK 9 as JEP 254 (Compact Strings).
Solution 2:
Just to add, for those interested...
The java.lang.CharSequence interface (which java.lang.String implements) allows more compact representations of strings than UTF-16.
Apps which manipulate a lot of strings should probably be written to accept CharSequence, so that they work with java.lang.String or with more compact representations.
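For example, a method written against CharSequence accepts a String, a StringBuilder, or a java.nio.CharBuffer (all of which implement CharSequence) without any conversion. countVowels is a hypothetical helper chosen only to illustrate the signature.

```java
import java.nio.CharBuffer;

final class CharSequenceApi {
    // Accepts any CharSequence, so callers can pass a String, a StringBuilder,
    // a CharBuffer, or their own compact implementation without copying.
    static int countVowels(CharSequence text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            switch (Character.toLowerCase(text.charAt(i))) {
                case 'a': case 'e': case 'i': case 'o': case 'u':
                    count++;
                    break;
                default:
                    break;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countVowels("Shakespeare"));                            // 5
        System.out.println(countVowels(new StringBuilder("Hamlet")));              // 2
        System.out.println(countVowels(CharBuffer.wrap("Othello".toCharArray()))); // 3
    }
}
```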
8-bit (UTF-8), or even 5-, 6-, or 7-bit encoded, or even compressed strings can be represented as a CharSequence.
CharSequences can also be a lot more efficient to manipulate: for example, subsequences can be defined as views (pointers) onto the original content instead of copies.
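Both ideas can be sketched in one class: a CharSequence that stores one byte per character (assuming Latin-1 / ISO-8859-1 content only) and whose subSequence() returns an O(1) view over the same array. Latin1Sequence and sharesStorageWith are hypothetical names, not part of the JDK or of concurrent-trees.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical compact CharSequence: 1 byte per char, Latin-1 characters only.
final class Latin1Sequence implements CharSequence {
    private final byte[] data; // shared between a sequence and all of its views
    private final int offset;
    private final int length;

    Latin1Sequence(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                throw new IllegalArgumentException("not Latin-1 at index " + i);
            }
        }
        this.data = s.getBytes(StandardCharsets.ISO_8859_1);
        this.offset = 0;
        this.length = data.length;
    }

    // Used by subSequence(): wraps the existing array without copying it.
    private Latin1Sequence(byte[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return (char) (data[offset + index] & 0xFF); // mask because Java bytes are signed
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException("start " + start + ", end " + end);
        }
        return new Latin1Sequence(data, offset + start, end - start); // a view, not a copy
    }

    @Override
    public String toString() {
        return new String(data, offset, length, StandardCharsets.ISO_8859_1);
    }

    boolean sharesStorageWith(Latin1Sequence other) {
        return data == other.data;
    }

    public static void main(String[] args) {
        Latin1Sequence title = new Latin1Sequence("Much Ado About Nothing");
        CharSequence word = title.subSequence(5, 8);
        System.out.println(word);                                             // Ado
        System.out.println(((Latin1Sequence) word).sharesStorageWith(title)); // true
    }
}
```

Many views can point into one shared array, which is where this approach saves memory compared with copying each substring into its own char[].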
For example, in concurrent-trees, a suffix tree of ten of Shakespeare's plays requires 2 GB of RAM using CharSequence-based nodes, and would require 249 GB of RAM using char[]- or String-based nodes.
Solution 3:
Since there were upvotes, I figure I wasn't missing something obvious, so I have logged it as a bug (at the very least, an omission in the documentation):
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7129417
(Should be visible in a couple of days)
Solution 4:
Java 9 runs the sorting lines of an enormous file.txt in java example twice as fast on my machine as Java 6 does, and it needs only 1 GB of memory because -XX:+CompactStrings is enabled by default. Also, in Java 6 compressed strings only worked for 7-bit ASCII characters, whereas Java 9 supports Latin1 (ISO-8859-1). Some operations, like charAt(idx), might be slightly slower, though. With the new design, other encodings could also be supported in the future.
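The difference in eligibility can be sketched as two checks: a string could be compressed in Java 6 only if every char was at most 0x7F, whereas Java 9 can store it at one byte per char if every char is at most 0xFF. These helpers are hypothetical and are not the JDK's internal code.

```java
// Hypothetical helpers contrasting the two eligibility rules; not JDK internals.
final class CompressibilityCheck {
    // Java 6 -XX:+UseCompressedStrings: only 7-bit ASCII (U+0000..U+007F) qualified.
    static boolean fitsAscii(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    // Java 9 compact strings: any Latin-1 char (U+0000..U+00FF) fits in one byte.
    static boolean fitsLatin1(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = { "hello", "caf\u00e9", "\u20ac100" };
        for (String s : samples) {
            System.out.println(fitsAscii(s) + " " + fitsLatin1(s));
        }
        // prints: "true true", then "false true", then "false false"
    }
}
```

So text such as "café" gets the one-byte-per-char layout in Java 9 but would have stayed at two bytes per char under the Java 6 option.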
I wrote about this in The Java Specialists' Newsletter.