When are Java Strings interned?
Inspired by the comments on this question, I'm pretty sure that Java String
s are interned at runtime rather than compile time - surely just the fact that classes can be compiled at different times, but would still point to the same reference at runtime.
I can't seem to find any evidence to back this up. Can anyone justify this?
The optimization happens (or at least can happen) in both places:
- If two references to the same string constant appear in the same class, I'd expect the class file to only contain one constant pool entry. This isn't strictly required in order to ensure that there's only one
String
object created in the JVM, but it's an obvious optimization to make. This isn't actually interning as such - just constant optimization. - When classes are loaded, the string pool for the class is added to the intern pool. This is "real" interning.
(I have a vague recollection that one of the bits of work for Java 7 around "small jar files" included a single string pool for the whole jar file... but I could be very wrong.)
EDIT: Section 5.1 of the JVM spec, "The Runtime Constant Pool" goes into details of this:
To derive a string literal, the Java virtual machine examines the sequence of characters given by the CONSTANT_String_info structure.
If the method String.intern has previously been called on an instance of class String containing a sequence of Unicode characters identical to that given by the CONSTANT_String_info structure, then the result of string literal derivation is a reference to that same instance of class String.
Otherwise, a new instance of class String is created containing the sequence of Unicode characters given by the CONSTANT_String_info structure; that class instance is the result of string literal derivation. Finally, the intern method of the new String instance is invoked.