intern() behaving differently in Java 6 and Java 7
class Test {
    public static void main(String... args) {
        String s1 = "Good";
        s1 = s1 + "morning";
        System.out.println(s1.intern());
        String s2 = "Goodmorning";
        if (s1 == s2) {
            System.out.println("both are equal");
        }
    }
}
This code produces different output in Java 6 and Java 7: in Java 6 the s1 == s2 condition is false, and in Java 7 it is true. Why does this program produce different output in the two versions?
It seems that JDK 7 handles intern differently than before.
I tested it with build 1.7.0-b147 and got "both are equal", but when executing it (same bytecode) with 1.6.0_24 I do not get the message.
It also depends on where the String s2 =... line is located in the source code. The following code also does not output the message:
class Test {
    public static void main(String... args) {
        String s1 = "Good";
        s1 = s1 + "morning";
        String s2 = "Goodmorning";
        // The literal is already in the pool here, so intern() returns that instance, not s1.
        System.out.println(s1.intern());
        if (s1 == s2) {
            System.out.println("both are equal");
        } // not printed: s1 and s2 refer to different objects
    }
}
It seems that intern, after not finding the String in its pool of strings, inserts the actual instance s1 into the pool. The JVM uses that pool when s2 is created, so s2 gets the same reference as s1. On the other hand, if s2 is created first, that reference is stored in the pool and s1.intern() returns it instead of s1.
This can be a result of moving the interned Strings out from the permanent generation of the Java heap.
Found here: Important RFEs Addressed in JDK 7
In JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application. This change will result in more data residing in the main Java heap, and less data in the permanent generation, and thus may require heap sizes to be adjusted. Most applications will see only relatively small differences in heap usage due to this change, but larger applications that load many classes or make heavy use of the String.intern() method will see more significant differences.
Not sure if that is a bug, or since which version... JLS 3.10.5 states:
The result of explicitly interning a computed string is the same string as any pre-existing literal string with the same contents.
So the question is how "pre-existing" is interpreted, at compile time or at run time: is "Goodmorning" pre-existing or not?
I prefer the way it WAS implemented before 7...
Let's omit unnecessary details from the example:
class Test {
    public static void main(String... args) {
        String s1 = "Good";
        s1 = s1 + "morning";
        System.out.println(s1 == s1.intern()); // prints true on JDK 7, false on JDK 6
    }
}
Let's consider String#intern as a black box. Based on a few test runs, I would conclude that the implementation is as follows:
Java 6:
if the pool contains an object equal to this, return a reference to that object;
otherwise create a new string (equal to this), put it in the pool, and return a reference to the newly created instance.
Java 7:
if the pool contains an object equal to this, return a reference to that object;
otherwise put this in the pool and return this.
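The two behaviours described above can be sketched with a toy pool. This is only a model of the observed black-box behaviour, not the JVM's real string table; the class InternModel and its methods internCopying and internSharing are hypothetical names made up for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: a HashMap stands in for the JVM's string pool.
class InternModel {
    private final Map<String, String> pool = new HashMap<String, String>();

    // Java 6-like: on a miss, store and return a fresh copy, never the caller's instance.
    String internCopying(String s) {
        String pooled = pool.get(s);
        if (pooled == null) {
            pooled = new String(s); // distinct object with equal contents
            pool.put(pooled, pooled);
        }
        return pooled;
    }

    // Java 7-like: on a miss, store and return the caller's own instance.
    String internSharing(String s) {
        String pooled = pool.get(s);
        if (pooled == null) {
            pooled = s;
            pool.put(pooled, pooled);
        }
        return pooled;
    }

    // Is a run-time string identical to what the copying model returns for it?
    static boolean sameAfterCopying() {
        String s = new StringBuilder("Good").append("morning").toString();
        return s == new InternModel().internCopying(s);
    }

    // Is a run-time string identical to what the sharing model returns for it?
    static boolean sameAfterSharing() {
        String s = new StringBuilder("Good").append("morning").toString();
        return s == new InternModel().internSharing(s);
    }

    public static void main(String... args) {
        System.out.println(sameAfterCopying()); // false, like s1 == s1.intern() on JDK 6
        System.out.println(sameAfterSharing()); // true, like s1 == s1.intern() on JDK 7
    }
}
```

In both models the returned string is equal to the argument and repeated calls return the same pooled instance, which is all the method's contract asks for.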
Neither Java 6 nor Java 7 breaks the contract of the method.
It seems that the new behavior of the intern method was a result of the fix for this bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6962931.
== compares the references. The intern method makes sure strings with the same value have the same reference.
The javadoc for the String.intern method explains:
public String intern()
Returns a canonical representation for the string object.
A pool of strings, initially empty, is maintained privately by the class String.
When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.
All literal strings and string-valued constant expressions are interned. String literals are defined in §3.10.5 of the Java Language Specification
Returns: a string that has the same contents as this string, but is guaranteed to be from a pool of unique strings.
So without interning, the compiler looks at the constants in the Java code and builds its constant pool from them. There is a separate pool maintained by the String class; interning checks the string passed in against that pool and makes sure the reference is unique (so that == will work).
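The invariant quoted from the javadoc, s.intern() == t.intern() exactly when s.equals(t), can be checked with a small sketch. The class ReferenceVsValue and its helper fresh are hypothetical names used only for this illustration.

```java
class ReferenceVsValue {
    // Always returns a new String object with the same contents as s.
    static String fresh(String s) {
        return new String(s);
    }

    public static void main(String... args) {
        String a = fresh("Goodmorning");
        String b = fresh("Goodmorning");
        System.out.println(a == b);                   // false: two separate objects
        System.out.println(a.equals(b));              // true: same contents
        System.out.println(a.intern() == b.intern()); // true: one canonical instance
    }
}
```

This holds on both Java 6 and Java 7; the versions differ only in which instance ends up being the canonical one.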
In JDK 6:
String s1 = "Good"; creates a String object "Good" in the constant pool.
s1 = s1 + "morning"; creates another String object "morning" in the constant pool, but this time the compiled code actually does: s1 = new StringBuilder().append(s1).append("morning").toString();
Since the new operator creates an object on the heap, the reference in s1 points to the heap, not the constant pool, while String s2 = "Goodmorning"; creates a String object "Goodmorning" in the constant pool whose reference is stored in s2.
Therefore, the if (s1 == s2) condition is false.
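To make the distinction between run-time concatenation and pooled constants concrete, here is a minimal sketch. The class ConcatDemo and its methods are hypothetical names for this illustration; it relies only on the rule, quoted earlier from the javadoc, that literals and string-valued constant expressions are interned.

```java
class ConcatDemo {
    // prefix is not a compile-time constant, so the concatenation happens at run time
    // and produces a new String object.
    static String runtimeConcat(String prefix) {
        return prefix + "morning";
    }

    // Both operands are literals, so this is a constant expression:
    // the compiler folds it into the single constant "Goodmorning".
    static String constantConcat() {
        return "Good" + "morning";
    }

    public static void main(String... args) {
        String literal = "Goodmorning";
        System.out.println(constantConcat() == literal);           // true
        System.out.println(runtimeConcat("Good") == literal);      // false
        System.out.println(runtimeConcat("Good").equals(literal)); // true
    }
}
```

This is why s1 in the question is never the same object as the literal unless intern() is involved.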
But what happens in jdk7?
FIRST CASE:
In the first code snippet you are actually adding three Strings to the pool of Strings.
1. s1 = "Good"
2. s1 = "Goodmorning" (after concatenating)
3. s2 = "Goodmorning"
When evaluating if (s1 == s2), the contents are the same but the references are different, hence it is false.
SECOND CASE:
In this case you are using s1.intern(), which implies that if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
- s1 = "Good"
- s1 = "Goodmorning" (after concatenating)
- For String s2 = "Goodmorning", a new String is not added to the pool and s2 gets a reference to the existing one. Hence if (s1 == s2) returns true.