When are string literals loaded into the StringTable in the Java HotSpot VM?

This question came up while I was studying the java.lang.String API.

I found an article in Chinese, "Java 中new String("字面量") 中 "字面量" 是何时进入字符串常量池的?" (roughly: in Java's new String("literal"), when does "literal" enter the string constant pool?).

It says that CONSTANT_String entries are resolved lazily in the HotSpot VM, so a string literal is not loaded into the StringTable until it is used.

I also found some relevant statements.

JVMS Chapter 5.4, Linking, says:

For example, a Java Virtual Machine implementation may choose to resolve each symbolic reference in a class or interface individually when it is used ("lazy" or "late" resolution), or to resolve them all at once when the class is being verified ("eager" or "static" resolution).

I found some OpenJDK code for the ldc bytecode:

IRT_ENTRY(void, InterpreterRuntime::ldc(JavaThread* thread, bool wide))  
  // access constant pool  
  constantPoolOop pool = method(thread)->constants();  
  int index = wide ? get_index_u2(thread, Bytecodes::_ldc_w) :get_index_u1(thread, Bytecodes::_ldc);  
  constantTag tag = pool->tag_at(index);  

  if (tag.is_unresolved_klass() || tag.is_klass()) {  
    klassOop klass = pool->klass_at(index, CHECK);  
    oop java_class = klass->java_mirror();  
    thread->set_vm_result(java_class);  
  } else {  
#ifdef ASSERT  
    // If we entered this runtime routine, we believed the tag contained  
    // an unresolved string, an unresolved class or a resolved class.  
    // However, another thread could have resolved the unresolved string  
    // or class by the time we go there.  
    assert(tag.is_unresolved_string()|| tag.is_string(), "expected string");  
#endif  
    oop s_oop = pool->string_at(index, CHECK);  
    thread->set_vm_result(s_oop);  
  }  
IRT_END  

and the code behind pool->string_at(index, CHECK):

oop constantPoolOopDesc::string_at_impl(constantPoolHandle this_oop, int which, TRAPS) {  
  oop str = NULL;  
  CPSlot entry = this_oop->slot_at(which);  
  if (entry.is_metadata()) {  
    ObjectLocker ol(this_oop, THREAD);  
    if (this_oop->tag_at(which).is_unresolved_string()) {  
      // Intern string  
      Symbol* sym = this_oop->unresolved_string_at(which);  
      str = StringTable::intern(sym, CHECK_(constantPoolOop(NULL)));  
      this_oop->string_at_put(which, str);  
    } else {  
      // Another thread beat us and interned string, read string from constant pool  
      str = this_oop->resolved_string_at(which);  
    }  
  } else {  
    str = entry.get_oop();  
  }  
  assert(java_lang_String::is_instance(str), "must be string");  
  return str;  
}  
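
For readers who do not know C++, the control flow of string_at_impl can be mimicked in Java roughly as follows. This is only a toy model, not HotSpot code: the class LazyStringSlot, its fields, and its method names are made up for illustration, and a plain synchronized method stands in for ObjectLocker.

```java
// Toy model of a single CONSTANT_String constant-pool slot (hypothetical, not HotSpot code).
public class LazyStringSlot {
    // Holds the raw characters (char[]) before resolution, the interned String after it.
    private Object entry;
    // Counts how often the slot actually had to be resolved.
    public int resolutions = 0;

    public LazyStringSlot(String symbol) {
        this.entry = symbol.toCharArray(); // unresolved: only the characters are stored
    }

    public synchronized boolean isResolved() {
        return entry instanceof String;
    }

    // Plays the role of string_at_impl: intern on first use, reuse the result afterwards.
    public synchronized String stringAt() {
        if (entry instanceof char[]) {
            String str = new String((char[]) entry).intern();
            entry = str;      // corresponds to string_at_put(which, str)
            resolutions++;
        }
        return (String) entry;
    }

    public static void main(String[] args) {
        LazyStringSlot slot = new LazyStringSlot("hello");
        System.out.println(slot.isResolved());   // false: nothing has asked for the string yet
        String first = slot.stringAt();
        String second = slot.stringAt();
        System.out.println(slot.isResolved());   // true: the first access resolved the slot
        System.out.println(first == second);     // true: the second access reused the result
        System.out.println(slot.resolutions);    // 1
    }
}
```

In this model nothing is interned when the slot is created; the work happens on the first stringAt() call, which is the behaviour the article attributes to ldc.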

However, the HotSpot code above only shows that a string literal may be loaded into the StringTable when ldc executes. It does not prove that HotSpot never resolves the literal earlier, i.e. that resolution is really lazy.

Could someone explain this explicitly?

FYI, I know a little C but not C++.

Thanks!


Solution 1:

There is a corner case that makes it possible to check, from within a Java application, whether a string existed in the pool before the test, but it works only once per string. Combined with string literals of the same content, it can be used to detect lazy loading:

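public class Test {
    public static void main(String[] args) {
        test('h', 'e', 'l', 'l', 'o');
        test('m', 'a', 'i', 'n');
    }
    static void test(char... arg) {
        // s1 is a fresh instance that is not in the pool; s2 is whatever intern() returns
        String s1 = new String(arg), s2 = s1.intern();
        // s1 == s2 means s1 itself was added to the pool by the intern() call above
        System.out.println('"'+s1+'"'
            +(s1!=s2? " existed": " did not exist")+" in the pool before");
        // the literals below are only resolved when these lines execute
        System.out.println("is the same as \"hello\": "+(s2=="hello"));
        System.out.println("is the same as \"main\": "+(s2=="main"));
        System.out.println();
    }
}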

The test first creates a new string instance which does not exist in the pool. Then it calls intern() on it and compares the references. There are three possible scenarios:

  1. If a string with the same contents already exists in the pool, that string is returned. It must be a different object from our string, which is not in the pool.

  2. Our string is added to the pool and returned. In this case, the two references are identical.

  3. A new string with the same contents will be created and added to the pool. Then, the returned reference will be different.

We can’t distinguish between 1 and 3, so if a JVM generally adds new strings to the pool in intern(), we are out of luck. But if it adds the instance we’re calling intern() on, we can identify scenario 2 and know for sure that the string wasn’t in the pool, but has been added as a side effect of our test.
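
The probe described above can be isolated into a small helper. The following is a minimal sketch; the class name InternProbe and the method addedByThisCall are made up for illustration, and the result of the first call assumes a JVM that pools the instance passed to intern() (scenario 2), as recent HotSpot versions do.

```java
import java.util.UUID;

public class InternProbe {
    // Returns true only in scenario 2: intern() pooled the very instance we created.
    // A false result cannot tell scenario 1 from scenario 3.
    public static boolean addedByThisCall(char... chars) {
        String fresh = new String(chars);   // new instance, never in the pool
        return fresh == fresh.intern();
    }

    public static void main(String[] args) {
        // A random suffix makes it practically certain that nobody pooled this string before.
        char[] unique = ("probe-" + UUID.randomUUID()).toCharArray();

        // First probe: true on a JVM that adds the instance itself to the pool.
        System.out.println(addedByThisCall(unique));
        // Second probe: always false, because the first probe already pooled an equal string.
        System.out.println(addedByThisCall(unique));
    }
}
```

The second call illustrates why the check works only once per string: the probe itself changes the state it is trying to observe.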

On my machine, it prints:

"hello" did not exist in the pool before
is the same as "hello": true
is the same as "main": false

"main" existed in the pool before
is the same as "hello": false
is the same as "main": true

The same result shows up on Ideone.

This shows that "hello" did not exist in the pool when the test method was entered the first time, even though a string literal "hello" appears later in the code. So this proves that the string literal is resolved lazily. Since we already added a "hello" string manually, the string literal with the same contents resolves to the same instance.

In contrast, the "main" string already exists in the pool, which is easy to explain. The Java launcher looks up the main method to execute and thereby adds that string to the pool as a side effect.

If we swap the order of the tests to test('m', 'a', 'i', 'n'); test('h', 'e', 'l', 'l', 'o'); the "hello" string literal will be used in the first test invocation and remains in the pool, so when we test it in the second invocation the string will already exist.
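
The effect of executing a literal before probing can be reproduced in isolation. This is a minimal sketch; the class LiteralFirst, the method names, and the string "lazy-demo" are placeholders chosen for illustration. The probe input is built from a char array so that the source contains the literal in only one place.

```java
public class LiteralFirst {
    // Stands in for any code path containing the literal; running it resolves the constant.
    public static String useLiteral() {
        return "lazy-demo";
    }

    // true only if intern() pooled the very instance created here (scenario 2).
    public static boolean addedByThisCall(char... chars) {
        String fresh = new String(chars);
        return fresh == fresh.intern();
    }

    public static void main(String[] args) {
        String literal = useLiteral();   // the literal is resolved and pooled at this point
        char[] chars = {'l', 'a', 'z', 'y', '-', 'd', 'e', 'm', 'o'};

        // The pool already holds the literal, so the probe cannot add its own instance.
        System.out.println(addedByThisCall(chars));                // false
        // intern() hands back the instance that the literal resolved to.
        System.out.println(new String(chars).intern() == literal); // true
    }
}
```

Both results follow from the language rules for literals and intern(), so they do not depend on how a particular JVM implements the pool.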