Why does appending "" to a String save memory?

Doing the following:

data.substring(x, y) + ""

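creates a new (smaller) String object with its own character array, and throws away the reference to the String created by substring(). That intermediate String, and the large array it points to, can then be garbage collected once nothing else refers to the original.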

The important thing to realise is that substring() gives a window onto an existing String - or rather, onto the character array underlying the original String. Hence it keeps that whole array alive and retains as much memory as the original String. This can be advantageous in some circumstances, but problematic if you want to get a substring and dispose of the original String (as you've found out).

Take a look at the substring() method in the JDK String source for more info.

EDIT: To answer your supplementary question, constructing a new String from the substring will reduce your memory consumption, provided you bin any references to the original String.

NOTE (Jan 2013). The above behaviour changed in Java 7u6. The flyweight pattern is no longer used: substring() now copies the characters it needs into a new array, so it works as you would expect.


If you look at the source of substring(int, int), you'll see that it returns:

new String(offset + beginIndex, endIndex - beginIndex, value);

where value is the original char[]. So you get a new String but with the same underlying char[].

When you do data.substring() + "", you get a new String with a new underlying char[].
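To make the sharing concrete, here is a minimal sketch that models the old layout. SharedString, compact() and backingLength() are hypothetical names made up for illustration; this is not the JDK class, only an approximation of how the pre-7u6 String kept a char[], an offset and a count.

```java
import java.util.Arrays;

/** Hypothetical model of the pre-7u6 String layout; NOT the real java.lang.String. */
public final class SharedString {
    private final char[] value; // backing array, possibly shared with other instances
    private final int offset;   // index of the first visible character in value
    private final int count;    // number of visible characters

    public SharedString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    /** Returns a window onto the same backing array, as the old substring() did. */
    public SharedString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedString(value, offset + beginIndex, endIndex - beginIndex);
    }

    /** Copies only the visible characters into a fresh, right-sized array. */
    public SharedString compact() {
        char[] trimmed = Arrays.copyOfRange(value, offset, offset + count);
        return new SharedString(trimmed, 0, count);
    }

    /** Size of the array this instance keeps reachable. */
    public int backingLength() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

In this model, a six-character window taken from a 26-character SharedString still reports a backingLength() of 26, while the result of compact() reports 6. That is the same difference the + "" trick makes on an old JVM.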

Actually, your use case is the only situation where you should use the String(String) constructor:

String tiny = new String(huge.substring(12,18));

When you use substring, it doesn't actually copy any characters. The result still refers to your original string's data, with an offset and a length constraint.

So, to allow your original string to be collected, you need to create a new string (using new String, or what you've got).


I think this.smallpart kept referencing towards data, but why?

Because Java strings consist of a char array, a start offset and a length (and a cached hashCode). Some String operations like substring() create a new String object that shares the original's char array and simply has different offset and/or length fields. This works because the char array of a String is never modified once it has been created.

This can save memory when many substrings refer to the same basic string without replicating overlapping parts. As you have noticed, in some situations, it can keep data that's not needed anymore from being garbage collected.

The "correct" way to fix this is the new String(String) constructor, i.e.

this.smallpart = new String(data.substring(12,18));
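For context, a class using that line might look roughly like the sketch below. SmallPartHolder and getSmallpart() are hypothetical names; only smallpart, data and the indices 12 and 18 come from the question. The point is that data is never stored in a field, so the only thing the object retains is the small copy.

```java
/** Hypothetical holder class; the name and accessor are assumptions for illustration. */
public class SmallPartHolder {
    private final String smallpart;

    public SmallPartHolder(String data) {
        // new String(String) always yields a distinct String object. On JVMs before
        // 7u6 it also copied just the needed characters instead of sharing data's array.
        this.smallpart = new String(data.substring(12, 18));
        // data itself is not kept, so it can be collected once the caller drops it.
    }

    public String getSmallpart() {
        return smallpart;
    }
}
```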

BTW, the overall best solution would be to avoid having very large Strings in the first place, and to process any input in smaller chunks, a few KB at a time.
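As a rough sketch of that chunked approach, assuming the input is available as a Reader: the code below pulls out a character range while holding only a 4 KB buffer at any time. ChunkedExtract and extractRange are hypothetical names, and the buffer size is an arbitrary choice.

```java
import java.io.IOException;
import java.io.Reader;

/** Hypothetical helper; reads the input a few KB at a time instead of as one huge String. */
public final class ChunkedExtract {
    private ChunkedExtract() {}

    /** Returns the characters at stream positions [begin, end), or fewer if the input ends early. */
    public static String extractRange(Reader in, long begin, long end) throws IOException {
        char[] buf = new char[4096];          // chunk size is an assumption, not a requirement
        StringBuilder out = new StringBuilder();
        long pos = 0;                         // stream position of buf[0] for the current chunk
        int n;
        while (pos < end && (n = in.read(buf)) != -1) {
            long from = Math.max(begin, pos);  // first wanted position inside this chunk
            long to = Math.min(end, pos + n);  // one past the last wanted position
            if (from < to) {
                out.append(buf, (int) (from - pos), (int) (to - from));
            }
            pos += n;
        }
        return out.toString();
    }
}
```

Called with a FileReader or InputStreamReader and the range 12 to 18, this would yield the same six characters as data.substring(12, 18) without the large String ever existing.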