Do string literals get optimised by the compiler?
EDIT: While I strongly suspect the statement below is true for all C# compiler implementations, I'm not sure it's actually guaranteed in the spec. Section 2.4.4.5 of the spec talks about literals referring to the same string instance, but it doesn't mention other constant string expressions. I suspect this is an oversight in the spec - I'll email Mads and Eric about it.
It's not just string literals. It's any string constant. So for example, consider:
public const string X = "X";
public const string Y = "Y";
public const string XY = "XY";
void Foo()
{
    string z = X + Y; // constant expression, evaluated at compile time
}
The compiler realises that the concatenation here (for z) is between two constant strings, and so the result is also a constant string. Therefore the initial value of z will be the same reference as the value of XY, because they're compile-time constants with the same value.
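To check this behaviour yourself, a minimal sketch along the following lines should do. The class and method names (ConstantFoldingDemo, FoldedConcatSharesReference, RuntimeConcatSharesReference) are made up for illustration, and the results in the comments are what I'd expect from the Microsoft compiler rather than something the spec promises.

```csharp
using System;

public static class ConstantFoldingDemo
{
    public const string X = "X";
    public const string Y = "Y";
    public const string XY = "XY";

    // X + Y is a constant expression, so the compiler can fold it to "XY"
    // at compile time. The folded constant and XY then refer to one string.
    public static bool FoldedConcatSharesReference()
    {
        string z = X + Y;
        return object.ReferenceEquals(z, XY);
    }

    // Here one operand is a parameter, not a constant, so the concatenation
    // is performed at run time and normally produces a separate string object.
    public static bool RuntimeConcatSharesReference(string left)
    {
        string z = left + Y;
        return object.ReferenceEquals(z, XY);
    }

    public static void Main()
    {
        // Expected True with the Microsoft C# compiler.
        Console.WriteLine(FoldedConcatSharesReference());

        // Typically False: the value is equal to XY, but it is a different object.
        Console.WriteLine(RuntimeConcatSharesReference(X));
    }
}
```

The second method is only there for contrast: the value is still "XY", but because the compiler cannot treat it as a constant, there is no reason for it to be the same reference.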
EDIT: The reply from Mads and Eric suggested that in the Microsoft C# compiler string constants and string literals are usually treated the same way - but that other implementations may differ.
This article explains string interning pretty well. Quote:
.NET has the concept of an "intern pool". It's basically just a set of strings, but it makes sure that every time you reference the same string literal, you get a reference to the same string. This is probably language-dependent, but it's certainly true in C# and VB.NET, and I'd be very surprised to see a language it didn't hold for, as IL makes it very easy to do (probably easier than failing to intern literals). As well as literals being automatically interned, you can intern strings manually with the Intern method, and check whether or not there is already an interned string with the same character sequence in the pool using the IsInterned method. This somewhat unintuitively returns a string rather than a boolean - if an equal string is in the pool, a reference to that string is returned. Otherwise, null is returned. Likewise, the Intern method returns a reference to an interned string - either the string you passed in if it was already in the pool, or a newly created interned string, or an equal string which was already in the pool.
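As a rough illustration of the Intern and IsInterned methods described in that quote, here is a small sketch. InternPoolDemo and BuildAtRuntime are hypothetical names introduced for the example; string.Intern and string.IsInterned are the real framework methods. The results in the comments are what I'd expect on the usual .NET runtimes.

```csharp
using System;

public static class InternPoolDemo
{
    // Builds a fresh string object at run time, so it is not the same
    // object as any literal with the same characters.
    public static string BuildAtRuntime(string text)
    {
        return new string(text.ToCharArray());
    }

    public static void Main()
    {
        string literal = "hello";
        string runtime = BuildAtRuntime("hello");

        // Same characters, but two distinct objects.
        Console.WriteLine(literal == runtime);                        // True
        Console.WriteLine(object.ReferenceEquals(literal, runtime));  // False

        // Intern returns the pooled string with the same characters,
        // which here should be the one the literal refers to.
        string interned = string.Intern(runtime);
        Console.WriteLine(object.ReferenceEquals(literal, interned)); // expected True

        // IsInterned returns null when no equal string is in the pool.
        // A freshly generated GUID is assumed not to be pooled already.
        string unique = BuildAtRuntime(Guid.NewGuid().ToString());
        Console.WriteLine(string.IsInterned(unique) == null);         // True
    }
}
```

Note that IsInterned never adds anything to the pool, so it is the safer call if you only want to ask the question without changing the pool's contents.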
Yes, it does optimize string literals. Here is a simple example that shows it:
string s1 = "A";
string s2 = "A";
bool same = object.ReferenceEquals(s1, s2); // true: both literals refer to the same string instance