How many String objects will be created when using a plus sign?
How many String objects will be created when using a plus sign in the below code?
String result = "1" + "2" + "3" + "4";
If it was as below, I would have said three String objects: "1", "2", "12".
String result = "1" + "2";
I also know that String objects are cached in the String Intern Pool/Table for performance improvement, but that's not the question.
Solution 1:
Surprisingly, it depends.
If you do this in a method:
void Foo() {
    String one = "1";
    String two = "2";
    String result = one + two + "34";
    Console.Out.WriteLine(result);
}
then the compiler seems to emit the code using String.Concat, as @Joachim answered (+1 to him btw).
If you define them as constants, e.g.:
const String one = "1";
const String two = "2";
const String result = one + two + "34";
or as literals, as in the original question:
String result = "1" + "2" + "3" + "4";
then the compiler will optimize away those + signs. It's equivalent to:
const String result = "1234";
Furthermore, the compiler will remove extraneous constant expressions, and only emit them if they are used or exposed. For instance, this program:
class Program
{
    const String one = "1";
    const String two = "2";
    const String result = one + two + "34";
    public static void Main(string[] args)
    {
        Console.Out.WriteLine(result);
    }
}
generates only one string: the constant result (equal to "1234"). one and two do not show up in the resulting IL.
Keep in mind that there may be further optimizations at runtime. I'm just going by what IL is produced.
Finally, as regards interning, constants and literals are interned, but the value which is interned is the resulting constant value in the IL, not the literal. This means that you might get even fewer string objects than you expect, since multiple identically-defined constants or literals will actually be the same object! This is illustrated by the following:
public class Program
{
    private const String one = "1";
    private const String two = "2";
    private const String RESULT = one + two + "34";

    static String MakeIt()
    {
        return "1" + "2" + "3" + "4";
    }

    static void Main(string[] args)
    {
        string result = "1" + "2" + "34";
        // Prints "True"
        Console.Out.WriteLine(Object.ReferenceEquals(result, MakeIt()));
        // Prints "True" also
        Console.Out.WriteLine(Object.ReferenceEquals(result, RESULT));
        Console.ReadKey();
    }
}
In the case where Strings are concatenated in a loop (or otherwise dynamically), you end up with one extra string per concatenation. For instance, the following creates 12 string instances: 2 constants + 10 iterations, each resulting in a new String instance:
public class Program
{
    static void Main(string[] args)
    {
        string result = "";
        for (int i = 0; i < 10; i++)
            result += "a";
        Console.ReadKey();
    }
}
But (also surprisingly), multiple consecutive concatenations are combined by the compiler into a single multi-string concatenation. For example, this program also only produces 12 string instances! This is because "Even if you use several + operators in one statement, the string content is copied only once."
public class Program
{
    static void Main(string[] args)
    {
        string result = "";
        for (int i = 0; i < 10; i++)
            result += "a" + result;
        Console.ReadKey();
    }
}
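To make the last example concrete, here is a rough sketch of what a single multi-string concatenation per iteration looks like when written by hand. The class and method names (LoopConcatDemo, WithPlus, WithConcat) are made up for illustration, and the hand-written String.Concat call is my reading of what the answer describes, not a dump of the actual IL.

```csharp
using System;

static class LoopConcatDemo
{
    // The loop from the answer: one statement with two concatenations.
    public static string WithPlus()
    {
        string result = "";
        for (int i = 0; i < 10; i++)
            result += "a" + result;
        return result;
    }

    // Assumed equivalent: a single three-argument Concat per iteration,
    // so each pass allocates one new string rather than two.
    public static string WithConcat()
    {
        string result = "";
        for (int i = 0; i < 10; i++)
            result = String.Concat(result, "a", result);
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(WithPlus() == WithConcat()); // True
        Console.WriteLine(WithPlus().Length);          // 1023
    }
}
```

The length doubles plus one on each pass (1, 3, 7, ...), which is why ten iterations end at 1023 characters.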
Solution 2:
Chris Shain's answer is very good. As the person who wrote the string concatenation optimizer I would just add two additional interesting points.
The first is that the concatenation optimizer essentially ignores both parentheses and left associativity when it can do so safely. Suppose you have a method M() that returns a string. If you say:
string s = M() + "A" + "B";
then the compiler reasons that the addition operator is left associative, and therefore this is the same as:
string s = ((M() + "A") + "B");
But this:
string s = "C" + "D" + M();
is the same as
string s = (("C" + "D") + M());
so that is the concatenation of the constant string "CD" with M().
In fact, the concatenation optimizer realizes that string concatenation is associative, and generates String.Concat(M(), "AB") for the first example, even though that violates left associativity.
You can even do this:
string s = (M() + "E") + ("F" + M());
and we'll still generate String.Concat(M(), "EF", M()).
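As a quick sanity check on the regrouping described above, this sketch compares the + forms with the String.Concat calls the answer says are generated. M() here is a hypothetical stand-in that just returns "m", and the class name is invented. It only shows that the values agree, not what the compiler actually emits.

```csharp
using System;

static class ConcatGroupingDemo
{
    // Hypothetical stand-in for the M() used in the answer.
    public static string M() => "m";

    // Left-associative source form and the regrouped call from the answer.
    public static string Chained() => M() + "A" + "B";
    public static string Regrouped() => String.Concat(M(), "AB");

    // Parenthesized source form and the flattened three-argument call.
    public static string Parenthesized() => (M() + "E") + ("F" + M());
    public static string Flattened() => String.Concat(M(), "EF", M());

    public static void Main()
    {
        Console.WriteLine(Chained() == Regrouped());       // True
        Console.WriteLine(Parenthesized() == Flattened()); // True
    }
}
```

Note that M() is still called the same number of times, in the same left-to-right order, in both forms. Only the constant pieces are merged.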
The second interesting point is that null and empty strings are optimized away. So if you do this:
string s = (M() + "") + (null + M());
you'll get String.Concat(M(), M()).
An interesting question then is raised: what about this?
string s = M() + null;
We cannot optimize that down to
string s = M();
because M() might return null, but String.Concat(M(), null) would return an empty string if M() returns null. So what we do is instead reduce
string s = M() + null;
to
string s = M() ?? "";
Thereby demonstrating that string concatenation need not actually call String.Concat at all.
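The null case is easy to check for yourself. In this sketch M() is a hypothetical method that always returns null, and the class and helper names are invented. It shows that all three spellings discussed above yield an empty string rather than null.

```csharp
using System;

static class NullConcatDemo
{
    // Hypothetical stand-in for M() that returns null.
    public static string M() => null;

    // The source form from the answer.
    public static string ViaPlus() => M() + null;

    // The library call; the cast just picks the (string, string) overload explicitly.
    public static string ViaConcat() => String.Concat(M(), (string)null);

    // The reduced form the answer says the compiler produces.
    public static string ViaCoalesce() => M() ?? "";

    public static void Main()
    {
        Console.WriteLine(ViaPlus() == null);     // False
        Console.WriteLine(ViaPlus() == "");       // True
        Console.WriteLine(ViaConcat() == "");     // True
        Console.WriteLine(ViaCoalesce() == "");   // True
    }
}
```

This is why plain `string s = M();` would not be a valid replacement: it would leave s null where the + form never does.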
For further reading on this subject, see "Why is String.Concat not optimized to StringBuilder.Append?"
Solution 3:
I found the answer at MSDN. One.
How to: Concatenate Multiple Strings (C# Programming Guide)
Concatenation is the process of appending one string to the end of another string. When you concatenate string literals or string constants by using the + operator, the compiler creates a single string. No run time concatenation occurs. However, string variables can be concatenated only at run time. In this case, you should understand the performance implications of the various approaches.
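Here is a small sketch of the distinction the quoted passage draws. The class and method names are made up. Folded() uses only literals, so per the passage the compiler should produce a single string. Runtime() takes its operands as parameters, so the concatenation has to happen when the method runs.

```csharp
using System;

static class FoldingDemo
{
    // Literals only: expected to compile down to the single constant "1234".
    public static string Folded() => "1" + "2" + "3" + "4";

    // Variables involved: the concatenation is performed at run time.
    public static string Runtime(string one, string two) => one + two + "34";

    public static void Main()
    {
        string folded = Folded();
        string runtime = Runtime("1", "2");

        // Same characters either way.
        Console.WriteLine(folded == runtime); // True

        // Reference identity is where the two cases can differ; the result
        // depends on the compiler and runtime, so no output is claimed here.
        Console.WriteLine(Object.ReferenceEquals(folded, runtime));
    }
}
```

If you want to confirm the folding itself, looking at the generated IL (as the first answer did) is more reliable than reference comparisons.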
Solution 4:
Just one. The C# compiler will fold string constants and hence it essentially compiles down to
String result = "1234";
Solution 5:
I doubt this is mandated by any standard or spec. One version can likely do something different from another.