Empty string as a special case?

Solution 1:

Here is a blog post by Eric Lippert which answers your question: String interning and String.Empty.

He describes a similar situation:

object obj = "Int32";
string str1 = "Int32";
string str2 = typeof(int).Name;
Console.WriteLine(obj == str1); // true
Console.WriteLine(str1 == str2); // true
Console.WriteLine(obj == str2); // false !?

So the idea is that interning does not guarantee a single instance of a particular string, even when an interned copy exists. Only compile-time literals are interned by default. That means the following code prints true:

var k1 = "k";
object k2 = "k";
Console.WriteLine(k1 == k2);

But if you create a string with the content "k" programmatically at runtime, e.g. using the string(char[]) constructor, calling ToString() on an object, using StringBuilder, etc., you won't get an interned string by default. This one prints false:

var k1 = "k";
object k2 = new string("k".ToCharArray());
Console.WriteLine(k1 == k2);

Why? Because interning strings at runtime is expensive.

There Ain't No Such Thing As A Free Lunch.

(...)

In short, it is in the general case not worth it to intern all strings.

And about different behavior with empty string:

Some versions of the .NET runtime automatically intern the empty string at runtime, some do not!
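The difference between "same characters" and "same object" can be checked directly. The following is a minimal sketch rather than anything from the blog post; the class name InternDemo and the helper SameObject are made up for illustration, and it assumes both literals are compiled into the same assembly.

```csharp
using System;

static class InternDemo
{
    // True only when both variables point at the very same string object.
    public static bool SameObject(string a, string b) => object.ReferenceEquals(a, b);

    static void Main()
    {
        string literal1 = "k";                           // compile-time literal
        string literal2 = "k";                           // same literal, same assembly
        string runtime = new string("k".ToCharArray());  // built at run time

        Console.WriteLine(literal1 == runtime);             // True: same characters
        Console.WriteLine(SameObject(literal1, literal2));  // True: one shared literal
        Console.WriteLine(SameObject(literal1, runtime));   // False: two distinct objects
    }
}
```

Using object.ReferenceEquals makes the reference comparison explicit instead of relying on which == overload the compiler picks.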

Solution 2:

Note that interning the new strings in the second block of code does make them equal.

var k="k";
object x = string.Intern(new string(k.ToArray()));
object y = string.Intern(new string(k.ToArray()));
Console.WriteLine(x == y); //true

It seems that empty strings are interned automatically, but non-empty strings aren't interned unless you intern them explicitly (or they're string literals, which are interned by default).

I'm guessing that yes, empty strings are being treated as a special case and being interned automatically, probably because the check is so trivial that it doesn't add any real performance penalty (we can safely say that ANY string of length 0 is the empty string and is identical to any other empty string -- all other strings require us to look at the characters and not just the length).

Solution 3:

The first case compares two references to the same object (String.Empty). Applying operator== to two variables of type object compares them by reference, which gives true.

The second case produces two different instances of the string class. Their reference comparison gives false.

If you declare x and y as string in the second case, the string.operator== overload is called instead and the comparison gives true.
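A small sketch of that point: the operator is chosen from the static type of the operands, not from the runtime type. The class and method names here (EqualityDemo, AsStrings, AsObjects) are hypothetical and exist only for this illustration.

```csharp
using System;

static class EqualityDemo
{
    // Both operands are statically string, so string.operator== runs (content comparison).
    public static bool AsStrings(string a, string b) => a == b;

    // Both operands are statically object, so == is a reference comparison.
    public static bool AsObjects(object a, object b) => a == b;

    static void Main()
    {
        string x = new string("k".ToCharArray());
        string y = new string("k".ToCharArray());

        Console.WriteLine(AsStrings(x, y)); // True: same characters
        Console.WriteLine(AsObjects(x, y)); // False: two separate instances
    }
}
```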

Note that we are not dealing with string interning directly in either case. The string objects being compared are created with the string(char[]) constructor. Apparently that constructor is designed to return the value of the string.Empty field when called with an empty array as an argument.

The answer posted by MarcinJuraszek refers to Lippert's blog, which discusses string interning. That blog post covers another corner case of string class usage. Consider this example from the aforementioned post:

object obj = "";
string str1 = "";
string str2 = String.Empty;
Console.WriteLine(obj == str1); // true
Console.WriteLine(str1 == str2); // true
Console.WriteLine(obj == str2); // sometimes true, sometimes false?!

What we see here is that assigning the empty string literal ("") is not guaranteed to produce a reference to the same object as the static readonly System.String.Empty field.

Let's look at the IL for the object x = new string("".ToArray()); expression:

IL_0001:  ldstr      ""
IL_0006:  call       !!0[] [System.Core]System.Linq.Enumerable::ToArray<char>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0>)
IL_000b:  newobj     instance void [mscorlib]System.String::.ctor(char[])
IL_0010:  stloc.0

Interning may (or may not) happen at IL_0001. Whether or not the literal is interned, the ToArray() method produces a new empty array and String::.ctor(char[]) gives us String.Empty.
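Whether the constructor really hands back the String.Empty instance can be probed on a given runtime with a check like the one below. This is only a sketch: the names EmptyCtorDemo and CtorReturnsEmptySingleton are invented here, and no expected result is shown because the outcome depends on the runtime version.

```csharp
using System;

static class EmptyCtorDemo
{
    // Builds an empty string through the string(char[]) constructor.
    public static string MakeEmpty() => new string(new char[0]);

    // Reports whether the constructor returned the String.Empty instance itself.
    public static bool CtorReturnsEmptySingleton() =>
        object.ReferenceEquals(MakeEmpty(), string.Empty);

    static void Main()
    {
        // The result is runtime-dependent, so it is printed next to the version.
        Console.WriteLine(CtorReturnsEmptySingleton());
        Console.WriteLine(Environment.Version);
    }
}
```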

What we see here is not a special case of string.Empty but rather a side effect of the string class being a reference type and immutable at the same time. Other immutable framework types have predefined values with similar semantics (like DateTime.MinValue), but as far as I know those types are defined as structs, unlike string, which is a reference type. Value types are a totally different story.

It does not make sense to return a fixed predefined instance from the constructor of a mutable class: the calling code could change that instance and cause unpredictable behavior of the type. So reference types whose constructors do not always return new instances may exist, provided that those types are immutable. I am not aware of any such type in the framework other than string.

Solution 4:

My hypothesis for why the first one yields true while the second yields false:

The first result may be an optimization. Take the following code:

Enumerable.Empty<char>() == Enumerable.Empty<char>() // true

So suppose the ToArray method returns Enumerable.Empty<char>() when the string is empty. That would explain why the first result is true and the second is not, since == is doing a reference check.

Solution 5:

According to http://msdn.microsoft.com/en-us/library/system.string.intern(v=vs.110).aspx

In the .NET Framework 3.5 Service Pack 1, the Intern method reverts to its behavior in the .NET Framework 1.0 and 1.1 with regard to interning the empty string...

...In the .NET Framework 1.0, .NET Framework 1.1, and .NET Framework 3.5 SP1, [empty strings] are equal

This means that, on those versions, empty strings are interned by default, even when constructed from an empty array, and are therefore equal.

Furthermore:

The .NET Framework version 2.0 introduces the CompilationRelaxations.NoStringInterning enumeration member

This most likely gives you a way to make the comparison consistent, although, as @BenM suggests, you would do better to call the Intern function explicitly.

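Because the variables are declared as object, == compares references (string is a reference type, so this is an upcast rather than boxing). You could also use string.Equals instead of ==, which compares the contents.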
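As a rough illustration of that last suggestion, the sketch below compares two object-typed variables with Equals. The class name EqualsDemo and the helper ByValue are hypothetical; x.Equals(y) is a virtual call, so it reaches String.Equals(object) even though the variables are typed as object.

```csharp
using System;

static class EqualsDemo
{
    // Null-safe value comparison via the static object.Equals(object, object).
    public static bool ByValue(object a, object b) => object.Equals(a, b);

    static void Main()
    {
        object x = new string("k".ToCharArray());
        object y = new string("k".ToCharArray());

        Console.WriteLine(x == y);        // False: reference comparison
        Console.WriteLine(x.Equals(y));   // True: dispatches to String.Equals(object)
        Console.WriteLine(ByValue(x, y)); // True: same contents
    }
}
```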