In C#, why is String a reference type that behaves like a value type?

A String is a reference type even though it has most of the characteristics of a value type, such as being immutable and having == overloaded to compare the text rather than checking that the two variables reference the same object.

Why isn't string just a value type then?


Solution 1:

Strings aren't value types since they can be huge and need to be stored on the heap. Value types are (in every implementation of the CLR so far) stored on the stack. Stack-allocating strings would break all sorts of things: the stack is only 1MB for 32-bit and 4MB for 64-bit, you'd have to box each string, incurring a copy penalty, you couldn't intern strings, memory usage would balloon, and so on.

(Edit: Added clarification about value type storage being an implementation detail, which leads to this situation where we have a type with value semantics not inheriting from System.ValueType. Thanks Ben.)
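
As a quick sanity check that System.String really is a reference type whose data lives on the heap, consider this minimal sketch (the 20-million-character string is an arbitrary size chosen purely for illustration):

using System;

class Demo
{
    static void Main()
    {
        // string is a class, not a struct: it does not derive from System.ValueType.
        Console.WriteLine(typeof(string).IsValueType);   // False
        Console.WriteLine(typeof(string).BaseType);      // System.Object

        // The character data lives on the heap; the local variable is just a reference to it.
        string huge = new string('x', 20_000_000);       // roughly 40 MB of UTF-16 characters
        string alias = huge;                             // copies only the reference, not the 40 MB
        Console.WriteLine(ReferenceEquals(huge, alias)); // True
    }
}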

Solution 2:

It is not a value type because performance (space and time!) would be terrible if it were a value type and its value had to be copied every time it was passed to or returned from a method.
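
To make the copying argument concrete, here is a small sketch (the names are made up for illustration) showing that passing a string to a method today copies only a pointer-sized reference, which is exactly the cheapness that would be lost if the whole value had to be duplicated on every call:

using System;

class Demo
{
    // The parameter receives a copy of the reference, not of the character data.
    static bool IsSameObject(string original, string passed) =>
        ReferenceEquals(original, passed);

    static void Main()
    {
        string big = new string('x', 10_000_000);   // ~20 MB of characters
        Console.WriteLine(IsSameObject(big, big));  // True: no 20 MB copy happened on the call
    }
}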

It has value semantics to keep the world sane. Can you imagine how difficult it would be to code if

string s = "hello";
string t = "hello";
bool b = (s == t);

set b to be false? Imagine how difficult coding just about any application would be.
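
Here is a hedged sketch of that value-style comparison in practice; new string(...) is used only to force a second, distinct string object with the same characters:

using System;

class Demo
{
    static void Main()
    {
        string s = "hello";
        string t = new string("hello".ToCharArray());  // a separate object, same characters

        Console.WriteLine(s == t);                     // True:  == compares the text
        Console.WriteLine(s.Equals(t));                // True:  Equals is also overridden to compare text
        Console.WriteLine(ReferenceEquals(s, t));      // False: they are still two distinct objects
    }
}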

Solution 3:

A string is a reference type with value semantics. This design is a tradeoff which allows certain performance optimizations.

The distinction between reference types and value types is basically a performance tradeoff in the design of the language. Reference types have some overhead on construction, destruction, and garbage collection, because they are created on the heap. Value types, on the other hand, have overhead on assignments and method calls (if the data size is larger than a pointer), because the whole object is copied in memory rather than just a pointer. Because strings can be (and typically are) much larger than a pointer, they are designed as reference types. Furthermore, the size of a value type must be known at compile time, which is not always the case for strings.
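
The copy-versus-reference behaviour described above can be seen by putting a small struct and class side by side (a sketch with made-up types, not anything from the framework):

using System;

struct PointValue { public int X; }  // value type: assignment copies the whole struct
class PointRef { public int X; }     // reference type: assignment copies only the reference

class Demo
{
    static void Main()
    {
        var v1 = new PointValue { X = 1 };
        var v2 = v1;               // independent copy
        v2.X = 2;
        Console.WriteLine(v1.X);   // 1: v1 is unaffected

        var r1 = new PointRef { X = 1 };
        var r2 = r1;               // both variables refer to the same object
        r2.X = 2;
        Console.WriteLine(r1.X);   // 2: the shared object was modified
    }
}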

But strings have value semantics, which means they are immutable and compared by value (i.e. character by character for a string), not by comparing references. This allows certain optimizations:

Interning means that if multiple strings are known to be equal, the compiler can use a single string object, thereby saving memory. This optimization only works because strings are immutable; otherwise, changing one string would have unpredictable effects on other strings.

String literals (which are known at compile time) can be interned and stored in a special static area of memory by the compiler. This saves time at runtime since they don't need to be allocated and garbage collected.
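
Interning is observable directly. In this sketch, whether a runtime-built string gets interned automatically can vary, so treat the False result as typical rather than guaranteed:

using System;

class Demo
{
    static void Main()
    {
        string a = "hello";
        string b = "hello";
        Console.WriteLine(ReferenceEquals(a, b));                // True: identical literals share one interned object

        string c = string.Concat("he", "llo");                   // built at runtime, not automatically interned
        Console.WriteLine(a == c);                               // True: same characters
        Console.WriteLine(ReferenceEquals(a, c));                // False (typically): a distinct object
        Console.WriteLine(ReferenceEquals(a, string.Intern(c))); // True: Intern returns the pooled instance
    }
}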

Immutability does increase the cost of certain operations. For example, you can't replace a single character in place; you have to allocate a new string for any change. But this is a small cost compared to the benefit of the optimizations.
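
Concretely, "changing" one character allocates a whole new string; code that edits repeatedly usually reaches for StringBuilder, which mutates a buffer instead (a minimal sketch):

using System;
using System.Text;

class Demo
{
    static void Main()
    {
        string s = "hello";
        string shouted = s.Replace('h', 'H');  // allocates a new string; s itself never changes
        Console.WriteLine(s);                  // hello
        Console.WriteLine(shouted);            // Hello

        // For many small edits, StringBuilder avoids allocating an intermediate string at every step.
        var sb = new StringBuilder(s);
        sb[0] = 'H';
        sb.Append('!');
        Console.WriteLine(sb.ToString());      // Hello!
    }
}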

Value semantics effectively hides the distinction between reference types and value types from the user. If a type has value semantics, it doesn't matter to the user whether the type is a value type or a reference type; that becomes an implementation detail.