How is the StringBuilder class implemented? Does it internally create new string objects each time we append?

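In .NET 2.0, StringBuilder uses a String internally as its buffer. String is only immutable to code outside the core library (mscorlib); StringBuilder has internal access and can modify that string in place, so it does not need a new string object for every append.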

In .NET 4.0, StringBuilder was changed to use a char[] instead.

In 2.0, StringBuilder looked like this:

public sealed class StringBuilder : ISerializable
{
    // Fields
    private const string CapacityField = "Capacity";
    internal const int DefaultCapacity = 0x10;
    internal IntPtr m_currentThread;
    internal int m_MaxCapacity;
    internal volatile string m_StringValue; // HERE ----------------------
    private const string MaxCapacityField = "m_MaxCapacity";
    private const string StringValueField = "m_StringValue";
    private const string ThreadIDField = "m_currentThread";
    // ... (remaining members omitted)
}

But in 4.0 it looks like this:

public sealed class StringBuilder : ISerializable
{
    // Fields
    private const string CapacityField = "Capacity";
    internal const int DefaultCapacity = 0x10;
    internal char[] m_ChunkChars; // HERE --------------------------------
    internal int m_ChunkLength;
    internal int m_ChunkOffset;
    internal StringBuilder m_ChunkPrevious;
    internal int m_MaxCapacity;
    private const string MaxCapacityField = "m_MaxCapacity";
    internal const int MaxChunkSize = 0x1f40;
    private const string StringValueField = "m_StringValue";
    private const string ThreadIDField = "m_currentThread";
    // ... (remaining members omitted)
}

So evidently it was changed from using a string to using a char[].

EDIT: Updated answer to reflect changes in .NET 4 (that I only just discovered).

The accepted answer misses the mark by a mile. The significant change to StringBuilder in 4.0 is not the switch from an unsafe string to char[]; it's the fact that StringBuilder is now actually a linked list of StringBuilder instances (note the m_ChunkPrevious field).

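The reason for this change should be obvious: appending no longer requires reallocating the buffer (an expensive operation, since, along with allocating more memory, you also have to copy all the contents from the old buffer to the new one). Instead, when the current chunk fills up, a new chunk is allocated and linked to the previous one, and the characters already written stay where they are.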

This means calling ToString() is now slightly slower, since the final string needs to be computed, but doing a large number of Append() operations is now significantly faster. This fits in with the typical use-case for StringBuilder: a lot of calls to Append(), followed by a single call to ToString().
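To make the trade-off concrete, here is a minimal sketch of a chunked builder. It is not the actual framework source: the ChunkedBuilder class, its ChunkCount property, and the growth policy are assumptions made for illustration, with field names loosely modeled on the 4.0 listing above.

```csharp
using System;

// Hypothetical, simplified model of a chunked builder (not the real BCL code).
public sealed class ChunkedBuilder
{
    private const int MaxChunkSize = 8000;   // 0x1f40, as in the 4.0 field listing

    private char[] chunkChars;               // buffer of the newest chunk
    private int chunkLength;                 // used slots in chunkChars
    private int chunkOffset;                 // total length of all older chunks
    private ChunkedBuilder chunkPrevious;    // older chunk, or null

    public ChunkedBuilder(int capacity = 16)
    {
        chunkChars = new char[capacity];
    }

    // Copies the chunk state of 'source' so it can become the "previous" node.
    private ChunkedBuilder(ChunkedBuilder source)
    {
        chunkChars = source.chunkChars;
        chunkLength = source.chunkLength;
        chunkOffset = source.chunkOffset;
        chunkPrevious = source.chunkPrevious;
    }

    public int Length => chunkOffset + chunkLength;

    // Number of chunks in the linked list (only here to make the structure visible).
    public int ChunkCount
    {
        get
        {
            int count = 0;
            for (ChunkedBuilder chunk = this; chunk != null; chunk = chunk.chunkPrevious)
                count++;
            return count;
        }
    }

    public ChunkedBuilder Append(string value)
    {
        foreach (char c in value)
        {
            if (chunkLength == chunkChars.Length)
                Expand();
            chunkChars[chunkLength++] = c;
        }
        return this;
    }

    // Links the full chunk as "previous" and starts a fresh buffer.
    // No existing characters are copied here.
    private void Expand()
    {
        chunkPrevious = new ChunkedBuilder(this);
        chunkOffset += chunkLength;
        chunkLength = 0;
        // Assumed growth policy: roughly double, capped at MaxChunkSize.
        chunkChars = new char[Math.Min(Math.Max(chunkOffset, 16), MaxChunkSize)];
    }

    // Walks the list once and copies every chunk into its final position.
    public override string ToString()
    {
        var result = new char[Length];
        for (ChunkedBuilder chunk = this; chunk != null; chunk = chunk.chunkPrevious)
            Array.Copy(chunk.chunkChars, 0, result, chunk.chunkOffset, chunk.chunkLength);
        return new string(result);
    }
}
```

With an initial capacity of 4, appending "Hello, " and then "world!" fills the first chunk and links a second one. Append only ever writes into the newest buffer, while ToString() pays for a walk over all chunks, which is the cost shift described above.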

You can find benchmarks here. The conclusion? The new linked-list StringBuilder uses marginally more memory, but is significantly faster for the typical use-case.

Not really. It uses an internal character buffer and allocates a new buffer only when the current capacity is exhausted. An Append operation simply writes into this buffer, and a string object is created only when ToString() is called. Hence it's advisable for code that does many string concatenations, since each traditional string concatenation creates a new string. If you have a rough idea of the final size, you can also pass an initial capacity to the StringBuilder to avoid multiple allocations.
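As a small illustration of the initial-capacity advice, the sketch below pre-sizes a StringBuilder before a loop of appends. The CapacityDemo class, the JoinNumbers method, and the "about 6 characters per entry" estimate are made up for this example; only the StringBuilder(int capacity) constructor, Append, and ToString come from the framework.

```csharp
using System;
using System.Text;

public static class CapacityDemo
{
    // Joins the numbers 0..count-1 with commas using a pre-sized builder.
    public static string JoinNumbers(int count)
    {
        // Rough size estimate (an assumption): about 6 characters per entry.
        // An estimate that is too low is still fine; the builder just grows.
        var sb = new StringBuilder(count * 6);
        for (int i = 0; i < count; i++)
        {
            if (i > 0)
                sb.Append(',');
            sb.Append(i);
        }
        // The only string object for the result is created here.
        return sb.ToString();
    }
}
```

Calling CapacityDemo.JoinNumbers(5) returns "0,1,2,3,4". The same loop written with the + operator on strings would create a new string on every iteration.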

Edit: People are pointing out that my understanding is wrong. Please ignore this answer (I'd rather not delete it; it will stand as proof of my ignorance :-)
