String.Join vs. StringBuilder: which is faster?
In a previous question about formatting a double[][] to CSV format, it was suggested that using StringBuilder would be faster than String.Join. Is this true?
Solution 1:
Short answer: it depends.
Long answer: if you already have an array of strings to concatenate together (with a delimiter), String.Join is the fastest way of doing it.
String.Join can look through all of the strings to work out the exact length it needs, then go again and copy all the data. This means there will be no extra copying involved. The only downside is that it has to go through the strings twice, which means potentially blowing the memory cache more times than necessary.
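To make the two passes concrete, here is a minimal sketch of the idea. TwoPassJoin is a hypothetical helper written for illustration, not the framework's actual implementation; it assumes a non-empty separator is optional and treats null elements as empty strings. The final `new string(buffer)` adds one copy that a real implementation writing directly into the result would not need.

```csharp
using System;

static class TwoPassJoinSketch
{
    // Hypothetical helper: illustrates "measure first, then copy once".
    public static string TwoPassJoin(string separator, string[] values)
    {
        if (values.Length == 0) return string.Empty;
        separator = separator ?? string.Empty;

        // Pass 1: work out the exact length of the result.
        int total = separator.Length * (values.Length - 1);
        foreach (string s in values)
        {
            if (s != null) total += s.Length;
        }

        // Pass 2: copy each piece exactly once into a buffer of that size.
        char[] buffer = new char[total];
        int pos = 0;
        for (int i = 0; i < values.Length; i++)
        {
            if (i > 0)
            {
                separator.CopyTo(0, buffer, pos, separator.Length);
                pos += separator.Length;
            }
            string s = values[i];
            if (s != null)
            {
                s.CopyTo(0, buffer, pos, s.Length);
                pos += s.Length;
            }
        }

        // Extra copy here is an artifact of this sketch only.
        return new string(buffer);
    }

    static void Main()
    {
        Console.WriteLine(TwoPassJoin(",", new[] { "1.5", "2.25", "3" }));
    }
}
```

Because the buffer is sized up front, it never has to grow, which is where the "no extra copying" comes from.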
If you don't have the strings as an array beforehand, it's probably faster to use StringBuilder - but there will be situations where it isn't. If using a StringBuilder means doing lots and lots of copies, then building an array and then calling String.Join may well be faster.
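The two options for values that are not yet strings might look like this. This is only a sketch: the class and method names are made up for illustration, and formatting with the invariant culture is an assumption chosen so that the decimal separator cannot clash with the comma delimiter.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

static class RowFormatting
{
    // Option 1: append each value to a StringBuilder as it is produced.
    public static string WithBuilder(IEnumerable<double> values)
    {
        StringBuilder sb = new StringBuilder();
        bool first = true;
        foreach (double v in values)
        {
            if (!first) sb.Append(',');
            sb.Append(v.ToString(CultureInfo.InvariantCulture));
            first = false;
        }
        return sb.ToString();
    }

    // Option 2: collect the strings first, then make a single String.Join call.
    public static string WithJoin(IEnumerable<double> values)
    {
        List<string> parts = new List<string>();
        foreach (double v in values)
        {
            parts.Add(v.ToString(CultureInfo.InvariantCulture));
        }
        return String.Join(",", parts.ToArray());
    }

    static void Main()
    {
        double[] row = { 1.5, 2.25, 3 };
        Console.WriteLine(WithBuilder(row));
        Console.WriteLine(WithJoin(row));
    }
}
```

Both methods produce the same text; which one is faster for a given input is exactly the "it depends" part, so measure with your own data.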
EDIT: This is in terms of a single call to String.Join vs a bunch of calls to StringBuilder.Append. In the original question, we had two different levels of String.Join calls, so each of the nested calls would have created an intermediate string. In other words, it's even more complex and harder to guess about. I would be surprised to see either way "win" significantly (in complexity terms) with typical data.
EDIT: When I'm at home, I'll write up a benchmark which is as painful as possible for StringBuilder. Basically if you have an array where each element is about twice the size of the previous one, and you get it just right, you should be able to force a copy for every append (of elements, not of the delimiter, although that needs to be taken into account too). At that point it's nearly as bad as simple string concatenation - but String.Join will have no problems.
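A rough sketch of what such a benchmark could look like follows. It is not the benchmark promised above: the names, element count, and loop count are arbitrary assumptions, and whether each append really forces a copy depends on StringBuilder's internal growth strategy, which is an implementation detail that may differ between runtime versions. No timings are given here; run it on your own runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text;

static class DoublingBenchmark
{
    // Builds an array whose element lengths are 1, 2, 4, 8, ... characters.
    public static string[] BuildDoublingArray(int count)
    {
        string[] values = new string[count];
        int length = 1;
        for (int i = 0; i < count; i++)
        {
            values[i] = new string('x', length);
            length *= 2;
        }
        return values;
    }

    public static string JoinAll(string[] values)
    {
        return String.Join(",", values);
    }

    public static string AppendAll(string[] values)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.Length; i++)
        {
            if (i > 0) sb.Append(',');
            sb.Append(values[i]);
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Assumed sizes: 20 elements (largest is 2^19 chars), 100 repetitions.
        string[] values = BuildDoublingArray(20);
        const int Loops = 100;

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < Loops; i++) JoinAll(values);
        watch.Stop();
        Console.WriteLine("Join: {0}ms", watch.ElapsedMilliseconds);

        watch = Stopwatch.StartNew();
        for (int i = 0; i < Loops; i++) AppendAll(values);
        watch.Stop();
        Console.WriteLine("StringBuilder: {0}ms", watch.ElapsedMilliseconds);
    }
}
```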
Solution 2:
Here's my test rig, using int[][] for simplicity; results first:
Join: 9420ms (chk: 210710000
OneBuilder: 9021ms (chk: 210710000
(update for double results:)
Join: 11635ms (chk: 210710000
OneBuilder: 11385ms (chk: 210710000
(update re 2048 * 64 * 150)
Join: 11620ms (chk: 206409600
OneBuilder: 11132ms (chk: 206409600
and with OptimizeForTesting enabled:
Join: 11180ms (chk: 206409600
OneBuilder: 10784ms (chk: 206409600
So faster, but not massively so; rig (run at console, in release mode, etc):
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Text;

    namespace ConsoleApplication2
    {
        class Program
        {
            // Force a full collection so garbage from one test does not skew the next.
            static void Collect()
            {
                GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
                GC.WaitForPendingFinalizers();
                GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
                GC.WaitForPendingFinalizers();
            }

            static void Main(string[] args)
            {
                const int ROWS = 500, COLS = 20, LOOPS = 2000;

                // Fixed seed so every run formats the same data.
                int[][] data = new int[ROWS][];
                Random rand = new Random(123456);
                for (int row = 0; row < ROWS; row++)
                {
                    int[] cells = new int[COLS];
                    for (int col = 0; col < COLS; col++)
                    {
                        cells[col] = rand.Next();
                    }
                    data[row] = cells;
                }

                // Test 1: nested String.Join calls.
                Collect();
                int chksum = 0;
                Stopwatch watch = Stopwatch.StartNew();
                for (int i = 0; i < LOOPS; i++)
                {
                    // Summing the lengths checks that both approaches emit the same amount of text.
                    chksum += Join(data).Length;
                }
                watch.Stop();
                Console.WriteLine("Join: {0}ms (chk: {1}", watch.ElapsedMilliseconds, chksum);

                // Test 2: a single StringBuilder for the whole output.
                Collect();
                chksum = 0;
                watch = Stopwatch.StartNew();
                for (int i = 0; i < LOOPS; i++)
                {
                    chksum += OneBuilder(data).Length;
                }
                watch.Stop();
                Console.WriteLine("OneBuilder: {0}ms (chk: {1}", watch.ElapsedMilliseconds, chksum);

                Console.WriteLine("done");
                Console.ReadLine();
            }

            // Joins the cells of each row with commas, then joins the rows with newlines.
            public static string Join(int[][] array)
            {
                return String.Join(Environment.NewLine,
                    Array.ConvertAll(array,
                        row => String.Join(",",
                            Array.ConvertAll(row, x => x.ToString()))));
            }

            // Builds the same output with one StringBuilder and no intermediate row strings.
            public static string OneBuilder(IEnumerable<int[]> source)
            {
                StringBuilder sb = new StringBuilder();
                bool firstRow = true;
                foreach (var row in source)
                {
                    if (firstRow)
                    {
                        firstRow = false;
                    }
                    else
                    {
                        sb.AppendLine();
                    }
                    if (row.Length > 0)
                    {
                        sb.Append(row[0]);
                        for (int i = 1; i < row.Length; i++)
                        {
                            sb.Append(',').Append(row[i]);
                        }
                    }
                }
                return sb.ToString();
            }
        }
    }