Why are multi-dimensional arrays in .NET slower than normal arrays?
Edit: I apologize everybody. I used the term "jagged array" when I actually meant to say "multi-dimensional array" (as can be seen in my example below). I apologize for using the incorrect name. I actually found jagged arrays to be faster than multi-dimensional ones! I have added my measurements for jagged arrays.
I was trying to use a jagged multi-dimensional array today, when I noticed that it's performance is not as I would have expected. Using a single-dimensional array and manually calculating indices was much faster (almost two times) than using a 2D array. I wrote a test using 1024*1024
arrays (initialized to random values), for 1000 iterations, and I got the following results on my machine:
sum(double[], int): 2738 ms (100%)
sum(double[,]): 5019 ms (183%)
sum(double[][]): 2540 ms ( 93%)
This is my test code:
public static double sum(double[] d, int l1) {
// assuming the array is rectangular
double sum = 0;
int l2 = d.Length / l1;
for (int i = 0; i < l1; ++i)
for (int j = 0; j < l2; ++j)
sum += d[i * l2 + j];
return sum;
}
public static double sum(double[,] d) {
double sum = 0;
int l1 = d.GetLength(0);
int l2 = d.GetLength(1);
for (int i = 0; i < l1; ++i)
for (int j = 0; j < l2; ++j)
sum += d[i, j];
return sum;
}
public static double sum(double[][] d) {
double sum = 0;
for (int i = 0; i < d.Length; ++i)
for (int j = 0; j < d[i].Length; ++j)
sum += d[i][j];
return sum;
}
public static void Main() {
Random random = new Random();
const int l1 = 1024, l2 = 1024;
double[ ] d1 = new double[l1 * l2];
double[,] d2 = new double[l1 , l2];
double[][] d3 = new double[l1][];
for (int i = 0; i < l1; ++i) {
d3[i] = new double[l2];
for (int j = 0; j < l2; ++j)
d3[i][j] = d2[i, j] = d1[i * l2 + j] = random.NextDouble();
}
//
const int iterations = 1000;
TestTime(sum, d1, l1, iterations);
TestTime(sum, d2, iterations);
TestTime(sum, d3, iterations);
}
Further investigation showed that the IL for the second method is 23% larger than that of the first method. (Code size 68 vs 52.) This is mostly due to calls to System.Array::GetLength(int)
. The compiler also emits calls to Array::Get
for the jagged multi-dimensional array, whereas it simply calls ldelem
for the simple array.
So I am wondering, why is access through multi-dimensional arrays slower than normal arrays? I would have assumed the compiler (or JIT) would do something similar to what I did in my first method, but this was not actually the case.
Could you plese help me understand why this is happening the way it is?
Update: Following Henk Holterman's suggestion, here is the implementation of TestTime
:
public static void TestTime<T, TR>(Func<T, TR> action, T obj,
int iterations)
{
Stopwatch stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; ++i)
action(obj);
Console.WriteLine(action.Method.Name + " took " + stopwatch.Elapsed);
}
public static void TestTime<T1, T2, TR>(Func<T1, T2, TR> action, T1 obj1,
T2 obj2, int iterations)
{
Stopwatch stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; ++i)
action(obj1, obj2);
Console.WriteLine(action.Method.Name + " took " + stopwatch.Elapsed);
}
Single dimensional arrays with a lower bound of 0 are a different type to either multi-dimensional or non-0 lower bound arrays within IL (vector
vs array
IIRC). vector
is simpler to work with - to get to element x, you just do pointer + size * x
. For an array
, you have to do pointer + size * (x-lower bound)
for a single dimensional array, and yet more arithmetic for each dimension you add.
Basically the CLR is optimised for the vastly more common case.
Array bounds checking?
The single-dimension array has a length member that you access directly - when compiled this is just a memory read.
The multidimensional array requires a GetLength(int dimension) method call that processes the argument to get the relevant length for that dimension. That doesn't compile down to a memory read, so you get a method call, etc.
In addition that GetLength(int dimension) will do a bounds check on the parameter.
Interestly, I ran the following code from above using VS2008 NET3.5SP1 Win32 on a Vista box, and in release/optimize the difference was barely measurable, while debug/noopt the multi-dim arrays were much slower. (I ran the three tests twice to reduce JIT affects on the second set.)
Here are my numbers:
sum took 00:00:04.3356535
sum took 00:00:04.1957663
sum took 00:00:04.5523050
sum took 00:00:04.0183060
sum took 00:00:04.1785843
sum took 00:00:04.4933085
Look at the second set of three numbers. The difference is not enough for me to code everything in single dimension arrays.
Although I haven't posted them, in Debug/unoptimized the multidimension vs. single/jagged does make a huge difference.
Full program:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
namespace single_dimension_vs_multidimension
{
class Program
{
public static double sum(double[] d, int l1) { // assuming the array is rectangular
double sum = 0;
int l2 = d.Length / l1;
for (int i = 0; i < l1; ++i)
for (int j = 0; j < l2; ++j)
sum += d[i * l2 + j];
return sum;
}
public static double sum(double[,] d)
{
double sum = 0;
int l1 = d.GetLength(0);
int l2 = d.GetLength(1);
for (int i = 0; i < l1; ++i)
for (int j = 0; j < l2; ++j)
sum += d[i, j];
return sum;
}
public static double sum(double[][] d)
{
double sum = 0;
for (int i = 0; i < d.Length; ++i)
for (int j = 0; j < d[i].Length; ++j)
sum += d[i][j];
return sum;
}
public static void TestTime<T, TR>(Func<T, TR> action, T obj, int iterations)
{
Stopwatch stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; ++i)
action(obj);
Console.WriteLine(action.Method.Name + " took " + stopwatch.Elapsed);
}
public static void TestTime<T1, T2, TR>(Func<T1, T2, TR> action, T1 obj1, T2 obj2, int iterations)
{
Stopwatch stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; ++i)
action(obj1, obj2);
Console.WriteLine(action.Method.Name + " took " + stopwatch.Elapsed);
}
public static void Main() {
Random random = new Random();
const int l1 = 1024, l2 = 1024;
double[ ] d1 = new double[l1 * l2];
double[,] d2 = new double[l1 , l2];
double[][] d3 = new double[l1][];
for (int i = 0; i < l1; ++i)
{
d3[i] = new double[l2];
for (int j = 0; j < l2; ++j)
d3[i][j] = d2[i, j] = d1[i * l2 + j] = random.NextDouble();
}
const int iterations = 1000;
TestTime<double[], int, double>(sum, d1, l1, iterations);
TestTime<double[,], double>(sum, d2, iterations);
TestTime<double[][], double>(sum, d3, iterations);
TestTime<double[], int, double>(sum, d1, l1, iterations);
TestTime<double[,], double>(sum, d2, iterations);
TestTime<double[][], double>(sum, d3, iterations);
}
}
}
Because a multidimensional array is just a syntactic sugar as it is really just a flat array with some index calculation magic. On the other hand, a jagged array is like, an array of arrays. With a two-dimensional array, accessing an element requires reading the memory just once, while with a two level jagged array, you need to read the memory twice.
EDIT: Apparently the original poster mixed up "jagged arrays" with "multi-dimensional arrays" so my reasoning doesn't exactly stand. For the real reason, check Jon Skeet's heavy artillery answer above.