C / C++ MultiDimensional Array Internals

I have a question about how C / C++ internally stores multidimensional arrays declared using the notation foo[m][n]. I am not questioning pure pointers to pointers etc... I am asking because of speed reasons...

Correct me if I am wrong, but syntactically foo is an array of pointers, which themselves point to an array

int foo[5][4]
*(foo + i)           // returns a memory address
*( *(foo + i) + j)    // returns an int

I have heard from many places that the C/C++ compiler converts foo[m][n] to a one dimensional array behind the scenes (calculating the required one dimension index with i * width + j). However if this was true then the following would hold

*(foo + 1)          // should return element foo[0][1]

Thus my question: Is it true that foo[m][n] is (always?) stored in memory as a flat one dimensional array?? If so, why does the above code work as shown.

Yes, C/C++ stores a multi-dimensional (rectangular) array as a contiguous memory area. But, your syntax is incorrect. To modify element foo[0][1], the following code will work:

*((int *)foo+1)=5;

The explicit cast is necessary, because foo+1, is the same as &foo[1] which is not at all the same thing as foo[0][1]. *(foo+1) is a pointer to the fifth element in the flat memory area. In other words, *(foo+1) is basically foo[1] and **(foo+1) is foo[1][0]. Here is how the memory is laid out for some of your two dimensional array:

enter image description here

A two-dimensional array:

int foo[5][4];

is nothing more or less than an array of arrays:

typedef int row[4];   /* type "row" is an array of 4 ints */
row foo[5];           /* the object "foo" is an array of 5 rows */

There are no pointer objects here, either explicit or implicit.

Arrays are not pointers. Pointers are not arrays.

What often causes confusion is that an array expression is, in most contexts, implicitly converted to a pointer to its first element. (And a separate rule says that what looks like an array parameter declaration is really a pointer declaration, but that doesn't apply in this example.) An array object is an array object; declaring such an object does not create any pointer objects. Referring to an array object can create a pointer value (the address of the array's first element), but there is no pointer object stored in memory.

The array object foo is stored in memory as 5 contiguous elements, where each element is itself an array of 4 contiguous int elements; the whole thing is therefore stored as 20 contiguous int objects.

The indexing operator is defined in terms of pointer arithmetic; x[y] is equivalent to *(x + y). Typically the left operand is going to be either a pointer expression or an array expression; if it's an array expression, the array is implicitly converted to a pointer.

So foo[x][y] is equivalent to *(foo[x] + y), which in turn is equivalent to *(*(foo + x) + y). (Note that no casts are necessary.) Fortunately, you don't have to write it that way, and foo[x][y] is a lot easier to understand.

Note that you can create a data structure that can be accessed with the same foo[x][y] syntax, but where foo really is a pointer to pointer to int. (In that case, the prefix of each [] operator is already a pointer expression, and doesn't need to be converted.) But to do that, you'd have to declare foo as a pointer-to-pointer-to-int:

int **foo;

and then allocate and initialize all the necessary memory. This is more flexible than int foo[5][4], since you can determine the number of rows and the size (or even existence) of each row dynamically.

Section 6 of the comp.lang.c FAQ explains this very well.

EDIT:

In response to Arrakis's comment, it's important to keep in mind the distinction between type and representation.

For example, these two types:

struct pair { int x; int y;};
typedef int arr2[2];

very likely have the same representation in memory (two consecutive int objects), but the syntax to access the elements is quite different.

Similarly, the types int[5][4] and int[20] have the same memory layout (20 consecutive int objects), but the syntax to access the elements is different.

You can access foo[2][2] as ((int*)foo)[10] (treating the 2-dimensional array as if it were a 1-dimensional array). And sometimes it's useful to do so, but strictly speaking the behavior is undefined. You can likely get away with it because most C implementations don't do array bounds-checking. On the other hand, optimizing compilers can assume that your code's behavior is defined, and generate arbitrary code if it isn't.

C arrays - even multi-dimensional ones - are contiguous, ie an array of type int [4][5] is structurally equivalent to an array of type int [20].

However, these types are still incompatible according to C language semantics. In particular, the following code is in violation of the C standard:

int foo[4][5] = { { 0 } };
int *p = &foo[0][0];
int x = p[12]; // undefined behaviour - can't treat foo as int [20]

The reason for this is that the C standard is (probably intentionally) worded in a way which makes bounds-checking implementations possible: As p is derived from foo[0], which has type int [5], valid indices must be in range 0..5 (resp. 0..4 if you actually access the element).

Many other programming languages (Java, Perl, Python, JavaScript, ...) use jagged arrays to implement multi-dimensional arrays. This is also possible in C by using an array of pointers:

int *bar[4] = { NULL };
bar[0] = (int [3]){ 0 };
bar[1] = (int [5]){ 1, 2, 3, 4 };
int y = bar[1][2]; // y == 3

However, jagged arrays are not contiguous, and the pointed-to arrays need not be of uniform size.

Because of implicit conversion of array expressions into pointer expressions, indexing jagged and non-jagged arrays looks identical, but the actual address calculations will be quite different:

&foo[1]    == (int (*)[5])((char *)&foo + 1 * sizeof (int [5]))

&bar[1]    == (int **)((char *)&bar + 1 * sizeof (int *))

&foo[1][2] == (int *)((char *)&foo[1] + 2 * sizeof (int))
           == (int *)((char *)&foo + 1 * sizeof (int [5]) + 2 * sizeof (int))

&bar[1][2] == (int *)((char *)bar[1] + 2 * sizeof (int)) // no & before bar!
           == (int *)((char *)*(int **)((char *)&bar + 1 * sizeof (int *))
                      + 2 * sizeof (int))

C / C++ MultiDimensional Array Internals

Related

Recent Posts