How are multi-dimensional arrays formatted in memory?

A static two-dimensional array looks like an array of arrays: it's just laid out contiguously in memory. Arrays are not the same thing as pointers, but because you can often use them pretty much interchangeably, it can get confusing. The compiler keeps track of the types properly, though, which makes everything line up nicely. You do have to be careful with static 2D arrays like the one you mention: if you try to pass one to a function taking an int ** parameter, bad things are going to happen. Here's a quick example:

int array1[3][2] = {{0, 1}, {2, 3}, {4, 5}};

In memory, this looks like:

0 1 2 3 4 5

exactly the same as:

int array2[6] = { 0, 1, 2, 3, 4, 5 };
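
One way to check this claim is to compare the raw bytes of the two arrays. The following is only a minimal sketch; the helper name same_layout is made up for illustration and is not part of the original example.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if array1 and array2 have identical
   size and identical contents in memory, 0 otherwise. */
int same_layout(void)
{
    int array1[3][2] = {{0, 1}, {2, 3}, {4, 5}};
    int array2[6] = { 0, 1, 2, 3, 4, 5 };

    /* Both objects hold six ints with nothing between the rows,
       so the sizes match and a byte-wise comparison succeeds. */
    return sizeof array1 == sizeof array2
        && memcmp(array1, array2, sizeof array1) == 0;
}
```

Calling same_layout() should return 1 on any conforming compiler, since arrays never contain padding between their elements.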

But if you try to pass array1 to this function:

void function1(int **a);

you'll get a warning (and the app will fail to access the array correctly):

warning: passing argument 1 of ‘function1’ from incompatible pointer type

That's because a 2D array is not the same as int **. The automatic decaying of an array into a pointer only goes "one level deep", so to speak: array1 decays to int (*)[2] (a pointer to an array of 2 ints), not to int **. You need to declare the function as either of:

void function2(int a[][2]);

void function2(int a[3][2]);

Either form makes everything happy; the compiler treats both as taking a pointer to an array of 2 ints.
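
As a rough illustration of such a declaration in use, here is a sketch of a function with an int a[][2] parameter. The names sum2 and sum_array1, and the extra rows parameter, are assumptions added for this example.

```c
#include <stddef.h>

/* The parameter is adjusted by the compiler to int (*a)[2]: a pointer
   to rows of 2 ints. The row count is not part of that type, so this
   sketch passes it separately (rows is an addition for illustration). */
int sum2(int a[][2], size_t rows)
{
    int total = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < 2; j++)
            total += a[i][j];
    return total;
}

/* Hypothetical caller using the array from the example above. */
int sum_array1(void)
{
    int array1[3][2] = {{0, 1}, {2, 3}, {4, 5}};

    /* array1 decays to int (*)[2], which matches the parameter type,
       so no warning is issued and indexing works as expected. */
    return sum2(array1, 3);
}
```

Only the first dimension may be left out of the parameter; the compiler needs the inner dimension (2 here) to compute where each row starts.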

This same concept extends to n-dimensional arrays. Taking advantage of the flat layout in your application (for example, treating array1 as if it were array2) generally only makes the code harder to understand, though. So be careful out there.

The answer is based on the idea that C doesn't really have 2D arrays - it has arrays-of-arrays. When you declare this:

int someNumbers[4][2];

You are asking for someNumbers to be an array of 4 elements, where each element of that array is of type int [2] (which is itself an array of 2 ints).

The other part of the puzzle is that arrays are always laid out contiguously in memory. If you ask for:

sometype_t array[4];

then that will always look like this:

| sometype_t | sometype_t | sometype_t | sometype_t |

(4 sometype_t objects laid out next to each other, with no spaces in between). So in your someNumbers array-of-arrays, it'll look like this:

| int [2]    | int [2]    | int [2]    | int [2]    |

And each int [2] element is itself an array, that looks like this:

| int        | int        |

So overall, you get this:

| int | int  | int | int  | int | int  | int | int  |
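
The diagrams above can be checked with sizeof and a little address arithmetic. This is just a sketch; the function name layout_checks is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if someNumbers is laid out as
   4 consecutive int[2] elements with no gaps, 0 otherwise. */
int layout_checks(void)
{
    int someNumbers[4][2];

    /* The whole array is exactly 4 elements of type int[2]... */
    int outer_ok = sizeof someNumbers == 4 * sizeof someNumbers[0];

    /* ...and each element is exactly 2 ints. */
    int inner_ok = sizeof someNumbers[0] == 2 * sizeof(int);

    /* Row 1 starts immediately after the last int of row 0. */
    int adjacent_ok = (char *)&someNumbers[1]
                   == (char *)&someNumbers[0] + 2 * sizeof(int);

    return outer_ok && inner_ok && adjacent_ok;
}
```

Only sizes and addresses are inspected here, so the array does not need to be initialized.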

unsigned char MultiArray[5][2]={{0,1},{2,3},{4,5},{6,7},{8,9}};

in memory is equal to:

unsigned char SingleArray[10]={0,1,2,3,4,5,6,7,8,9};
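
Put differently, element [i][j] of MultiArray sits at flat offset i * 2 + j. The sketch below checks that mapping for every element; the function name row_major_matches is an assumption made for this example.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if MultiArray[i][j] equals
   SingleArray[i * 2 + j] for every i and j, 0 otherwise. */
int row_major_matches(void)
{
    unsigned char MultiArray[5][2] = {{0,1},{2,3},{4,5},{6,7},{8,9}};
    unsigned char SingleArray[10] = {0,1,2,3,4,5,6,7,8,9};

    /* View the bytes of MultiArray directly; inspecting an object's
       bytes through an unsigned char pointer is always permitted. */
    const unsigned char *bytes = (const unsigned char *)MultiArray;

    for (size_t i = 0; i < 5; i++) {
        for (size_t j = 0; j < 2; j++) {
            size_t flat = i * 2 + j;   /* row-major offset */
            if (MultiArray[i][j] != SingleArray[flat])
                return 0;
            if (bytes[flat] != SingleArray[flat])
                return 0;
        }
    }
    return 1;
}
```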

In answer to your "also" question: both, though the compiler is doing most of the heavy lifting.

In the case of arrays declared with a fixed size, "The System" is the compiler: it reserves the memory just as it would for any other variable (on the stack, for a local array).

In the case of the malloc'd array, "The System" is the implementation of malloc (part of the C library, which typically obtains memory from the operating system). All the compiler allocates is the base pointer.
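
For the malloc case, one possible approach (not the only one) is to allocate all rows as a single contiguous block and access it through a pointer to an array of 2 ints. The names heap_demo and grid below are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical demo: allocates a 4 x 2 block with malloc, fills it,
   and returns the last element (or -1 if the allocation fails). */
int heap_demo(void)
{
    /* The compiler only reserves storage for the pointer grid;
       the 4 * 2 ints themselves come from malloc at run time. */
    int (*grid)[2] = malloc(4 * sizeof *grid);
    if (grid == NULL)
        return -1;

    /* grid[i][j] works exactly as for a declared int[4][2]. */
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 2; j++)
            grid[i][j] = i * 2 + j;

    int last = grid[3][1];
    free(grid);
    return last;
}
```

Because grid has type int (*)[2], it could also be passed to a function declared with an int a[][2] parameter.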

The compiler always handles a type as what it is declared to be, except in the example Carl gave, where it can figure out interchangeable usage. This is why, if you pass a [][] to a function, it must assume a statically allocated flat array, whereas ** is assumed to be a pointer to a pointer.

Suppose we have a1 and a2 defined and initialized as below (C99):

int a1[2][2] = {{142,143}, {144,145}};
int **a2 = (int* []){ (int []){242,243}, (int []){244,245} };

a1 is a homogeneous 2D array with a plain contiguous layout in memory, and the expression (int*)a1 evaluates to a pointer to its first element:

a1 --> 142 143 144 145

a2 is initialized from a heterogeneous 2D array and is a pointer to a value of type int*; that is, the dereference expression *a2 evaluates to a value of type int*. The memory layout does not have to be contiguous:

a2 --> p1 p2
       ...
p1 --> 242 243
       ...
p2 --> 244 245

Despite the totally different memory layouts and access semantics, the C syntax for array-access expressions looks exactly the same for both homogeneous and heterogeneous 2D arrays:

expression a1[1][0] will fetch value 144 out of a1 array
expression a2[1][0] will fetch value 244 out of a2 array

The compiler knows that the access expression for a1 operates on type int[2][2], while the access expression for a2 operates on type int**. The generated assembly code follows the homogeneous or heterogeneous access semantics accordingly.
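
The two access paths can be put side by side in a small sketch. The function names fetch_a1 and fetch_a2 are made up for illustration; the initializers are the ones given above and require C99 compound literals.

```c
/* Hypothetical helper: homogeneous access. The address of the element
   is computed from the start of a1 using the known row size (2 ints);
   no pointer is loaded from memory. */
int fetch_a1(void)
{
    int a1[2][2] = {{142, 143}, {144, 145}};
    return a1[1][0];
}

/* Hypothetical helper: heterogeneous access. a2[1] first loads a
   pointer (p2 in the diagram above), and [0] then reads the int
   that pointer refers to. */
int fetch_a2(void)
{
    int **a2 = (int *[]){ (int []){242, 243}, (int []){244, 245} };
    return a2[1][0];
}
```

The source text is identical in both functions (x[1][0]), but the first involves one memory read and the second involves two.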

The code usually crashes at run time when an array of type int[N][M] is cast to int** and then accessed through that type, for example:

((int**)a1)[1][0]   // likely crash: the int data in a1 is misread as a pointer and dereferenced
