How can I work with dynamically-allocated arbitrary-dimensional arrays?

The typical 1-D array can be statically or automatically allocated in a declaration.

enum { n=100 };
int arr1[n];

Or dynamically-allocated and accessed via a pointer.

int *arr1m=malloc(n*sizeof*arr1m);
int *arr1c=calloc(n, sizeof*arr1c);

Both of these styles access an element with the same syntax.

int i = n/2;
arr1[i] = arr1c[i] = arr1m[i] = 42;

But when you add a second dimension, it takes some effort to achieve the same syntax.

int arr2[n][n];
int *arr2c=calloc(n*n,sizeof*arr2c);
arr2[5][5] = arr2c[5*n+5] = 23;

You can only get the doubled set of brackets if instead you construct it as an Iliffe-vector.

int **arr2l=calloc(n,sizeof*arr2l);
for (int j=0; j<n; j++)
    arr2l[j]=calloc(n,sizeof**arr2l);
 arr2[6][6] = arr2l[6][6] = 72;

But this becomes more and more cumbersome as the dimensions increase.

Another difficulty is checking the bounds of a dynamic array before accessing an element (so that you're not touching memory that was not properly allocated). The real arrays can use the sizeof operator to determine the bounds, but none of these dynamic arrays carry their size with them.

How can I define a structure that has fast, contiguous layout like an array, but with a consistent syntax for accessing elements with a list of indices that works the same for a 2D array as for a 3D array; and all dynamically, with sizes available dynamically so they can be passed to and returned from functions?


Solution 1:

There is no need to reinvent the wheel, C has that since C99, it is called variable length array, VLA. It has just the syntax as "normal" d-dimensional arrays, only that the bounds may be variable and they are not allowed in file scope.

As such objects can become relatively large, you should not allocate them on the stack, but with something like malloc

double (*A)[n][m] = malloc(sizeof(double[k][n][m]));

The compiler then helps you with all the index calculations without problems. If you want to pass such animals to functions you just have to be careful to declare the bounds first:

void func(size_t k, size_t n, size_t m, double A[k][n][m]);

This makes your intent clear to both, the human reader and the compiler. I prefer this over the equivalent form

void func(size_t k, size_t n, size_t m, double (*A)[n][m]);

Solution 2:

If you define a as a pointer to an array of n integers, the compiler will do the index arithmetic.

#define N 7
int (*a)[N];

int main() {
  a = malloc(N*N*sizeof(int));
  a[2][3] = 0;
}

ADDED:

Similarly, a three dimensional example:

#include <stdio.h>
#include <stdlib.h>

#define N 7

int (*a)[N][N];

int main() {
    int i,j,k;
    a = malloc(N*N*N*sizeof(int));
    for(i=0; i<N; i++) {
        for(j=0;j<N;j++) {
            for(k=0;k<N;k++) {
                a[i][j][k] = (i*10+j)*10+k;
            }
        }
    }
    for(i=0; i<N; i++) {
        for(j=0;j<N;j++) {
            for(k=0;k<N;k++) {
                printf("%2d ", a[i][j][k]);
            }
            printf("\n");
        }
        printf("\n");
    }
}

Solution 3:

There is a data structure used in the implementation of the J language (a dialect of APL) which accommodates dynamically-allocated arbitrary-dimensional arrays. It uses a hybrid of a struct and a dynamic array for its data structure, a trick commonly known as the struct hack. (More info on the J implementation here and here.)

To see the idea in a simple context, consider the 1D case: we want a dynamic 1D array which carries its size along with it. So:

struct vec { int n; int p[]; };

Since the p member is last, and C has no built-in bounds checking, it can be used to access additional memory off the end of the struct. Of course, when allocating, we'll need to provide this extra memory and not simply allocate the size of the struct. The struct is just the header of the array. C90 requires a number (say, 1) for the length of the p[] array, but C99 allows the number to be omitted so the size of the header is more straightforward to calculate.

So, an array with more dimensions will need more values to hold the sizes of each dimension. And for our structure to accommodate arrays of different dimensionality, this dimension vector will need to be variable-length as well.

What we can do to achieve all of this is to apply the struct hack twice, recursively upon itself. This gives us a memory layout like this, where R is the number of dimensions which we'll call the rank of the array, the D values are the lengths of each dimension, and the V values are the actual array data:

  1   R                    Product(D) 
 --- -------------------- ----------------------------- 
  R  D[0] D[1] ... D[R-1] V[0] V[1] ... V[Product(D)-1] 

And to describe this in C,

typedef struct arr { int r; int d[]; } *arr;

The elements of an array a immediately follow the R elements of the dims vector D. So the V elements can be accessed at a->d[r+0], a->d[r+1], ... a->d[r+i] (after reducing the index vector down to a single index on the flattened representation). The elements are easiest to treat in row-major order. The number of actual elements is the product of all the dimensions, all the dimensions multiplied together. Edit: The expression here might be better written: (a->d+a->r)[0], (a->d+a->r)[1], ... (a->d+a->r)[i].

In order to allocate one of these things, we'll need a function to compute this product as part of the size calculation.

int productdims(int rank, int *dims){
    int z=1;
    for(int i=0; i<rank; i++)
        z *= dims[i];
    return z;
}

And to initialize, we just need to fill-in the members.

arr makearr(int rank, int *dims){
    arr z = calloc( (sizeof(struct arr)/sizeof(int)) + 
                rank + productdims(rank,dims), sizeof(int));
    z->r = rank;
    memmove(z->d,dims,rank*sizeof(int));
    return z;
}

Remembering the formula for accessing 2D data (say an array of [m][n] elements) with a single index (it's a typical dynamic array like in the question). Element [i][j] is at i×n&plus;j. With a 3D array [m][n][o], element [i][j][k] is at i×(n×o)&plus;j×o&plus;k.

So we can calculate a single index for our linearly-laid-out data from an array of indices and the array of dimensions.

int *elem(arr a, ...){
    va_list ap;
    int idx = 0;

    va_start(ap,a);
    if (a->r){
        idx = va_arg(ap,int);
        for(int i=1; i<a->r; i++){
            idx *= a->d[i];
            idx += va_arg(ap,int);
        }
    }
    va_end(ap);

    return &a->d[a->r + idx];
}

Don't forget the headers:

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

And voilà:

int main() {    

    {
        int i,n=6;
        arr m = makearr(1, (int[]){n});
        for (i=0;i<n;i++)
            *elem(m,i) = i;
        for (i=0;i<n;i++,printf(" "))
            printf("%d",*elem(m,i));
    }
    puts("\n");

    {
        int i,j,n=4;
        arr m = makearr(2, (int[]){n,n});
        for (i=0;i<n;i++)
            for (j=0;j<n;j++)
                *elem(m,i,j) = i*n+j;
        for (i=0;i<n;i++,printf("\n"))
            for (j=0;j<n;j++,printf(" "))
                printf("%d",*elem(m,i,j));
    }
    puts("\n");

    {
        int i,j,k,n=3;
        arr m = makearr(3, (int[]){n,n,n});
        for (i=0;i<n;i++)
            for (j=0;j<n;j++)
                for (k=0;k<n;k++)
                    *elem(m,i,j,k) = (i*n+j)*n+k;
        for (i=0;i<n;i++,printf("\n"))
            for (j=0;j<n;j++,printf("\n"))
                for (k=0;k<n;k++,printf(" "))
                    printf("%d",*elem(m,i,j,k));
    }

    return 0;
}

Output:

0 1 2 3 4 5 

0 1 2 3 
4 5 6 7 
8 9 10 11 
12 13 14 15 


0 1 2 
3 4 5 
6 7 8 

9 10 11 
12 13 14 
15 16 17 

18 19 20 
21 22 23 
24 25 26 

Further elaboration of this code to support 2D matrix multiplication and column slices has been posted to comp.lang.c in this thread.

Better-motivated question/more powerful data structure.