Why does a C/C++ compiler need to know the size of an array at compile time?


I know that C standards preceding C99 (as well as C++) say that the size of an array on the stack must be known at compile time. But why is that? The array on the stack is allocated at run time, so why does the size matter at compile time? I hope someone can explain to me what a compiler does with the size at compile time. Thanks.

An example of such an array is:

void func()
{
    /* Here "array" is a local variable on the stack; its space is allocated
     * at run time. Why does the compiler need to know its size at compile time?
     */
    int array[10];
}

To understand why variably-sized arrays are more complicated to implement, you need to know a little about how automatic storage duration ("local") variables are usually implemented.

Local variables tend to be stored on the runtime stack. The stack is basically a large array of memory, which is sequentially allocated to local variables and with a single index pointing to the current "high water mark". This index is the stack pointer.

When a function is entered, the stack pointer is moved in one direction to allocate memory on the stack for local variables; when the function exits, the stack pointer is moved back in the other direction, to deallocate them.

This means that the actual location of local variables in memory is defined only with reference to the value of the stack pointer at function entry¹. The code in a function must access local variables via an offset from the stack pointer. The exact offsets to be used depend upon the size of the local variables.

Now, when all the local variables have a size that is fixed at compile time, these offsets from the stack pointer are also fixed, so they can be coded directly into the instructions that the compiler emits. For example, in this function:

void foo(void)
{
    int a;
    char b[10];
    int c;
    /* ... */
}

a might be accessed as STACK_POINTER + 0, b might be accessed as STACK_POINTER + 4, and c might be accessed as STACK_POINTER + 14.

However, when you introduce a variably-sized array, these offsets can no longer be computed at compile time; some of them will vary depending upon the size that the array has on this invocation of the function. This makes things significantly more complicated for compiler writers, because they must now write code that accesses STACK_POINTER + N, and since N itself varies, it must also be stored somewhere. Often this means doing two accesses: one to STACK_POINTER + <constant> to load N, then another to load or store the actual local variable of interest.

¹ In fact, "the value of the stack pointer at function entry" is such a useful value to have around that it has a name of its own, the frame pointer, and many CPUs provide a separate register dedicated to storing the frame pointer. In practice, it is usually the frame pointer from which the location of local variables is calculated, rather than the stack pointer itself.

It is not an extremely complicated thing to support, so the reason C89 doesn't allow this is not that it was impossible back then.

There are, however, two important reasons why it is not in C89:

1. The runtime code will get less efficient if the array size is not known at compile time.
2. Supporting this makes life harder for compiler writers.

Historically, it has been very important that C compilers should be (relatively) easy to write. Also, it should be possible to make compilers simple and small enough to run on modest computer systems (by 1980s standards). Another important feature of C is that the generated code should consistently be extremely efficient, without any surprises.

I think it is unfortunate that these values no longer hold for C99.

The compiler has to generate the code that creates the space for the frame on the stack to hold the array and the other local variables. For this it needs the size of the array.

In C++ this becomes even more difficult to implement, because variables stored on the stack must have their destructors called in the event of an exception, or upon returning from a given function or scope. Keeping track of the exact number/size of variables to be destroyed adds additional overhead and complexity. While in C it's possible to use something like a frame pointer to make freeing of the VLA implicit, in C++ that doesn't help you, because those destructors need to be called.

Also, VLAs can cause Denial of Service security vulnerabilities. If the user is able to supply any value which is eventually used as the size for a VLA, then they could use a sufficiently large value to cause a stack overflow (and therefore failure) in your process.

Finally, C++ already has a safe and effective variable-length array (std::vector<T>), so there's little reason to implement this feature for C++ code.

Depends on how you allocate the array.

If you create it as a local variable and specify a length, then the size matters because the compiler needs to know how much space to allocate on the stack for the elements of the array. If you don't specify the size of the array, it doesn't know how much space to set aside for the array elements.

If you create just a pointer to an array, then all you need to do is allocate the space for the pointer itself, and then you can dynamically create array elements during run time. But in this form of array creation, you're allocating space for the array elements in the heap, not on the stack.
