Is accessing a global array outside its bounds undefined behavior?

I just had an exam in my class today. We were given C code and input, and had to write what would appear on the screen if the program actually ran. One of the questions declared a[4][4] as a global variable, and at one point the program tried to access a[27][27]. I answered something like "Accessing an array outside its bounds is undefined behavior", but the teacher said that a[27][27] would have the value 0.

Afterwards, I tried some code to check whether "all uninitialized global variables are set to 0" is true or not. Well, it seems to be true.

So now my question:

  • It seems like some extra memory has been cleared and reserved for the code to run. How much memory is reserved? Why does a compiler reserve more memory than it should, and what is it for?
  • Will a[27][27] be 0 in every environment?

Edit:

In that code, a[4][4] is the only global variable declared, and there are some more local ones in main().

I tried that code again in DevC++. All of them are 0. But that is not true in VSE, where most values are 0 but some have a random value, as Vyktor has pointed out.


Solution 1:

You were right: it is undefined behavior, and you cannot count on it always producing 0.

As for why you are seeing zero in this case: modern operating systems allocate memory to processes in relatively coarse-grained chunks called pages that are much larger than individual variables (at least 4KB on x86). When you have a single global variable, it will be located somewhere on a page. Assuming a is of type int[4][4] and ints are four bytes on your system, a[27][27] will be located (27 * 4 + 27) * 4 = 540 bytes from the beginning of a. So as long as a is near the beginning of the page, a[27][27] will be backed by actual memory, and reading it won't cause a page fault / access violation.

Of course, you cannot count on this. If, for example, a is preceded on its page by nearly 4KB of other global variables, then a[27][27] falls on the next page, which may not be backed by memory, and your process may crash when you try to read it.

Even if the process does not crash, you cannot count on getting the value 0. If you have a very simple program on a modern multi-user operating system that does nothing but allocate this variable and print that value, you will probably see 0. Such operating systems set memory contents to some benign value (usually all zeros) when handing over memory to a process, so that sensitive data from one process or user cannot leak to another.

However, there is no general guarantee that arbitrary memory you read will be zero. You could run your program on a platform where memory isn't initialized on allocation, and you would see whatever value happened to be there from its last use.

Also, if a is followed by enough other global variables that are initialized to non-zero values, then accessing a[27][27] would show you whatever value happens to be there.

Solution 2:

Accessing an array out of bounds is undefined behavior, which means the results are unpredictable, so this result of a[27][27] being 0 is not reliable at all.

clang tells you this very clearly if we use -fsanitize=undefined:

runtime error: index 27 out of bounds for type 'int [4][4]'

Once you have undefined behavior, the compiler can really do anything at all; we have even seen examples where gcc has turned a finite loop into an infinite loop based on optimizations around undefined behavior. In some circumstances both clang and gcc can generate an undefined instruction opcode (such as ud2 on x86) if they detect undefined behavior.

As for why it is undefined behavior, the question "Why is out-of-bounds pointer arithmetic undefined behaviour?" provides a good summary of the reasons. For example, the resulting pointer may not be a valid address, it could point outside the assigned memory pages, or you could be working with memory-mapped hardware instead of RAM.

Most likely the segment where static variables are stored is much larger than the array you are allocating, or the memory you are stomping through just happens to be zeroed out, so you are simply lucky in this case; but again, the behavior is completely unreliable. Most likely your page size is 4KB and a[27][27] is within that bound, which is probably why you are not seeing a segmentation fault.

What the standard says

The draft C99 standard tells us this is undefined behavior in section 6.5.6 Additive operators, which covers pointer arithmetic, which is what an array access comes down to. It says:

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression.

[...]

If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

and the standard's definition of undefined behavior tells us that the standard imposes no requirements on the behavior, and notes that possible behavior is unpredictable:

behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, [...]

Solution 3:

Here is the quote from the standard that lists this as undefined behavior:

J.2 Undefined behavior

  • An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).

  • Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that points just beyond the array object and is used as the operand of a unary * operator that is evaluated (6.5.6).

In your case the array subscript is completely outside the array. Depending on the value being zero is completely unreliable.

Furthermore, the behavior of the entire program is in question.

Solution 4:

I just ran your code in Visual Studio 2012 and got results like this (different on each run):

Address of a: 00FB8130
Address of a[4][4]: 00FB8180
Address of a[27][27]: 00FB834C
Value of a[27][27]: 0
Address of a[1000][1000]: 00FBCF50
Value of a[1000][1000]: <<< Unhandled exception at 0x00FB3D8F in GlobalArray.exe:
                            0xC0000005: Access violation reading location 0x00FBCF50.

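When you look at the Modules window, you see that your application module's memory range is 00FA0000-00FBC000. Unless you have CRT checks turned on, nothing will check what you do inside your own memory (as long as you don't violate memory protection). The address of a[1000][1000], 00FBCF50, lies past the end of that range, which is why that read faulted while a[27][27] did not.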

So you got 0 at a[27][27] purely by chance. When you open a memory view at address 00FB8130 (a), you will probably see something like this:

0x00FB8130  08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x00FB8140  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x00FB8150  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x00FB8160  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x00FB8170  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x00FB8180  01 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00  ................
0x00FB8190  c0 90 45 00 b0 e9 45 00 00 00 00 00 00 00 00 00  À.E.°éE.........
0x00FB81A0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x00FB81B0  00 00 00 00 80 5c af 0f 00 00 00 00 00 00 00 00  ....€\¯.........
0x00FB81C0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
.......... 
0x00FB8330  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x00FB8340  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................ <<<<
0x00FB8350  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
..........                                      ^^ ^^ ^^ ^^

It's possible that with your compiler you will always get 0 for that code because of how it uses memory, but just a few bytes away you can find another variable.

For example, with the memory shown above, a[6][0] points to address 0x00FB8190, which contains the integer value 4559040 (the bytes c0 90 45 00 read as a little-endian int, 0x004590C0).

Solution 5:

Then get your teacher to explain this one.

I don't know if this will work on your system, but filling the memory around the array a with non-zero bytes gives a different result for a[27][27].

On my system, printing a[27][27] gave 0xFFFFFFFF, i.e. all bits set, which is -1 in two's complement viewed as unsigned.

#include <stdio.h>
#include <string.h>

/* print an expression and its value; the cast keeps %u matched to the argument type */
#define printer(expr) printf(#expr " = %u\n", (unsigned int)(expr))

unsigned int d[8096];
int a[4][4];  /* zero-initialized global: 4 x 4 x sizeof(int) bytes (64 if an int is 4 bytes) */
unsigned int b[8096];
unsigned int c[8096];
/* note: the compiler and linker are free to place these arrays in any order */


int main(void) {

   /* fill the other arrays with 0xFF bytes so a is not surrounded by zeroed memory */
   memset(b, -1, sizeof b);
   memset(c, -1, sizeof c);
   memset(d, -1, sizeof d);

   /* let's check normal access */
   printer(a[0][0]);
   printer(a[3][3]);

   /* Now we disrespect the machine: undefined behaviour results */
   printer(a[27][27]);

   return 0;
}

This is my output:

a[0][0] = 0
a[3][3] = 0
a[27][27] = 4294967295

I saw the comments about viewing memory in Visual Studio. The easiest way is to add a breakpoint somewhere in your code (to halt execution), then go to Debug > Windows > Memory and select e.g. Memory 1. Then find the memory address of your array a. In my case the address was 0x0130EFC0, so you enter 0x0130EFC0 in the address field and press Enter. This shows the memory at that location.

For example, in my case:

0x0130EFC0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ..................................
0x0130EFE2  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff  ..............................ÿÿÿÿ
0x0130F004  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ 
0x0130F026  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ
0x0130F048  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ

The zeros are of course the array a, which has a byte size of 4 x 4 x sizeof(int) (4 in my case) = 64 bytes. The bytes from address 0x0130F000 onwards are 0xFF each (from the contents of b, c, or d).

Note that:

0x130EFC0 + 64 = 0x130EFC0 + 0x40 = 0x130F000

which is exactly where all those ff bytes start. That is probably array b.