Are char arrays guaranteed to be null terminated?

#include <stdio.h>

int main() {
    char a = 5;
    char b[2] = "hi"; // No explicit room for `\0`.
    char c = 6;

    return 0;
}

Whenever we write a string, enclosed in double quotes, C automatically creates an array of characters for us, containing that string, terminated by the \0 character. (Source: http://www.eskimo.com/~scs/cclass/notes/sx8.html)

In the above example b only has room for 2 characters so the null terminating char doesn't have a spot to be placed at and yet the compiler is reorganizing the memory store instructions so that a and c are stored before b in memory to make room for a \0 at the end of the array.

Is this expected or am I hitting undefined behavior?


Solution 1:

It is allowed to initialize a char array with a string if the array is at least large enough to hold all of the characters in the string besides the null terminator.

This is detailed in section 6.7.9p14 of the C standard:

An array of character type may be initialized by a character string literal or UTF-8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.

However, this also means that you can't treat the array as a string since it's not null terminated. So as written, since you're not performing any string operations on b, your code is fine.
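As a rough illustration of why b is not a string, the sketch below scans the array for a terminator without reading past its end. The helper names has_terminator and exact_fit_is_string are made up for this example and are not part of any library.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if a '\0' occurs within the first n bytes of buf, else 0.
   (Hypothetical helper, named here only for illustration.) */
int has_terminator(const char *buf, size_t n) {
    return memchr(buf, '\0', n) != NULL;
}

/* Returns 1 if the exactly-sized array from the question holds a string. */
int exact_fit_is_string(void) {
    char b[2] = "hi";   /* legal C: holds 'h' and 'i', with no '\0' */
    return has_terminator(b, sizeof b);
}
```

Because memchr is given sizeof b as its limit, the check stays inside the array, which is what functions like strlen or printf("%s") would not do.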

What you can't do is initialize with a string that's too long, i.e.:

char b[2] = "hello";

This gives more initializers than can fit in the array and is a constraint violation. Section 6.7.9p2 states this as follows:

No initializer shall attempt to provide a value for an object not contained within the entity being initialized.

If you were to declare and initialize the array like this:

char b[] = "hi"; 

Then b would be an array of size 3, which is large enough to hold the two characters in the string constant plus the terminating null byte, making b a string.

To summarize:

If the array has a fixed size:

  • If the string constant used to initialize it (not counting the null terminator) is shorter than the array, the array will contain the characters in the string with all remaining elements set to 0, so the array will contain a string.
  • If the array is exactly large enough to contain the elements of the string but not the null terminator, the array will contain the characters in the string without the null terminator, meaning the array is not a string.
  • If the string constant (not counting the null terminator) is longer than the array, this is a constraint violation, so the compiler is required to issue a diagnostic.

If the array does not have an explicit size, the array will be sized to hold the string constant plus the terminating null byte.
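The two cases that do produce a string can be checked with sizeof and a simple scan. This is a minimal sketch. The function names are invented for illustration, and the size 5 is an arbitrary choice.

```c
#include <stddef.h>

/* Size the compiler infers for an unsized array initialized with "hi". */
size_t inferred_size(void) {
    char b[] = "hi";            /* 'h', 'i', '\0' */
    return sizeof b;
}

/* Counts the zero bytes in a 5-element array initialized with "hi". */
size_t zero_bytes_in_larger_array(void) {
    char b[5] = "hi";           /* 'h', 'i', then the rest set to 0 */
    size_t zeros = 0;
    for (size_t i = 0; i < sizeof b; i++) {
        if (b[i] == '\0') {
            zeros++;
        }
    }
    return zeros;
}
```

Here inferred_size() yields 3 and zero_bytes_in_larger_array() yields 3, matching the rules listed above.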

Solution 2:

Whenever we write a string, enclosed in double quotes, C automatically creates an array of characters for us, containing that string, terminated by the \0 character.

Those notes are mildly misleading in this case. I shall have to update them.

When you write something like

char *p = "Hello";

or

printf("world!\n");

C automatically creates an array of characters for you, of just the right size, containing the string, terminated by the \0 character.
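One way to see that automatically created array is to apply sizeof to the literal itself. The sketch below uses made-up function names purely for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Size in bytes of the anonymous array created for the literal "Hello". */
size_t literal_array_size(void) {
    return sizeof "Hello";      /* 5 characters plus the '\0' */
}

/* Length of the string reached through a pointer to the same literal. */
size_t literal_length(void) {
    const char *p = "Hello";
    return strlen(p);           /* counts up to, not including, the '\0' */
}
```

The first function returns 6 and the second returns 5. The difference of one byte is the terminating \0.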

In the case of array initializers, however, things are slightly different. When you write

char b[2] = "hi";

the string is merely the initializer for an array which you are creating. So you have complete control over the size. There are several possibilities:

char b0[] = "hi";     // compiler infers size
char b1[1] = "hi";    // error
char b2[2] = "hi";    // No terminating 0 in the array. (Illegal in C++, BTW)
char b3[3] = "hi";    // explicit size matches string literal
char b4[10] = "hi";   // space past end of initializer is always zero-initialized

For b0, you don't specify a size, so the compiler uses the string initializer to pick the right size, which will be 3.

For b1, you specify a size, but it's too small, so the compiler should give you an error.

For b2, which is the case you asked about, you specify a size which is just barely big enough for the explicit characters in the string initializer, but not the terminating \0. This is a special case. It's legal, but what you end up with in b2 is not a proper null-terminated string. Since it's unusual at best, the compiler might give you a warning. See this question for more information on this case.

For b3, you specify a size which is just right, so you get a proper string in an exactly-sized array, just like b0.

For b4, you specify a size which is too big, although this is no problem. There ends up being extra space in the array, beyond the terminating \0. (As a matter of fact, this extra space will also be filled with \0.) This extra space would let you safely do something like strcat(b4, ", wrld!").
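The strcat call mentioned above fits exactly: 2 characters from "hi", 7 from ", wrld!", and 1 for the terminator make 10 bytes. A small sketch follows. The function name and the out parameter are assumptions added only so the result can be inspected by a caller.

```c
#include <string.h>

/* Appends ", wrld!" to a 10-byte array initialized with "hi", then copies
   the whole array into out, which must have room for at least 10 bytes. */
void append_into_spare_room(char *out) {
    char b4[10] = "hi";         /* 'h', 'i', then eight zero bytes */
    strcat(b4, ", wrld!");      /* 2 + 7 characters + '\0' = 10 bytes */
    memcpy(out, b4, sizeof b4);
}
```

After the call, out holds "hi, wrld!". Appending even one more character would overflow b4.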

Needless to say, most of the time you want to use the b0 form. Counting characters is tedious and error-prone. As Brian Kernighan (one of the creators of C) has written in this context, "Let the computer do the dirty work."

One more thing. You wrote:

and yet the compiler is reorganizing the memory store instructions so that a and c are stored before b in memory to make room for a \0 at the end of the array.

I don't know what's going on there, but it's safe to say that the compiler is not trying to "make room for a \0". Compilers can and often do store variables in their own inscrutable internal order, matching neither the order you declared them, nor alphabetical order, nor anything else you might think of. If under your compiler array b ended up with extra space after it which did contain a \0 as if to terminate the string, that was most likely just chance, not because the compiler was trying to be nice to you and helping to make something like printf("%s\n", b) be better defined. (Under the two compilers where I tried it, printf("%s\n", b) printed hi^E and hi ??, clearly showing the presence of trailing random garbage, as expected.)
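If you do need to print an array like b, one option (not something the question asked for, just a sketch) is to give %s an explicit precision so that printf-family functions stop at the array's size instead of looking for a \0. The function name below is hypothetical, and snprintf is used instead of printf only so the result can be examined.

```c
#include <stdio.h>

/* Formats the unterminated array b into out, using a precision so that
   at most sizeof b bytes are read. Returns what snprintf returns. */
int format_unterminated(char *out, size_t out_size) {
    char b[2] = "hi";           /* no terminating '\0' */
    return snprintf(out, out_size, "%.*s", (int)sizeof b, b);
}
```

With a large enough out buffer, this writes "hi" and returns 2, with no trailing garbage.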