String literals vs array of char when initializing a pointer

Inspired by this question.

We can initialize a char pointer with a string literal:

char *p = "ab";

And it is perfectly fine. One could think that it is equivalent to the following:

char *p = {'a', 'b', '\0'};

But apparently that is not the case. It is not only that string literals are stored in read-only memory: it appears that even though the string literal has the type of a char array, and the initializer {...} also has the type of a char array, the two declarations are handled differently, as the compiler gives the warning:

warning: excess elements in scalar initializer

in the second case. What is the explanation of such a behavior?

Update:

Moreover, in the latter case the pointer p ends up with the value 0x61 (the value of the first array element, 'a') instead of a memory address, so the compiler, as the warning says, takes just the first element of the initializer and assigns it to p.


Solution 1:

I think you're confused because char *p = "ab"; and char p[] = "ab"; have similar syntax, but different meanings.

I believe that the latter case (char p[] = "ab";) is best regarded as a short-hand notation for char p[] = {'a', 'b', '\0'}; (initializes an array with the size determined by the initializer). Actually, in this case, you could say "ab" is not really used as a string literal.
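As a rough check of that equivalence, the sketch below builds the array both ways and compares the resulting objects. The function name array_forms_match is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if both spellings of the array
   initializer produce identical objects, 0 otherwise. */
int array_forms_match(void)
{
    char from_literal[] = "ab";              /* size deduced from the literal */
    char from_braces[]  = {'a', 'b', '\0'};  /* size deduced from the list */

    /* Both are ordinary arrays of three chars: 'a', 'b', '\0'. */
    assert(sizeof from_literal == 3);
    if (sizeof from_literal != sizeof from_braces)
        return 0;

    /* Compare the contents byte by byte, terminator included. */
    return memcmp(from_literal, from_braces, sizeof from_literal) == 0;
}
```

Calling array_forms_match() should return 1, since in both declarations the initializer only supplies the element values of a local array.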

However, the former case (char *p = "ab";) is different in that it simply initializes the pointer p to point to the first element of the read-only string literal "ab".

I hope you see the difference. While char p[] = "ab"; can be written as an initialization such as the one you described, char *p = "ab"; cannot, because pointers are, well, not arrays, and initializing them with an array initializer does something entirely different (namely, with your compiler, giving them the value of the first element, 0x61 in your case).
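A minimal sketch of the array/pointer difference, assuming a hosted C99 compiler. The function names are invented for this example, and the pointer is declared const here only so the compiler can reject accidental writes through it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the array form owns a writable copy of the characters. */
char edit_array_copy(void)
{
    char arr[] = "ab";          /* three chars stored in the array itself */

    assert(sizeof arr == 3);    /* size of the whole array object */
    arr[0] = 'A';               /* fine: modifies the local copy */
    return arr[0];
}

/* Hypothetical helper: the pointer form only refers to the literal's storage. */
char read_via_pointer(void)
{
    const char *ptr = "ab";     /* holds the address of the literal's first char */

    assert(sizeof ptr == sizeof(const char *));  /* size of a pointer, not of "ab" */
    /* Writing through a pointer to a string literal is undefined behavior,
       so this sketch only reads from it. */
    return ptr[0];
}
```

Here edit_array_copy() returns 'A' and read_via_pointer() returns 'a': the first function changes its own array, while the second merely looks at the literal.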

Long story short, C compilers only "replace" a string literal with a char array initializer if it is suitable to do so, i.e. it is being used to initialize a char array.

Solution 2:

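The second example is not valid C. A brace-enclosed list such as {'a', 'b', '\0'} can be used to initialize an array, but not a pointer: a pointer is a scalar and takes a single initializer expression, which is why the compiler complains about excess elements.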

Instead, you can use a C99 compound literal (also available in some compilers as an extension, e.g., GCC) like this:

char *p = (char []){'a', 'b', '\0'};

Note that this is more flexible than a string literal, as the array it creates isn't necessarily null-terminated.
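As an illustration, here is a small sketch of a compound literal without a terminator, assuming a C99 (or later) compiler. The helper names sum_bytes and compound_literal_sum are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: adds up n chars starting at p. The length is passed
   explicitly because the data is not null-terminated. */
int sum_bytes(const char *p, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; ++i)
        total += p[i];
    return total;
}

int compound_literal_sum(void)
{
    /* Unnamed array of exactly two chars; there is no trailing '\0'. */
    char *p = (char []){'a', 'b'};

    /* Unlike a string literal, this array may be modified. */
    p[0] = 'c';

    return sum_bytes(p, 2);     /* 'c' + 'b' */
}
```

One caveat: a compound literal written inside a function has automatic storage duration, so p must not be used after the enclosing block ends.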