C optimisation of string literals
I've just been inspecting the following in gdb:
char *a[] = {"one","two","three","four"};
char *b[] = {"one","two","three","four"};
char *c[] = {"two","three","four","five"};
char *d[] = {"one","three","four","six"};
...and I get the following:
(gdb) p a
$17 = {0x80961a4 "one", 0x80961a8 "two", 0x80961ac "three", 0x80961b2 "four"}
(gdb) p b
$18 = {0x80961a4 "one", 0x80961a8 "two", 0x80961ac "three", 0x80961b2 "four"}
(gdb) p c
$19 = {0x80961a8 "two", 0x80961ac "three", 0x80961b2 "four", 0x80961b7 "five"}
(gdb) p d
$20 = {0x80961a4 "one", 0x80961ac "three", 0x80961b2 "four", 0x80961bc "six"}
I'm really surprised that the string pointers are the same for equivalent words. I would have thought each string would have been allocated its own memory on the stack regardless of whether it was the same as a string in another array.
Is this an example of some sort of compiler optimisation or is it standard behaviour for string declaration of this kind?
Solution 1:
It's called "string pooling". It's optional in Microsoft compilers (controlled by the /GF switch), but not in GCC. If you switch off string pooling in MSVC, the "same" strings in the different arrays are duplicated and have different memory addresses, so they take up an extra (unnecessary) 50 or so bytes of your static data.
EDIT: gcc prior to v4.0 had an option, -fwritable-strings, whose effect was twofold: it allowed string literals to be overwritten, and it disabled string pooling. So, in your code, setting this flag would allow the somewhat dangerous code:
/* Overwrite the first string in a, so that it reads 'xne'. Does not */
/* affect the instances of the string "one" in b or d */
*a[0] = 'x';
Solution 2:
(I assume that your a, b, c and d are declared as local variables, which is the reason for your stack-related expectations.)
String literals in C have static storage duration. They are never allocated "on the stack". They are always allocated in global/static memory and live "forever", i.e. as long as the program runs.
Your a, b, c and d arrays were allocated on the stack. The pointers stored in these arrays point to static memory. Under these circumstances, there's nothing unusual about pointers for identical words being identical.
Whether identical literals are merged into one depends on the compiler, and some compilers have an option that controls this behavior. String literals must be treated as read-only, since modifying one is undefined behavior (which is why it is a better idea to use the const char * type for your arrays), so it doesn't make much difference whether they are merged or not, until you begin to rely on actual pointer values.
P.S. Just out of curiosity: even if these string literals were allocated on the stack, why would you expect identical literals to be "instantiated" more than once?