String literals: Where do they go?

I am interested in where string literals get allocated/stored.

I did find one intriguing answer here, saying:

Defining a string inline actually embeds the data in the program itself and cannot be changed (some compilers allow this by a smart trick, don't bother).

But, it had to do with C++, not to mention that it says not to bother.

I am bothering. =D

So my question is where and how is my string literal kept? Why should I not try to alter it? Does the implementation vary by platform? Does anyone care to elaborate on the "smart trick?"


A common technique is for string literals to be put in "read-only-data" section which gets mapped into the process space as read-only (which is why you can't change it).

It does vary by platform. For example, simpler chip architectures may not support read-only memory segments so the data segment will be writable.

Rather than try to figure out a trick to make string literals changeable (it will be highly dependent on your platform and could change over time), just use arrays:

char foo[] = "...";

The compiler will arrange for the array to get initialized from the literal and you can modify the array.


Why should I not try to alter it?

Because it is undefined behavior. Quote from C99 N1256 draft 6.7.8/32 "Initialization":

EXAMPLE 8: The declaration

char s[] = "abc", t[3] = "abc";

defines "plain" char array objects s and t whose elements are initialized with character string literals.

This declaration is identical to

char s[] = { 'a', 'b', 'c', '\0' },
t[] = { 'a', 'b', 'c' };

The contents of the arrays are modifiable. On the other hand, the declaration

char *p = "abc";

defines p with type "pointer to char" and initializes it to point to an object with type "array of const char" with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined.

Where do they go?

GCC 4.8 x86-64 ELF Ubuntu 14.04:

  • char s[]: stack
  • char *s:
    • .rodata section of the object file
    • the same segment where the .text section of the object file gets dumped, which has Read and Exec permissions, but not Write

Program:

#include <stdio.h>

int main() {
    char *s = "abc";
    printf("%s\n", s);
    return 0;
}

Compile and decompile:

gcc -ggdb -std=c99 -c main.c
objdump -Sr main.o

Output contains:

 char *s = "abc";
8:  48 c7 45 f8 00 00 00    movq   $0x0,-0x8(%rbp)
f:  00 
        c: R_X86_64_32S .rodata

So the string is stored in the .rodata section.

Then:

readelf -l a.out

Contains (simplified):

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x0000000000000704 0x0000000000000704  R E    200000

 Section to Segment mapping:
  Segment Sections...
   02     .text .rodata

This means that the default linker script dumps both .text and .rodata into a segment that can be executed but not modified (Flags = R E). Attempting to modify such a segment leads to a segfault in Linux.

If we do the same for char[]:

 char s[] = "abc";

we obtain:

17:   c7 45 f0 61 62 63 00    movl   $0x636261,-0x10(%rbp)

so it gets stored in the stack (relative to %rbp), and we can of course modify it.