String literals: Where do they go?
I am interested in where string literals get allocated/stored.
I did find one intriguing answer here, saying:
Defining a string inline actually embeds the data in the program itself and cannot be changed (some compilers allow this by a smart trick, don't bother).
But, it had to do with C++, not to mention that it says not to bother.
I am bothering. =D
So my question is where and how is my string literal kept? Why should I not try to alter it? Does the implementation vary by platform? Does anyone care to elaborate on the "smart trick?"
A common technique is for string literals to be put in "read-only-data" section which gets mapped into the process space as read-only (which is why you can't change it).
It does vary by platform. For example, simpler chip architectures may not support read-only memory segments so the data segment will be writable.
Rather than try to figure out a trick to make string literals changeable (it will be highly dependent on your platform and could change over time), just use arrays:
char foo[] = "...";
The compiler will arrange for the array to get initialized from the literal and you can modify the array.
Why should I not try to alter it?
Because it is undefined behavior. Quote from C99 N1256 draft 6.7.8/32 "Initialization":
EXAMPLE 8: The declaration
char s[] = "abc", t[3] = "abc";
defines "plain" char array objects
s
andt
whose elements are initialized with character string literals.This declaration is identical to
char s[] = { 'a', 'b', 'c', '\0' }, t[] = { 'a', 'b', 'c' };
The contents of the arrays are modifiable. On the other hand, the declaration
char *p = "abc";
defines
p
with type "pointer to char" and initializes it to point to an object with type "array of const char" with length 4 whose elements are initialized with a character string literal. If an attempt is made to usep
to modify the contents of the array, the behavior is undefined.
Where do they go?
GCC 4.8 x86-64 ELF Ubuntu 14.04:
-
char s[]
: stack -
char *s
:-
.rodata
section of the object file - the same segment where the
.text
section of the object file gets dumped, which has Read and Exec permissions, but not Write
-
Program:
#include <stdio.h>
int main() {
char *s = "abc";
printf("%s\n", s);
return 0;
}
Compile and decompile:
gcc -ggdb -std=c99 -c main.c
objdump -Sr main.o
Output contains:
char *s = "abc";
8: 48 c7 45 f8 00 00 00 movq $0x0,-0x8(%rbp)
f: 00
c: R_X86_64_32S .rodata
So the string is stored in the .rodata
section.
Then:
readelf -l a.out
Contains (simplified):
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x0000000000000704 0x0000000000000704 R E 200000
Section to Segment mapping:
Segment Sections...
02 .text .rodata
This means that the default linker script dumps both .text
and .rodata
into a segment that can be executed but not modified (Flags = R E
). Attempting to modify such a segment leads to a segfault in Linux.
If we do the same for char[]
:
char s[] = "abc";
we obtain:
17: c7 45 f0 61 62 63 00 movl $0x636261,-0x10(%rbp)
so it gets stored in the stack (relative to %rbp
), and we can of course modify it.