Why do I get a segmentation fault when writing to a "char *s" initialized with a string literal, but not "char s[]"?

The following code receives seg fault on line 2:

char *str = "string";
str[0] = 'z';  // could be also written as *str = 'z'
printf("%s\n", str);

While this works perfectly well:

char str[] = "string";
str[0] = 'z';
printf("%s\n", str);

Tested with MSVC and GCC.


See the C FAQ, Question 1.32

Q: What is the difference between these initializations?
char a[] = "string literal";
char *p = "string literal";
My program crashes if I try to assign a new value to p[i].

A: A string literal (the formal term for a double-quoted string in C source) can be used in two slightly different ways:

  1. As the initializer for an array of char, as in the declaration of char a[] , it specifies the initial values of the characters in that array (and, if necessary, its size).
  2. Anywhere else, it turns into an unnamed, static array of characters, and this unnamed array may be stored in read-only memory, and which therefore cannot necessarily be modified. In an expression context, the array is converted at once to a pointer, as usual (see section 6), so the second declaration initializes p to point to the unnamed array's first element.

Some compilers have a switch controlling whether string literals are writable or not (for compiling old code), and some may have options to cause string literals to be formally treated as arrays of const char (for better error catching).


Normally, string literals are stored in read-only memory when the program is run. This is to prevent you from accidentally changing a string constant. In your first example, "string" is stored in read-only memory and *str points to the first character. The segfault happens when you try to change the first character to 'z'.

In the second example, the string "string" is copied by the compiler from its read-only home to the str[] array. Then changing the first character is permitted. You can check this by printing the address of each:

printf("%p", str);

Also, printing the size of str in the second example will show you that the compiler has allocated 7 bytes for it:

printf("%d", sizeof(str));

Most of these answers are correct, but just to add a little more clarity...

The "read only memory" that people are referring to is the text segment in ASM terms. It's the same place in memory where the instructions are loaded. This is read-only for obvious reasons like security. When you create a char* initialized to a string, the string data is compiled into the text segment and the program initializes the pointer to point into the text segment. So if you try to change it, kaboom. Segfault.

When written as an array, the compiler places the initialized string data in the data segment instead, which is the same place that your global variables and such live. This memory is mutable, since there are no instructions in the data segment. This time when the compiler initializes the character array (which is still just a char*) it's pointing into the data segment rather than the text segment, which you can safely alter at run-time.


Why do I get a segmentation fault when writing to a string?

C99 N1256 draft

There are two different uses of character string literals:

  1. Initialize char[]:

    char c[] = "abc";      
    

    This is "more magic", and described at 6.7.8/14 "Initialization":

    An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.

    So this is just a shortcut for:

    char c[] = {'a', 'b', 'c', '\0'};
    

    Like any other regular array, c can be modified.

  2. Everywhere else: it generates an:

    • unnamed
    • array of char What is the type of string literals in C and C++?
    • with static storage
    • that gives UB if modified

    So when you write:

    char *c = "abc";
    

    This is similar to:

    /* __unnamed is magic because modifying it gives UB. */
    static char __unnamed[] = "abc";
    char *c = __unnamed;
    

    Note the implicit cast from char[] to char *, which is always legal.

    Then if you modify c[0], you also modify __unnamed, which is UB.

    This is documented at 6.4.5 "String literals":

    5 In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence [...]

    6 It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

6.7.8/32 "Initialization" gives a direct example:

EXAMPLE 8: The declaration

char s[] = "abc", t[3] = "abc";

defines "plain" char array objects s and t whose elements are initialized with character string literals.

This declaration is identical to

char s[] = { 'a', 'b', 'c', '\0' },
t[] = { 'a', 'b', 'c' };

The contents of the arrays are modifiable. On the other hand, the declaration

char *p = "abc";

defines p with type "pointer to char" and initializes it to point to an object with type "array of char" with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined.

GCC 4.8 x86-64 ELF implementation

Program:

#include <stdio.h>

int main(void) {
    char *s = "abc";
    printf("%s\n", s);
    return 0;
}

Compile and decompile:

gcc -ggdb -std=c99 -c main.c
objdump -Sr main.o

Output contains:

 char *s = "abc";
8:  48 c7 45 f8 00 00 00    movq   $0x0,-0x8(%rbp)
f:  00 
        c: R_X86_64_32S .rodata

Conclusion: GCC stores char* it in .rodata section, not in .text.

If we do the same for char[]:

 char s[] = "abc";

we obtain:

17:   c7 45 f0 61 62 63 00    movl   $0x636261,-0x10(%rbp)

so it gets stored in the stack (relative to %rbp).

Note however that the default linker script puts .rodata and .text in the same segment, which has execute but no write permission. This can be observed with:

readelf -l a.out

which contains:

 Section to Segment mapping:
  Segment Sections...
   02     .text .rodata