What is the difference between char a[] = ?string?; and char *p = ?string?;?
As the heading says, What is the difference between
char a[] = ?string?; and
char *p = ?string?;
I was asked this question in an interview. I don't even understand the statement.
char a[] = ?string?
Here, what is the ? operator? Is it part of the string, or does it have some specific meaning?
The ? seems to be a typo; it is not valid syntax. So this answer assumes the ? is a typo and explains what the interviewer probably meant to ask.
The two are distinctly different. For a start:
- The first creates an array.
- The second creates a pointer.
Read on for a more detailed explanation:
The Array version:
char a[] = "string";
Creates an array that is large enough to hold the string literal "string", including its null terminator. The array a is initialized with the contents of the string literal "string". The array can be modified at a later time. Also, the array's size is known at compile time, so the sizeof operator can be used to determine its size.
The pointer version:
char *p = "string";
Creates a pointer that points to the string literal "string". This can be cheaper than the array version, since no copy of the characters is made, but the string pointed to must not be changed, because it may be located in read-only, implementation-defined memory. Modifying a string literal results in undefined behavior.
In fact, C++03 deprecates[Ref 1] the use of a string literal without the const keyword. So the declaration should be:
const char *p = "string";
Also, you need to use the strlen() function, and not sizeof, to find the length of the string, since the sizeof operator will just give you the size of the pointer variable.
Which version is better and which one shall I use?
It depends on the usage.
- If you do not need to make any changes to the string, use the pointer version.
- If you intend to change the data, use the array version.
Note: The question is specific to C, not C++.
Note that using a string literal without the const keyword is perfectly valid in C.
However, modifying a string literal is still undefined behavior in C[Ref 2].
This brings up an interesting question,
What is the difference between char* and const char* when used with string literals in C?
For Standardese fans:
[Ref 1] C++03 Standard: §4.2/2
A string literal (2.13.4) that is not a wide string literal can be converted to an rvalue of type “pointer to char”; a wide string literal can be converted to an rvalue of type “pointer to wchar_t”. In either case, the result is a pointer to the first element of the array. This conversion is considered only when there is an explicit appropriate pointer target type, and not when there is a general need to convert from an lvalue to an rvalue. [Note: this conversion is deprecated. See Annex D. ] For the purpose of ranking in overload resolution (13.3.3.1.1), this conversion is considered an array-to-pointer conversion followed by a qualification conversion (4.4). [Example: "abc" is converted to “pointer to const char” as an array-to-pointer conversion, and then to “pointer to char” as a qualification conversion. ]
C++11 simply removes the above quotation, which implies that such code is ill-formed in C++11.
[Ref 2] C99 standard 6.4.5/5 "String Literals - Semantics":
In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence; for wide string literals, the array elements have type wchar_t, and are initialized with the sequence of wide characters...
It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
The first one is an array; the other is a pointer.
The array declaration
char a[6];
requests that space for six characters be set aside, to be known by the name a. That is, there is a location named a at which six characters can sit. The pointer declaration
char *p;
on the other hand, requests a place which holds a pointer. The pointer is to be known by the name p, and can point to any char (or contiguous array of chars) anywhere.
The statements
char a[] = "string";
char *p = "string";
would result in data structures which could be represented like this:
   +---+---+---+---+---+---+----+
a: | s | t | r | i | n | g | \0 |
   +---+---+---+---+---+---+----+
   +-----+     +---+---+---+---+---+---+---+
p: |  *======> | s | t | r | i | n | g |\0 |
   +-----+     +---+---+---+---+---+---+---+
It is important to realize that a reference like
x[3]
generates different code depending on whether x is an array or a pointer. Given the declarations above, when the compiler sees the expression a[3], it emits code to start at the location a, move three elements past it, and fetch the character there. When it sees the expression p[3], it emits code to start at the location p, fetch the pointer value there, add three element sizes to the pointer, and finally fetch the character pointed to. In the example above, both a[3] and p[3] happen to be the character i, but the compiler gets there differently.
Source: comp.lang.c FAQ list · Question 6.2
char a[] = "string";
This allocates the string on the stack.
char *p = "string";
This creates a pointer on the stack that points to the literal in the data segment of the process.
The ? is whoever wrote it not knowing what they were doing.
Stack, heap, data segment (and BSS), and text segment are the four segments of process memory. All local variables are placed on the stack. Memory dynamically allocated using malloc and calloc is on the heap. All global and static variables are in the data segment. The text segment holds the compiled code of the program and some constants.
Of these four segments, the text segment is READ ONLY, and the other three are READ and WRITE.
char a[] = "string";
- This statement allocates 7 bytes on the stack (because a is a local variable) and stores all 6 characters (s, t, r, i, n, g) plus the null character (\0) at the end.
char *p = "string";
- This statement allocates 4 bytes (on a 32-bit machine) on the stack (because p is also a local variable), and p holds the address of the constant string "string". The 7 bytes of this constant string (6 characters plus \0) are in the text segment. This is a constant value. The pointer variable p just points to that string.
Now a[0] (the index can be 0 to 5) accesses the first character of the string, which is on the stack. So we can also write at this position: a[0] = 'x'. This operation is allowed because we have READ WRITE access to the stack.
But p[0] = 'x' typically leads to a crash, because we have only READ access to the text segment. Formally this is undefined behavior; in practice, a segmentation fault usually happens if we write to the text segment.
But you can change the value of the variable p itself, because it is a local variable on the stack, like below:
char *p = "string";
printf("%s", p);
p = "start";
printf("%s", p);
This is allowed. Here we are changing the address stored in the pointer variable p to the address of the string start (again, start is also read-only data in the text segment). If you want to modify the values that p points to, use dynamically allocated memory.
char *p = NULL;
p = malloc(sizeof(char)*7);
strcpy(p, "string");
Now the p[0] = 'x' operation is allowed, because we are writing to the heap.