Why does C not allow concatenating strings when using the conditional operator?
As per the C11 standard, chapter §5.1.1.2, concatenation of adjacent string literals:
Adjacent string literal tokens are concatenated.
happens in translation phase 6. On the other hand:
printf("Hi" (test ? "Bye" : "Goodbye"));
involves the conditional operator, which is evaluated at run-time. So, at compile time, during the translation phase, there are no adjacent string literals present, hence the concatenation is not possible. The syntax is invalid and thus reported by your compiler.
To elaborate a bit on the why part, during the preprocessing phase, the adjacent string literals are concatenated and represented as a single string literal (token). The storage is allocated accordingly and the concatenated string literal is considered as a single entity (one string literal).
On the other hand, for run-time concatenation, the destination must have enough memory to hold the concatenated string; otherwise, there is no way to access the expected concatenated output. String literals are already allocated memory at compile time and cannot be extended to make room for anything appended to the original content. In other words, there is no way the concatenated result could be accessed (presented) as a single string literal. So, this construct is inherently incorrect.
Just FYI, for run-time string (not literal) concatenation, we have the library function strcat(), which concatenates two strings. Notice, the description mentions:
char *strcat(char * restrict s1, const char * restrict s2);
The strcat() function appends a copy of the string pointed to by s2 (including the terminating null character) to the end of the string pointed to by s1. The initial character of s2 overwrites the null character at the end of s1. [...]
So, we can see, s1 is a string, not a string literal. However, as the content of s2 is not altered in any way, it can very well be a string literal.
According to the C Standard (5.1.1.2 Translation phases)
1 The precedence among the syntax rules of translation is specified by the following phases.
- Adjacent string literal tokens are concatenated.
And only after that
- White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token. The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.
In this construction
"Hi" (test ? "Bye" : "Goodbye")
there are no adjacent string literal tokens. So this construction is invalid.
String literal concatenation is performed by the preprocessor at compile-time. There is no way for this concatenation to be aware of the value of test, which is not known until the program actually executes. Therefore, these string literals cannot be concatenated.
Because the general case is that you wouldn't have a construction like this for values known at compile-time, the C standard was designed to restrict the auto-concatenation feature to the most basic case: when the literals are literally right alongside each other.
But even if it did not word this restriction in that way, or if the restriction were differently-constructed, your example would still be impossible to realise without making the concatenation a runtime process. And, for that, we have library functions such as strcat.
Because C has no string type. String literals are compiled to char arrays, referenced by a char* pointer.
C allows adjacent literals to be combined at compile-time, as in your first example. The C compiler itself has some knowledge about strings. But this information is not present at runtime, and thus concatenation cannot happen.
During the compilation process, your first example is "translated" to:
#include <stdio.h>

int main(void) {
    /* "Hi" "Bye" has already been merged into one array */
    static const char char_ptr_1[] = {'H', 'i', 'B', 'y', 'e', '\0'};
    printf(char_ptr_1);
}
Note how the two strings are combined to a single static array by the compiler, before the program ever executes.
However, your second example is "translated" to something like this:
int main(void) {
    static const char char_ptr_1[] = {'H', 'i', '\0'};
    static const char char_ptr_2[] = {'B', 'y', 'e', '\0'};
    static const char char_ptr_3[] = {'G', 'o', 'o', 'd', 'b', 'y', 'e', '\0'};
    int test = 0;
    /* error: char_ptr_1 is not a function, so this line does not compile */
    printf(char_ptr_1 (test ? char_ptr_2 : char_ptr_3));
}
It should be clear why this does not compile. The ternary operator ?: is evaluated at runtime, not compile-time, when the "strings" no longer exist as such, but only as simple char arrays, referenced by char* pointers. Unlike adjacent string literals, adjacent char pointers are never concatenated; the compiler parses char_ptr_1 (...) as a call to something that is not a function and rejects it.
If you really want to have both branches produce compile-time string constants to be chosen at runtime, you'll need a macro.
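#include <stdio.h>

/* s, a and b must be string literals: each branch becomes a pair of
   adjacent literals (s a, s b) that the compiler merges; t only picks
   between the two merged results at run time. */
#define ccat(s, t, a, b) ((t) ? (s a) : (s b))

int main(int argc, char **argv) {
    printf("%s\n", ccat("hello ", argc > 2, "y'all", "you"));
    return 0;
}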