Why does GCC remove the whitespace between the preprocessing tokens?

Sample code:

#define X(x,y)  x y
#define STR_(x) #x
#define STR(x)  STR_(x)
STR(X(Y,Y))

Invocations:

$ gcc t222.c -std=c11 -pedantic -Wall -Wextra -E -P
"Y Y"

$ gcc t222.c -std=c11 -pedantic -Wall -Wextra -E -P -D"Y()"
"YY"

Why does GCC remove the whitespace between the preprocessing tokens?

Clang, for example, keeps the whitespace:

$ clang t222.c -std=c11 -pedantic -Wall -Wextra -E -P -D"Y()"
"Y Y"

UPD1. GCC does appear to take into account the whitespace between the , and the second Y in the invocation:

$ gcc t222.c -std=c11 -pedantic -Wall -Wextra -E -P -D"Y()" -D"Z=STR(X(Y,Y))"
"YY"

$ gcc t222.c -std=c11 -pedantic -Wall -Wextra -E -P -D"Y()" -D"Z=STR(X(Y, Y))"
"Y Y"

UPD2. This:

STR(X(Y,
Y))

leads to:

$ gcc t222.c -std=c11 -pedantic -Wall -Wextra -E -P -D"Y()"
"Y Y"

However, this:

STR(X(Y
,Y))

leads to:

$ gcc t222.c -std=c11 -pedantic -Wall -Wextra -E -P -D"Y()"
"YY"

UPD3. Reported as GCC bug 104147: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104147


This is a bug in GCC. C 2018 6.10.3.2 specifies the behavior of the # operator. Paragraph 1 says “Each # preprocessing token in the replacement list for a function-like macro shall be followed by a parameter as the next preprocessing token in the replacement list.” The #x in #define STR_(x) #x is such a use.

Paragraph 2 says:

If, in the replacement list, a parameter is immediately preceded by a # preprocessing token, both are replaced by a single character string literal preprocessing token that contains the spelling of the preprocessing token sequence for the corresponding argument. Each occurrence of white space between the argument’s preprocessing tokens becomes a single space character in the character string literal. White space before the first preprocessing token and after the last preprocessing token composing the argument is deleted…
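As a rough illustration of the two white-space rules in this paragraph, the sketch below stringifies one argument written with uneven white space and a line break, and one written with no white space at all. STR_ is the macro from the question; the function names stringified_spaced, stringified_tight and check_stringify_rules are hypothetical helpers introduced only for this example.

```c
#include <assert.h>
#include <string.h>

#define STR_(x) #x

/* The argument has leading, trailing and repeated white space plus a new-line.
   Leading and trailing white space is deleted; each interior run of white
   space becomes a single space character. */
const char *stringified_spaced(void)
{
    return STR_(   a   +
                b   );
}

/* The argument has no white space between its tokens, so none is inserted. */
const char *stringified_tight(void)
{
    return STR_(a+b);
}

/* Hypothetical self-check that a caller such as main could run. */
void check_stringify_rules(void)
{
    assert(strcmp(stringified_spaced(), "a + b") == 0);
    assert(strcmp(stringified_tight(), "a+b") == 0);
}
```

The point of the second function is that the # operator only normalizes white space that is already there. Whether white space separates two tokens is therefore information the preprocessor has to carry along with them.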

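Expanding X(Y,Y) yields the tokens Y and Y, and the definition #define X(x,y) x y puts white space between them, so the resulting string literal must be "Y Y". Defining Y as a function-like macro with -D"Y()" does not change this: neither Y is followed by a (, so neither is a macro invocation and both remain as Y.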

White space in a macro replacement list is significant, per 6.10.3 paragraph 1, which says:

Two replacement lists are identical if and only if the preprocessing tokens in both have the same number, ordering, spelling, and white-space separation, where all white-space separations are considered identical.

Thus, in #define X(x,y) x y, the replacement list should not be considered to be just the two tokens x and y, with white space disregarded. The replacement list is x, white space, and y.
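A small sketch of what this means in practice, using made-up macro names (PAIR, TIGHT, SPACED) and helper functions that are not part of the question. It assumes a and b are not defined as macros. The amount of white space in a replacement list does not matter, but its presence or absence does, and stringification makes the difference visible.

```c
#include <assert.h>
#include <string.h>

#define STR_(x) #x
#define STR(x)  STR_(x)

/* Same tokens, different amounts of white space between them: all white-space
   separations are considered identical, so this redefinition is permitted. */
#define PAIR(x,y) x y
#define PAIR(x,y) x      y

/* Same three tokens in both, but only SPACED has white-space separation,
   so these two replacement lists are not identical. */
#define TIGHT(x,y)  x+y
#define SPACED(x,y) x + y

const char *pair_text(void)   { return STR(PAIR(a,b)); }
const char *tight_text(void)  { return STR(TIGHT(a,b)); }
const char *spaced_text(void) { return STR(SPACED(a,b)); }

/* Hypothetical self-check that a caller such as main could run. */
void check_replacement_list_spacing(void)
{
    assert(strcmp(pair_text(), "a b") == 0);
    assert(strcmp(tight_text(), "a+b") == 0);
    assert(strcmp(spaced_text(), "a + b") == 0);
}
```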

Further, when the macro is replaced, it is replaced by the replacement list (and hence includes white space), not merely by the tokens in the replacement list, per 6.10.3 paragraph 10:

… Each subsequent instance of the function-like macro name followed by a ( as the next preprocessing token introduces the sequence of preprocessing tokens that is replaced by the replacement list in the definition (an invocation of the macro)… Within the sequence of preprocessing tokens making up an invocation of a function-like macro, new-line is considered a normal white-space character.
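Putting the quoted paragraphs together, here is a minimal sketch using the macros from the question. As an assumption of this example, Y is deliberately left undefined, so the result does not depend on the GCC behavior reported above. The function names one_line, split_lines and check_expected_strings are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define X(x,y)  x y
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Assumption: Y is NOT defined as a macro in this sketch. */

/* X(Y,Y) is replaced by its replacement list "x y", giving Y, white space, Y;
   STR_ then stringifies that sequence. */
const char *one_line(void)
{
    return STR(X(Y,Y));
}

/* A new-line inside the invocation counts as ordinary white space. */
const char *split_lines(void)
{
    return STR(X(Y,
                 Y));
}

/* Hypothetical self-check that a caller such as main could run. */
void check_expected_strings(void)
{
    assert(strcmp(one_line(), "Y Y") == 0);
    assert(strcmp(split_lines(), "Y Y") == 0);
}
```

To check a particular compiler against the report, preprocess the original file from the question with -D"Y()". By the paragraphs quoted above, the output should still be "Y Y".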