😃 (and other Unicode characters) in identifiers not allowed by g++
I am 😞 to find that I cannot use 😃 as a valid identifier with g++ 4.7, even with the -fextended-identifiers option enabled:
int main(int argc, const char* argv[])
{
  const char* 😃 = "I'm very happy";
  return 0;
}
main.cpp:3:3: error: stray ‘\360’ in program
main.cpp:3:3: error: stray ‘\237’ in program
main.cpp:3:3: error: stray ‘\230’ in program
main.cpp:3:3: error: stray ‘\203’ in program
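For anyone puzzled by the four numbers in those errors: they are the UTF-8 bytes of U+1F603, printed in octal. Here is a rough sketch that reproduces them. The function names `encode_utf8` and `as_octal_escapes` are made up for this illustration, and the encoder assumes a valid code point (no surrogates, nothing above U+10FFFF).

```cpp
#include <cstdint>
#include <string>

// Encode one code point as UTF-8 (assumes a valid, non-surrogate code point).
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Render each byte as a backslash followed by three octal digits,
// the same way the compiler spells stray bytes in its diagnostics.
std::string as_octal_escapes(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes) {
        out += '\\';
        out += static_cast<char>('0' + ((b >> 6) & 7));
        out += static_cast<char>('0' + ((b >> 3) & 7));
        out += static_cast<char>('0' + (b & 7));
    }
    return out;
}
```

Calling `as_octal_escapes(encode_utf8(0x1F603))` gives `\360\237\230\203`, so the compiler is simply seeing the raw bytes of the emoji and rejecting each one.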
After some googling, I discovered that UTF-8 characters are not yet supported in identifiers, but that a universal-character-name should work. So I converted my source to:
int main(int argc, const char* argv[])
{
  const char* \U0001F603 = "I'm very happy";
  return 0;
}
main.cpp:3:15: error: universal character \U0001F603 is not valid in an identifier
So apparently 😃 isn't a valid identifier character. However, the standard specifically allows characters from the range 10000-1FFFD in Annex E.1 and doesn't disallow it as an initial character in E.2.
My next effort was to see if any other allowed Unicode characters worked, but none that I tried did. Not even the ever-important PILE OF POO (💩) character.
So, for the sake of meaningful and descriptive variable names, what gives? Does -fextended-identifiers do as it advertises or not? Is it only supported in the very latest build? And what kind of support do other compilers have?
Solution 1:
As of 4.8, GCC does not support characters outside the BMP in identifiers, which seems to be an unnecessary restriction. Also, GCC only supports a very restricted set of characters, described in ucnid.tab and based on C99 and C++98 (it does not seem to have been updated to C11 and C++11 yet).
As described in the manual, -fextended-identifiers is experimental, so there is a higher chance that it won't work as expected.
Edit:
GCC has supported the C11 character set since 4.9.0 (svn r204886, to be precise), so the OP's second piece of code, using \U0001F603, does work. I still can't get the actual code using 😃 to work with GCC 8.2 on https://gcc.godbolt.org, even with -finput-charset=UTF-8 (you may want to follow this bug report, provided by @DanielWolf).
Meanwhile, both pieces of code work on clang 3.3 without any options other than -std=c++11.
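For reference, a minimal sketch of the universal-character-name spelling. Note that I am deliberately using \u03C0 (GREEK SMALL LETTER PI, a BMP letter) rather than the OP's emoji, because which code points are accepted has varied between compiler versions and language standards. The names `\u03C0_approx` and `rounded_pi` are made up for this example, and older GCC versions may still need -fextended-identifiers.

```cpp
// \u03C0 is the universal-character-name for U+03C0 (GREEK SMALL LETTER PI).
// The source file itself stays pure ASCII, so the input charset does not matter.
static const int \u03C0_approx = 3;

// An ASCII-named accessor, so callers do not have to type the UCN themselves.
int rounded_pi() {
    return \u03C0_approx;
}
```

This form is ugly to read, but it sidesteps the input-encoding problem entirely, which is why it worked before the direct UTF-8 spelling did.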
Solution 2:
This was a known bug in GCC 9 and before. This has been fixed in GCC 10.
The official changelog for GCC 10 contains this section:
Extended characters in identifiers may now be specified directly in the input encoding (UTF-8, by default), in addition to the UCN syntax (\uNNNN or \UNNNNNNNN) that is already supported:
static const int π = 3;
int get_naïve_pi() {
  return π;
}
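As a small illustration of "in addition to the UCN syntax": the two spellings name the same identifier, so they can be mixed freely. This is only a sketch and assumes GCC 10 or later (or a recent Clang) and a source file saved as UTF-8. The helper `spellings_agree` is a made-up name for this example.

```cpp
#include <cassert>

// Declared with the characters typed directly in UTF-8.
static const int π = 3;

int get_naïve_pi() {
    // \u03C0 is the UCN spelling of π, so this refers to the variable above.
    return \u03C0;
}

// Hypothetical helper: checks that both spellings resolve to the same entities.
bool spellings_agree() {
    // \u00EF is the UCN spelling of ï (U+00EF, the precomposed form).
    assert(&π == &\u03C0);
    return get_na\u00EFve_pi() == get_naïve_pi();
}
```

One thing to watch for is normalization: ï typed as a single precomposed character (U+00EF) and i followed by a combining diaeresis look the same in an editor but are different code point sequences, so stick to one form.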