When did C++ compilers start considering more than two hex digits in string literal character escapes?
GCC is only following the standard. #877: "Each [...] hexadecimal escape sequence is the longest sequence of characters that can constitute the escape sequence."
I have found answers to my questions:

- C++ has always been this way (I checked Stroustrup, 3rd edition; I didn't have any earlier). K&R 1st edition did not mention \x at all (the only character escapes available at that time were octal). K&R 2nd edition states: '\xhh', where hh is one or more hexadecimal digits (0...9, a...f, A...F). So it appears this behaviour has been around since ANSI C.

  While it might be possible for the compiler to accept more than two hex digits only in wide string literals, that would unnecessarily complicate the grammar.
- There is indeed a less awkward workaround:

      char foo[] = "\u00ABEcho";

  The \u escape always takes exactly four hex digits, so the letters that follow are left alone (see the sketch after this list).
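To make the contrast concrete, here is a minimal sketch (my own test program, not part of the answers above; the array names and the assumption of a UTF-8 execution character set are mine):

#include <cstdio>

int main()
{
    // The next line would not compile: \x greedily consumes A, B, E and c,
    // and the resulting value 0xABEC does not fit in a char.
    // char bad[] = "\xABEcho";

    char foo[] = "\u00ABEcho";  // \u always stops after exactly four hex digits

    // Dump the bytes; on a UTF-8 execution character set (an assumption)
    // this prints: c2 ab 45 63 68 6f
    for (const char *p = foo; *p; ++p)
        std::printf("%02x ", static_cast<unsigned char>(*p));
    std::printf("\n");
}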
Update: The use of \u isn't quite applicable in all situations, because most ASCII characters are (for some reason) not permitted to be specified using \u. Here's a snippet from GCC:
/* The standard permits $, @ and ` to be specified as UCNs.  We use
   hex escapes so that this also works with EBCDIC hosts.  */
else if ((result < 0xa0
          && (result != 0x24 && result != 0x40 && result != 0x60))
         || (result & 0x80000000)
         || (result >= 0xD800 && result <= 0xDFFF))
  {
    cpp_error (pfile, CPP_DL_ERROR,
               "%.*s is not a valid universal character",
               (int) (str - base), base);
    result = 1;
  }
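So, for example (my own illustration, not taken from GCC or the answers above), a UCN naming a plain ASCII letter trips this check, while $ and characters at or above U+00A0 pass:

char ok[]     = "\u00ABEcho";  // U+00AB is >= 0xA0, so it is accepted
char dollar[] = "\u0024";      // $ (0x24) is one of the explicit exceptions
// char bad[] = "\u0041bc";    // rejected: "\u0041 is not a valid universal character"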
I'm pretty sure that C++ has always been this way. In any case, CHAR_BIT may be greater than 8, in which case '\xABE' or '\xABEc' could be valid.
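As a rough sanity check (my own sketch, not part of this answer), you can compare the escape's value against the limits from <climits>; the comparison below only approximates the range check a real compiler performs:

#include <climits>
#include <cstdio>

int main()
{
    // '\xABE' encodes the value 0xABE (2750), which needs at least 12 bits.
    // On the usual 8-bit-char platforms UCHAR_MAX is 255, so the escape is
    // rejected as out of range; with CHAR_BIT >= 12 it could be accepted.
    std::printf("CHAR_BIT = %d, UCHAR_MAX = %u\n",
                CHAR_BIT, static_cast<unsigned>(UCHAR_MAX));
    std::printf("'\\xABE' could fit in a char here: %s\n",
                0xABE <= UCHAR_MAX ? "yes" : "no");
}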