When did C++ compilers start considering more than two hex digits in string literal character escapes?

GCC is only following the standard. #877: "Each [...] hexadecimal escape sequence is the longest sequence of characters that can constitute the escape sequence."
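To make the quoted rule concrete, here is a minimal sketch (the array names are mine, and the exact diagnostic wording varies between compilers): because the escape is maximal-munch, the 'c' of "cho" is treated as a fourth hex digit.

    // Sketch of the maximal-munch rule, assuming wchar_t holds at least 16 bits.
    // In a wide literal the escape \xABEc fits, so the digits A, B, E and c are
    // all consumed and only "ho" remains as ordinary characters.
    wchar_t wide[] = L"\xABEcho";
    static_assert(sizeof(wide) / sizeof(wide[0]) == 4,   // 0xABEC, 'h', 'o', '\0'
                  "the escape consumed four hex digits, not two");

    // In a narrow literal the same spelling is rejected, because 0xABEC does not
    // fit in an 8-bit char (GCC reports "hex escape sequence out of range"):
    // char narrow[] = "\xABEcho";

    int main() { return 0; }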


I have found answers to my questions:

  • C++ has always been this way (I checked Stroustrup's 3rd edition; I don't have any earlier ones). K&R 1st edition did not mention \x at all (the only character escapes available at that time were octal). K&R 2nd edition states:

    '\xhh'
    

    where hh is one or more hexadecimal digits (0...9, a...f, A...F).

    so it appears this behaviour has been around since ANSI C.

  • While it might be possible for the compiler to accept more than two hex digits only in wide string literals, this would unnecessarily complicate the grammar.

  • There is indeed a less awkward workaround:

    char foo[] = "\u00ABEcho";
    

    The \u escape always takes exactly four hex digits, so it cannot swallow the characters that follow it (see the sketch after this list for a comparison with the older split-literal workaround).
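Here is the comparison promised above, as a hedged sketch (array names are illustrative): the \u workaround next to the older trick of splitting the literal so that string concatenation terminates the hex escape. Note that the two are not byte-for-byte identical: \u00AB names the character U+00AB, which occupies two bytes in a UTF-8 execution character set, while \xAB is a single raw byte.

    // 1. The older, more awkward workaround: split the literal.  Adjacent string
    //    literals are concatenated after escapes are processed, so the \xAB
    //    escape cannot reach the 'E'.  This stores the single byte 0xAB.
    char split[] = "\xAB" "Echo";

    // 2. The \u workaround from above: a universal character name always takes
    //    exactly four hex digits, so "Echo" is left alone.  This stores U+00AB
    //    in the execution character set (two bytes under UTF-8).
    char ucn[] = "\u00ABEcho";

    int main() { return 0; }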

Update: The use of \u isn't applicable in all situations, because most ASCII characters are not permitted to be written with \u (the standard forbids universal character names that designate members of the basic source character set). Here's a snippet from GCC:

  /* The standard permits $, @ and ` to be specified as UCNs.  We use
     hex escapes so that this also works with EBCDIC hosts.  */
  else if ((result < 0xa0
            && (result != 0x24 && result != 0x40 && result != 0x60))
           || (result & 0x80000000)
           || (result >= 0xD800 && result <= 0xDFFF))
    {
      cpp_error (pfile, CPP_DL_ERROR,
                 "%.*s is not a valid universal character",
                 (int) (str - base), base);
      result = 1;
    }
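In practice the check above means that UCNs for ordinary ASCII characters are rejected, while $, @, ` and everything from U+00A0 upwards (outside the surrogate range) are accepted. A hedged sketch of the effect (variable names are illustrative; the exact diagnostic text depends on the GCC version):

    char dollar[] = "\u0024";    // '$'  -- explicitly permitted
    char guille[] = "\u00AB";    // U+00AB, at or above 0xA0 -- permitted
    // char letter[] = "\u0041"; // 'A'  -- rejected ("\u0041 is not a valid
    //                           // universal character"): below 0xA0 and not
    //                           // one of $, @, `
    int main() { return 0; }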

I'm pretty sure that C++ has always been this way. In any case, CHAR_BIT may be greater than 8, in which case '\xABE' or '\xABEc' could be valid.
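A quick hedged sketch of that last point: whether a long escape such as '\xABE' is representable depends on CHAR_BIT, which can be inspected at compile time. On the usual 8-bit-char platforms the guarded line would be rejected as out of range.

    #include <climits>
    #include <cstdio>

    int main() {
        std::printf("CHAR_BIT = %d\n", CHAR_BIT);
    #if CHAR_BIT >= 12
        // '\xABE' needs 12 value bits, so it is only meaningful on targets
        // with wide chars (some DSPs); signedness can still matter here.
        char c = '\xABE';
        std::printf("'\\xABE' = %d\n", (int)c);
    #endif
        return 0;
    }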