Multiple characters in a character constant
"Some C compilers permit multiple characters in a character constant. This means that writing 'yes' instead of "yes" may well go undetected." Source: C Traps and Pitfalls
Can anyone give an example of this where multiple characters are allowed in a character constant?
As Code Monkey cited, it is implementation-defined, and implementations vary: it isn't just a matter of big-endian versus little-endian or of character set. I've tested four implementations (all using ASCII) with the following program:
#include <stdio.h>

int main(void)
{
    unsigned value = 'ABCD';
    /* View the object representation byte by byte, lowest address first. */
    unsigned char *ptr = (unsigned char *)&value;

    printf("'ABCD' = %02x%02x%02x%02x = %08x\n", ptr[0], ptr[1], ptr[2], ptr[3], value);
    value = 'ABC';
    printf("'ABC' = %02x%02x%02x%02x = %08x\n", ptr[0], ptr[1], ptr[2], ptr[3], value);
    return 0;
}
and I got four different results (the first number shows the bytes in memory order, the second the integer value):
Big endian (AIX, POWER, IBM compiler)
'ABCD' = 41424344 = 41424344
'ABC' = 00414243 = 00414243
Big endian (Solaris, Sparc, SUN compiler)
'ABCD' = 44434241 = 44434241
'ABC' = 00434241 = 00434241
Little endian (Linux, x86_64, gcc)
'ABCD' = 44434241 = 41424344
'ABC' = 43424100 = 00414243
Little endian (Solaris, x86_64, Sun compiler)
'ABCD' = 41424344 = 44434241
'ABC' = 41424300 = 00434241
You could use it in a case statement, I guess, but I wouldn't recommend it.
'yes' is a multicharacter constant. Its type is int, and its value is implementation dependent. So, as you already stated, it's up to the compiler. That means int foo = 'yes'; is accepted.
ARM, section 2.5.2, page 9:
"A character constant is one or more characters enclosed in single quotes, as in 'x'."
Later on the same page:
"Multicharacter constants have type int. The value of a multicharacter constant is implementation dependent. For example, the value of 'AB' could reasonably be expected to be 'A', 'B', and ('A'<<8)+'B' on three different implementations. Multicharacter constants are usually best avoided."
and
Quoting from the ANSI C specification (to which C++ makes some attempt to be compatible):
3.1.3.4 Character Constants Semantics
An integer character constant has type int [note that it has type char in C++]... The value of an integer character constant containing more than one character... is implementation-defined.
Multi-character constants are allowed in all contexts where single-character constants are allowed.
As for where they'd actually be used, I've seen code that uses multi-character constants to create legible unique values. For example, assuming that int is 4 bytes, 'ABCD' and 'EFGH' are likely to be distinct. (This isn't guaranteed by the language; the implementation must document the mapping, but it needn't be reasonable.) And assuming a reasonable mapping, you'll likely see "ABCD" or "EFGH" in the object code. Not the best idea in the world, but it can work if you don't care much about portability.
Incidentally, all conforming C compilers support multi-character constants (by definition; a compiler that doesn't support them is non-conforming).