Why do C++ streams use char instead of unsigned char?

Possibly I've misunderstood the question, but conversion from unsigned char to char isn't unspecified; it's implementation-defined (4.7/3 in the C++ standard).
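For example (a minimal sketch; the printed value depends on the platform's choice of signedness for plain char):

    #include <iostream>

    int main() {
        unsigned char u = 200;              // does not fit in a signed 8-bit char
        char c = static_cast<char>(u);      // result is implementation-defined when char is signed
        std::cout << static_cast<int>(c) << '\n';  // commonly -56 where char is signed,
                                                   // 200 where char is unsigned
    }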

The type of a one-byte character in C++ is "char", not "unsigned char". This gives implementations a bit more freedom to do whatever works best on the platform (for example, the standards body may have believed that there exist CPUs where signed byte arithmetic is faster than unsigned byte arithmetic, although that's speculation on my part). It also preserves compatibility with C. The result of removing this kind of existential uncertainty from C++ is C# ;-)

Given that the "char" type exists, I think it makes sense for the usual streams to use it even though its signedness isn't defined. So maybe your question is really answered by the answer to "why didn't C++ just define char to be unsigned?"


I have always understood it this way: the purpose of the iostream classes is to read and/or write a stream of characters, which, if you think about it, are abstract entities that the computer only represents by means of a character encoding. The C++ standard takes great pains to avoid pinning down that encoding, saying only that "Objects declared as characters (char) shall be large enough to store any member of the implementation's basic character set," because it doesn't need to fix the implementation's basic character set in order to define the C++ language; the standard leaves the choice of character encoding to the implementation (the compiler together with an STL implementation) and simply notes that char objects represent single characters in some encoding.

An implementation writer could choose a single-octet encoding such as ISO-8859-1 or even a double-octet encoding such as UCS-2. It doesn't matter. As long as a char object is "large enough to store any member of the implementation's basic character set" (a requirement that effectively rules out variable-length encodings for the basic character set), the implementation may even choose an encoding that represents basic Latin in a way that is incompatible with any common encoding!

It is confusing that the char, signed char, and unsigned char types share "char" in their names, but it is important to keep in mind that char does not belong to the same family of fundamental types as signed char and unsigned char. signed char is in the family of signed integer types:

There are four signed integer types: "signed char", "short int", "int", and "long int."

and unsigned char is in the family of unsigned integer types:

For each of the signed integer types, there exists a corresponding (but different) unsigned integer type: "unsigned char", "unsigned short int", "unsigned int", and "unsigned long int," ...

The one similarity between the char, signed char, and unsigned char types is that "[they] occupy the same amount of storage and have the same alignment requirements". Thus, you can reinterpret_cast from char * to unsigned char * in order to determine the numeric value of a character in the execution character set.
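A small sketch of that idea (the printed numbers assume an ASCII-based execution character set):

    #include <iostream>

    int main() {
        const char text[] = "Az";
        // The same storage, viewed as unsigned char, yields the numeric value
        // of each character in the execution character set.
        const unsigned char* bytes = reinterpret_cast<const unsigned char*>(text);
        std::cout << static_cast<unsigned int>(bytes[0]) << ' '
                  << static_cast<unsigned int>(bytes[1]) << '\n';  // 65 122 in ASCII
    }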

To answer your question, the reason the STL uses char as the default type is that the standard streams are meant for reading and/or writing streams of characters, represented by char objects, not integers (signed char and unsigned char). Using char rather than a numeric value is a way of separating concerns.
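To illustrate, here is a minimal example of that character-oriented interface; the stream hands characters back as char, and nothing in the code depends on char's signedness:

    #include <iostream>
    #include <sstream>

    int main() {
        std::istringstream in("hi");
        char c;
        while (in.get(c)) {                     // istream::get fills a char, the stream's character type
            std::cout << "read '" << c << "'\n";
        }
    }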


char is for characters, unsigned char is for raw bytes of data, and signed char is for, well, signed data.
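A few hypothetical declarations illustrating those three roles:

    char          name[] = "data.bin";   // text in the execution character set
    unsigned char buffer[64] = {};       // raw bytes, e.g. the target of a binary read
    signed char   delta = -7;            // a small signed quantity

    int main() { return 0; }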

The standard does not specify whether signed or unsigned char is used to implement char; that choice is compiler-specific. It only specifies that char must be "enough" to hold the characters on your system, as characters were understood in those days, which is to say, not Unicode.
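A quick way to see which choice your compiler made (a minimal check using CHAR_MIN from <climits>):

    #include <climits>
    #include <iostream>

    int main() {
        // Whether plain char is signed or unsigned is implementation-defined;
        // CHAR_MIN reveals the choice made on this platform.
        std::cout << "char is " << (CHAR_MIN < 0 ? "signed" : "unsigned") << '\n';
    }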

Using "char" for characters is the standard way to go. Using unsigned char is a hack, although it'll match compiler's implementation of char on most platforms.