C/C++: Why use unsigned char for binary data?

Solution 1:

In C, unsigned char is the only data type that has all three of the following properties simultaneously:

  • it has no padding bits, that is, all storage bits contribute to the value of the data
  • no bitwise operation starting from a value of that type, when converted back into that type, can produce overflow, trap representations or undefined behavior
  • it may alias other data types without violating the "aliasing rules", that is, access to the same data through a pointer of a different type is guaranteed to see all modifications

If these are the properties you are looking for in a "binary" data type, you should definitely use unsigned char.

For the second property we need an unsigned type. For unsigned types, all conversions are defined by modulo arithmetic, here modulo UCHAR_MAX+1, which is 256 on virtually all architectures. Converting a wider value to unsigned char therefore just corresponds to truncating it to its least significant byte.

The two other character types generally don't behave the same way. signed char is signed, so converting a value that doesn't fit into it gives an implementation-defined result (or raises an implementation-defined signal). Plain char is not fixed to be signed or unsigned: on a platform your code is ported to it might be signed even if it is unsigned on yours.

Solution 2:

You'll run into most of your problems when comparing the contents of individual bytes:

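#include <stdio.h>

int main(void)
{
    char c[5];
    c[0] = 0xff;  /* if char is signed, the stored value is implementation-defined, typically -1 */
    /* ... */
    if (c[0] == 0xff)  /* c[0] is promoted to int before the comparison */
    {
        printf("good\n");
    }
    else
    {
        printf("bad\n");
    }
    return 0;
}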

can print "bad" because, if char is signed with your compiler, c[0] holds -1 and is sign-extended to the int value -1 when promoted for the comparison, which is not the same as 0xff (255).