How did I get a value larger than 8 bits in size from an 8-bit integer?
I tracked down an extremely nasty bug hiding behind this little gem. I am aware that per the C++ spec, signed overflows are undefined behavior, but only when the overflow occurs after the value has been extended to the bit-width of int. As I understand it, incrementing a char shouldn't ever be undefined behavior as long as sizeof(char) < sizeof(int). But that doesn't explain how c is getting an impossible value. As an 8-bit integer, how can c hold values greater than its bit-width?
Code
// Compiled with gcc-4.7.2
#include <cstdio>
#include <stdint.h>
#include <climits>

int main()
{
    int8_t c = 0;
    printf("SCHAR_MIN: %i\n", SCHAR_MIN);
    printf("SCHAR_MAX: %i\n", SCHAR_MAX);
    for (int32_t i = 0; i <= 300; i++)
        printf("c: %i\n", c--);
    printf("c: %i\n", c);
    return 0;
}
Output
SCHAR_MIN: -128
SCHAR_MAX: 127
c: 0
c: -1
c: -2
c: -3
...
c: -127
c: -128 // <= The next value should still be an 8-bit value.
c: -129 // <= What? That's more than 8 bits!
c: -130 // <= Uh...
c: -131
...
c: -297
c: -298 // <= Getting ridiculous now.
c: -299
c: -300
c: -45 // <= ..........
Check it out on ideone.
This is a compiler bug.
Although getting impossible results for undefined behaviour is a valid consequence, there is actually no undefined behaviour in your code. What's happening is that the compiler thinks the behaviour is undefined, and optimises accordingly.
If c is defined as int8_t, and int8_t promotes to int, then c-- is supposed to perform the subtraction c - 1 in int arithmetic and convert the result back to int8_t. The subtraction in int does not overflow, and converting out-of-range integral values to another integral type is valid. If the destination type is signed, the result is implementation-defined, but it must be a valid value for the destination type. (And if the destination type is unsigned, the result is well-defined, but that does not apply here.)
A compiler can have bugs other than nonconformance to the standard, because it has other requirements to meet. A compiler should be compatible with other versions of itself. It may also be expected to be compatible in some ways with other compilers, and to conform to some beliefs about behavior that are held by the majority of its user base.
In this case, it appears to be a conformance bug. The expression c-- should manipulate c in a way similar to c = c - 1. Here, the value of c on the right is promoted to type int, and then the subtraction takes place. Since c is in the range of int8_t, this subtraction will not overflow, but it may produce a value which is out of the range of int8_t. When this value is assigned, a conversion takes place back to the type int8_t so the result fits back into c. In the out-of-range case, the conversion has an implementation-defined value. But a value out of the range of int8_t is not a valid implementation-defined value. An implementation cannot "define" that an 8 bit type suddenly holds 9 or more bits. For the value to be implementation-defined means that something in the range of int8_t is produced, and the program continues. The C standard thereby allows for behaviors such as saturation arithmetic (common on DSPs) or wrap-around (mainstream architectures).
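To make the two permitted behaviors concrete, here is a rough sketch of what a wrapping and a saturating narrowing conversion could look like if written out by hand. Both function names are hypothetical, and the arithmetic assumes the input is nowhere near the limits of int.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: wrap-around narrowing, as on mainstream architectures.
// Reduces v modulo 256 into the range [-128, 127] using only int arithmetic,
// so the final cast never sees an out-of-range value.
std::int8_t narrow_wrapping(int v)
{
    int reduced = ((v + 128) % 256 + 256) % 256 - 128;
    return static_cast<std::int8_t>(reduced);
}

// Hypothetical helper: saturating narrowing, as is common on DSPs.
// Out-of-range inputs are clamped to the nearest representable value.
std::int8_t narrow_saturating(int v)
{
    int clamped = std::min(127, std::max(-128, v));
    return static_cast<std::int8_t>(clamped);
}
```

Either function maps -129 to something an 8-bit variable can hold (127 when wrapping, -128 when saturating), which is the property the standard requires of the conversion.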
The compiler is using a wider underlying machine type when manipulating values of small integer types like int8_t or char. When arithmetic is performed, results which are out of range of the small integer type can be captured reliably in this wider type. To preserve the externally visible behavior that the variable is an 8 bit type, the wider result has to be truncated into the 8 bit range. Explicit code is required to do that since the machine storage locations (registers) are wider than 8 bits and happy with the larger values. Here, the compiler neglected to normalize the value and simply passed it to printf as is. The conversion specifier %i in printf has no idea that the argument originally came from int8_t calculations; it is just working with an int argument.
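The point about printf only ever seeing an int can be illustrated with a small variadic function. This is a sketch under assumptions: read_as_int is a made-up stand-in for what %i does with its argument, not how printf is actually implemented.

```cpp
#include <cstdarg>
#include <cstdint>

// Hypothetical stand-in for printf's %i handling: fetches one variadic
// argument as an int. Small integer types such as int8_t are promoted to
// int by the caller, so the callee cannot tell where the value came from.
int read_as_int(int count, ...)
{
    va_list args;
    va_start(args, count);
    int value = va_arg(args, int);  // the only type information available here
    va_end(args);
    return value;
}
```

Calling read_as_int(1, c) with an int8_t c returns the promoted value of c. If the caller hands over an unnormalized int such as -129 instead, the callee reports -129 without complaint, which matches the output in the question.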
I can't fit this in a comment, so I'm posting it as an answer.
For some very odd reason, the -- operator happens to be the culprit.
I tested the code posted on Ideone and replaced c-- with c = c - 1, and the values remained within the range [-128 ... 127]:
c: -123
c: -124
c: -125
c: -126
c: -127
c: -128 // about to overflow
c: 127 // woop
c: 126
c: 125
c: 124
c: 123
c: 122
Freaky ey? I don't know much about what the compiler does to expressions like i++ or i--. It's likely promoting the return value to an int and passing it. That's the only logical conclusion I can come up with because you ARE in fact getting values that cannot fit into 8 bits.
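The substitution described above can be sketched as follows. This is not the exact code that was run on Ideone: the function name is made up, and the values are collected in a vector rather than printed so they can be inspected.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: the loop from the question with `c--` replaced by
// `c = c - 1`. Records each value of c instead of printing it.
std::vector<int> run_with_explicit_assignment(int iterations)
{
    std::vector<int> seen;
    std::int8_t c = 0;
    for (int i = 0; i < iterations; i++) {
        seen.push_back(c);  // value before the decrement, like printing c--
        c = c - 1;          // subtraction in int, then conversion back to int8_t
    }
    seen.push_back(c);      // final value, like the last printf in the question
    return seen;
}
```

With 301 iterations this mirrors the original loop bounds. On a conforming compiler every recorded value has to stay within [-128, 127]; the exact values after passing -128 depend on the implementation-defined conversion.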
I guess that the underlying hardware is still using a 32-bit register to hold that int8_t. Since the specification does not impose a behaviour for overflow, the implementation does not check for overflow and allows larger values to be stored as well.
If you mark the local variable as volatile, you force the compiler to keep it in memory, and consequently you obtain the expected values within the range.