Signed overflow in C++ and undefined behaviour (UB)

Solution 1:

Compilers do assume that a valid C++ program does not contain UB. Consider for example:

if (x == nullptr) {
    *x = 3;
} else {
    *x = 5;
}

If x == nullptr, then dereferencing it and assigning through it is UB. Hence the only way this can be part of a valid program is if x == nullptr never yields true, so under the as-if rule the compiler may treat the above as equivalent to:

*x = 5;

Now in your code

int result = 0;
int factor = 1;
for (...) {      // Loop until factor overflows but not more
   result = ...
   factor *= 10;
}
return result;

The last multiplication of factor cannot happen in a valid program (signed overflow is undefined). Hence the assignment to result in that iteration cannot happen either. As there is no way to branch out of the loop before the last iteration, the previous iteration cannot happen either, and so on. Eventually, the only part of the code that is correct (i.e., in which no undefined behaviour ever happens) is:

// nothing :(
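One way to keep such a loop inside defined behaviour is to test whether the next multiplication would fit before performing it. The following is only a minimal sketch: largest_power_of_ten is a hypothetical helper, not the code from the question, and it only shows the "check first, multiply second" pattern.

```cpp
#include <limits>

// Hypothetical helper (not from the question): returns the largest power of
// ten that is representable in int.
int largest_power_of_ten() {
    int factor = 1;
    // The test runs *before* the multiplication, so factor * 10 is only
    // evaluated when it is known to fit in an int.
    while (factor <= std::numeric_limits<int>::max() / 10) {
        factor *= 10;
    }
    return factor;
}
```

Because the overflowing multiplication is never evaluated, the compiler has no UB to reason about and the loop cannot be optimised away on that basis.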

Solution 2:

The behaviour of int overflow is undefined.

It doesn't matter whether you read factor outside the loop body; if it has overflowed by then, the behaviour of your code on, after, and somewhat paradoxically before the overflow is undefined.

One issue that might arise in keeping this code is that compilers are getting more and more aggressive when it comes to optimisation. In particular, they are developing a habit of assuming that undefined behaviour never happens. To uphold that assumption, they may remove the for loop altogether.

Can't you use an unsigned type for factor? Unsigned arithmetic wraps around instead of overflowing, although you'd then need to worry about unwanted conversion of int to unsigned in expressions containing both.
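To illustrate both halves of that suggestion, here is a small sketch. The function names are made up for this example, and less_checked assumes that long long is wider than unsigned, which holds on common platforms but is not guaranteed by the standard.

```cpp
#include <limits>

// Unsigned arithmetic wraps modulo 2^N, so this multiplication is well
// defined even when the mathematical result does not fit.
unsigned times_ten_wrapping(unsigned factor) {
    return factor * 10u;
}

// The pitfall: in a mixed expression such as `a < b`, the int operand is
// converted to unsigned. The cast below spells out that implicit conversion,
// which turns a negative value into a very large positive one.
bool less_mixed(int a, unsigned b) {
    return static_cast<unsigned>(a) < b;
}

// One possible remedy: compare in a wider signed type
// (assumption: long long can represent every unsigned value).
bool less_checked(int a, unsigned b) {
    return static_cast<long long>(a) < static_cast<long long>(b);
}
```

With these definitions, less_mixed(-1, 1u) is false while less_checked(-1, 1u) is true, which is exactly the kind of surprise to watch for after switching factor to an unsigned type.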

Solution 3:

It might be insightful to consider real-world optimizers. Loop unrolling is a known technique. The basic idea of loop unrolling is that

for (int i = 0; i != 3; ++i)
    foo();

might be better implemented behind the scenes as

foo();
foo();
foo();

This is the easy case, with a fixed bound. But modern compilers can also do this for variable bounds:

for (int i = 0; i != N; ++i)
   foo();

becomes

__RELATIVE_JUMP(3-N)
foo();
foo();
foo();

Obviously this only works if the compiler knows that N<=3. And that's where we get back to the original question:

int result = 0;
int factor = 1;
for (...) {
    result = ...
    factor *= 10;
}
return result;

Because the compiler may assume that signed overflow does not occur, it knows that the loop can execute at most 9 times when int is 32 bits wide: 10^9 still fits, but 10^10 > 2^31 - 1. It can therefore unroll the loop into 9 iterations. But the intended maximum was 10 iterations!

What might happen is that you get a relative jump to an assembly instruction at offset (9-N) with N == 10, so an offset of -1, which is the jump instruction itself. Oops. This is a perfectly valid loop optimization for well-defined C++, but the example given turns into a tight infinite loop.
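The iteration bound used in this argument can be checked with a few lines of code. This is just a sketch of the arithmetic, with a made-up function name; it does the bookkeeping in long long so that the check itself cannot overflow, which assumes long long is wider than int.

```cpp
#include <limits>

// Counts how many times a factor starting at 1 can be multiplied by 10
// while the result still fits in an int.
// Assumption: long long is wider than int, so factor * 10 cannot overflow here.
int safe_multiplications_by_ten() {
    long long factor = 1;
    int count = 0;
    while (factor * 10 <= std::numeric_limits<int>::max()) {
        factor *= 10;
        ++count;
    }
    return count;
}
```

On a platform with a 32-bit int this returns 9, so a tenth multiplication by 10 is the one that overflows.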