C++ crashes in a 'for' loop with a negative expression

c++

s.length() is unsigned integer type. When you subtract 3, you make it negative. For an unsigned, it means very big.

A workaround (valid as long the string is long up to INT_MAX) would be to do like this:

#include <string>

using namespace std;

int main() {

    string s = "aa";

    for (int i = 0; i < static_cast<int> (s.length() ) - 3; i++) {

    }
}

Which would never enter the loop.

A very important detail is that you have probably received a warning "comparing signed and unsigned value". The problem is that if you ignore those warnings, you enter the very dangerous field of implicit "integer conversion"^(*), which has a defined behaviour, but it is difficult to follow: the best is to never ignore those compiler warnings.

_{(*) You might also be interested to know about "integer promotion".}

First of all: why does it crash? Let's step through your program like a debugger would.

Note: I'll assume that your loop body isn't empty, but accesses the string. If this isn't the case, the cause of the crash is undefined behaviour through integer overflow. See Richard Hansens answer for that.

std::string s = "aa";//assign the two-character string "aa" to variable s of type std::string
for ( int i = 0; // create a variable i of type int with initial value 0 
i < s.length() - 3 // call s.length(), subtract 3, compare the result with i. OK!
{...} // execute loop body
i++ // do the incrementing part of the loop, i now holds value 1!
i < s.length() - 3 // call s.length(), subtract 3, compare the result with i. OK!
{...} // execute loop body
i++ // do the incrementing part of the loop, i now holds value 2!
i < s.length() - 3 // call s.length(), subtract 3, compare the result with i. OK!
{...} // execute loop body
i++ // do the incrementing part of the loop, i now holds value 3!
.
.

We would expect the check i < s.length() - 3 to fail right away, since the length of s is two (we only every given it a length at the beginning and never changed it) and 2 - 3 is -1, 0 < -1 is false. However we do get an "OK" here.

This is because s.length() isn't 2. It's 2u. std::string::length() has return type size_t which is an unsigned integer. So going back to the loop condition, we first get the value of s.length(), so 2u, now subtract 3. 3 is an integer literal and interpreted by the compiler as type int. So the compiler has to calculate 2u - 3, two values of different types. Operations on primitive types only work for same types, so one has to be converted into the other. There are some strict rules, in this case, unsigned "wins", so 3 get's converted to 3u. In unsigned integers, 2u - 3u can't be -1u as such a number does not exists (well, because it has a sign of course!). Instead it calculates every operation modulo 2^(n_bits), where n_bits is the number of bits in this type (usually 8, 16, 32 or 64). So instead of -1 we get 4294967295u (assuming 32bit).

So now the compiler is done with s.length() - 3 (of course it's much much faster than me ;-) ), now let's go for the comparison: i < s.length() - 3. Putting in the values: 0 < 4294967295u. Again, different types, 0 becomes 0u, the comparison 0u < 4294967295u is obviously true, the loop condition is positively checked, we can now execute the loop body.

After incrementing, the only thing that changes in the above is the value of i. The value of i will again be converted into an unsigned int, as the comparison needs it.

So we have

(0u < 4294967295u) == true, let's do the loop body!
(1u < 4294967295u) == true, let's do the loop body!
(2u < 4294967295u) == true, let's do the loop body!

Here's the problem: What do you do in the loop body? Presumably you access the i^th character of your string, don't you? Even though it wasn't your intention, you didn't only accessed the zeroth and first, but also the second! The second doesn't exists (as your string only has two characters, the zeroth and first), you access memory you shouldn't, the program does whatever it wants (undefined behaviour). Note that the program isn't required to crash immediately. It can seem to work fine for another half an hour, so these mistakes are hard to catch. But it's always dangerous to access memory beyond the bounds, this is where most crashes come from.

So in summary, you get a different value from s.length() - 3 from that what you'd expect, this results in a positive loop condition check, that leads to repetitive execution of the loop body, which in itself accesses memory it shouldn't.

Now let's see how to avoid that, i.e. how to tell the compiler what you actually meant in your loop condition.

Lengths of strings and sizes of containers are inherently unsigned so you should use an unsigned integer in for loops.

Since unsigned int is fairly long and therefore undesirable to write over and over again in loops, just use size_t. This is the type every container in the STL uses for storing length or size. You may need to include cstddef to assert platform independence.

#include <cstddef>
#include <string>

using namespace std;

int main() {

    string s = "aa";

    for ( size_t i = 0; i + 3 < s.length(); i++) {
    //    ^^^^^^         ^^^^
    }
}

Since a < b - 3 is mathematically equivalent to a + 3 < b, we can interchange them. However, a + 3 < b prevents b - 3 to be a huge value. Recall that s.length() returns an unsigned integer and unsigned integers perform operations module 2^(bits) where bits is the number of bits in the type (usually 8, 16, 32 or 64). Therefore with s.length() == 2, s.length() - 3 == -1 == 2^(bits) - 1.

Alternatively, if you want to use i < s.length() - 3 for personal preference, you have to add a condition:

for ( size_t i = 0; (s.length() > 3) && (i < s.length() - 3); ++i )
//    ^             ^                    ^- your actual condition
//    ^             ^- check if the string is long enough
//    ^- still prefer unsigned types!

Actually, in the first version you loop for a very long time, as you compare i to an unsigned integer containing a very large number. The size of a string is (in effect) the same as size_t which is an unsigned integer. When you subtract the 3 from that value it underflows and goes on to be a big value.

In the second version of the code, you assign this unsigned value to a signed variable, and so you get the correct value.

And it's not actually the condition or the value that causes the crash, it's most likely that you index the string out of bounds, a case of undefined behavior.

C++ crashes in a 'for' loop with a negative expression

Related

Recent Posts