Why is this C++ program so incredibly fast?

The optimizer has worked out that the inner loop along with the subsequent line is a no-op, and eliminated it. Unfortunately it hasn't managed to eliminate the outer loop as well.
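If the aim is to time the loops rather than the optimizer, one common trick is to make the accumulator volatile so its reads and writes have to stay. The sketch below is an assumption-laden illustration, not the original benchmark: the function name run_with_volatile and the uint32_t type of s are made up for the example.

```cpp
#include <cstdint>

// Hypothetical variant of the benchmark loop. Because `s` is volatile, each
// read and write of it counts as observable behaviour, so the compiler is
// expected to keep the work instead of folding it away.
std::uint32_t run_with_volatile(std::uint32_t outer, std::uint32_t inner) {
    volatile std::uint32_t s = 0;
    for (std::uint32_t i = 0; i < outer; ++i) {
        for (std::uint32_t j = 0; j < inner; ++j)
            s = s + 1;      // written out in full; ++ on volatile is deprecated in C++20
        s = s - inner;      // same cancellation as before, but no longer removable
    }
    return s;               // still 0, only now the loops actually run
}
```

The result is the same as before; only the running time should change, since the additions are really performed.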

Note that the node.js example is faster than the unoptimized C++ example, indicating that V8 (node's JIT compiler) has managed to eliminate at least one of the loops. However, its optimization has some overhead, as (like any JIT compiler) it must balance the opportunities for optimization and profile-guided re-optimization against the cost of doing so.


I didn't do a complete analysis of the assembly, but it looks like the compiler unrolled the inner loop and figured out that, together with the subtraction of inner, it is a no-op.

The assembly seems to contain only the outer loop, which just increments a counter until outer is reached. The compiler could have optimized that away as well, but apparently it didn't.
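Written back as C++, the shape of the optimized code described here would look roughly like the following. This is a hand-written sketch, not actual compiler output; the function name optimized_shape and the uint32_t type of s are assumptions for illustration.

```cpp
#include <cstdint>

// Hypothetical source-level picture of the optimized code: the inner loop and
// the subtraction have cancelled out, leaving an outer loop that only counts.
std::uint32_t optimized_shape(std::uint32_t outer) {
    std::uint32_t s = 0;
    for (std::uint32_t i = 0; i < outer; ++i) {
        // nothing left in the body; only the counter i advances
    }
    return s;  // s is never modified, so this is always 0
}
```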


Is there a way to cache the JIT compiled code after it optimizes it, or does it have to re-optimize the code every time the program is run?

If I were writing this in Python, I'd try to shrink the code to get an "overhead" view of what it was doing. For example, try writing this (much easier to read IMO):

s = 0
for i in range(outer):
    inner_s = sum(1 for _ in range(inner))
    s += inner_s
    s -= inner_s

or even s = sum(inner - inner for _ in range(outer))


for (uint32_t i = 0; i < outer; ++i) {
    for (uint32_t j = 0; j < inner; ++j)
        ++s;
    s -= inner;
}

The inner loop is equivalent to "s += inner; j = inner;", which a good optimising compiler can work out. Since the variable j is gone after the loop, the whole code is equivalent to

for (uint32_t i = 0; i < outer; ++i) {
    s += inner;
    s -= inner;
}

Again, a good optimising compiler can remove the two changes to s, then remove the variable i, and there's nothing left whatsoever. It seems that is what happened.
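The chain of rewrites can be checked by writing each stage as its own function and comparing results. This is a minimal sketch: the function names are invented, and s is assumed to be a uint32_t (the thread doesn't show its declaration). With an unsigned type the arithmetic wraps, so the rewrite holds even if s would overflow.

```cpp
#include <cstdint>

// Stage 0: the loops as written.
std::uint32_t nested_loops(std::uint32_t outer, std::uint32_t inner) {
    std::uint32_t s = 0;
    for (std::uint32_t i = 0; i < outer; ++i) {
        for (std::uint32_t j = 0; j < inner; ++j)
            ++s;
        s -= inner;
    }
    return s;
}

// Stage 1: inner loop replaced by a single addition.
std::uint32_t inner_folded(std::uint32_t outer, std::uint32_t inner) {
    std::uint32_t s = 0;
    for (std::uint32_t i = 0; i < outer; ++i) {
        s += inner;
        s -= inner;
    }
    return s;
}

// Stage 2: the two updates cancel, i is unused, nothing is left.
std::uint32_t fully_folded(std::uint32_t /*outer*/, std::uint32_t /*inner*/) {
    return 0;
}
```

All three return the same value for any outer and inner; only the amount of work differs.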

Now it's up to you to decide how often an optimisation like this happens, and whether it is of any real-life benefit.