noexcept, stack unwinding and performance
The following draft from Scott Meyers' new C++11 book says (page 2, lines 7-21):
The difference between unwinding the call stack and possibly unwinding it has a surprisingly large impact on code generation. In a noexcept function, optimizers need not keep the runtime stack in an unwindable state if an exception would propagate out of the function, nor must they ensure that objects in a noexcept function are destroyed in the inverse order of construction should an exception leave the function. The result is more opportunities for optimization, not only within the body of a noexcept function, but also at sites where the function is called. Such flexibility is present only for noexcept functions. Functions with “throw()” exception specifications lack it, as do functions with no exception specification at all.
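The call-site part of this claim rests on something the compiler can see at compile time: from a function's declaration alone, it knows whether an exception can come out of a call. The `noexcept` operator exposes that knowledge. Here is a minimal sketch with hypothetical function names, assuming C++11 or later:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical helper with no exception specification: potentially throwing.
int may_throw(int x) {
    if (x < 0) throw std::invalid_argument("negative");
    return x * 2;
}

// Hypothetical helper that promises no exception will escape it.
int never_throws(int x) noexcept {
    return x * 2;
}

// The noexcept operator is evaluated at compile time and never calls the function.
// This compile-time fact is what lets the optimizer treat a call to never_throws
// as a call that cannot unwind into the caller.
static_assert(!noexcept(may_throw(1)), "may_throw is potentially throwing");
static_assert(noexcept(never_throws(1)), "never_throws is declared noexcept");
```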
In contrast, section 5.4 of "Technical Report on C++ Performance" describes the "code" and "table" ways of implementing exception handling. In particular, the "table" method is shown to have no time overhead when no exceptions are thrown, only a space overhead.
My question is this: what optimizations is Scott Meyers talking about when he talks of unwinding vs. possibly unwinding? Why don't these optimizations apply for throw()? Do his comments apply only to the "code" method mentioned in the 2006 TR?
Solution 1:
There's "no" overhead and then there's no overhead. You can think of the compiler in different ways:
- It generates a program which performs certain actions.
- It generates a program satisfying certain constraints.
The TR says there's no overhead in the table-driven approach because no action needs to be taken as long as a throw doesn't occur. The non-exceptional execution path goes straight forward.
However, to make the tables work, the non-exceptional code still needs additional constraints. Each object needs to be fully initialized before any exception could lead to its destruction, limiting the reordering of instructions (e.g. from an inlined constructor) across potentially throwing calls. Likewise, an object must be completely destroyed before any possible subsequent exception.
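To make the constraint concrete: when an exception does propagate, the runtime has to destroy exactly the objects that were fully constructed, in reverse order of construction, so the generated code must keep that bookkeeping valid at every potentially throwing call. A small sketch; `Tracked`, `work`, `run` and the global `event_log` are hypothetical names used only to record what happens:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical log of construction/destruction events.
std::vector<std::string> event_log;

struct Tracked {
    std::string name;
    explicit Tracked(std::string n) : name(std::move(n)) { event_log.push_back("ctor " + name); }
    ~Tracked() { event_log.push_back("dtor " + name); }
};

void might_throw(bool fail) {
    if (fail) throw std::runtime_error("boom");
}

void work(bool fail) {
    Tracked a("a");
    Tracked b("b");      // both objects must be fully constructed before the call below
    might_throw(fail);   // a potentially throwing call: a and b must be destroyable from here
    event_log.push_back("body done");
}

void run(bool fail) {
    try {
        work(fail);
    } catch (const std::runtime_error&) {
        // A handler exists, so the stack is guaranteed to be unwound: b, then a.
        event_log.push_back("caught");
    }
}
```

Calling `run(true)` records `ctor a, ctor b, dtor b, dtor a, caught`; `run(false)` records `ctor a, ctor b, body done, dtor b, dtor a`.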
Table-based unwinding only works with functions following the ABI calling conventions, with stack frames. Without the possibility of an exception, the compiler would be free to ignore the ABI and omit the frame.
Space overhead, a.k.a. bloat, in the form of tables and separate exceptional code paths, might not affect execution time, but it can still affect time taken to download the program and load it into RAM.
It's all relative, but noexcept cuts the compiler some slack.
Solution 2:
The difference between noexcept and throw() is that in the case of throw() the exception stack is still unwound and destructors are called, so the implementation has to keep track of the stack (see 15.5.2 The std::unexpected() function in the standard).
By contrast, std::terminate() does not require the stack to be unwound (15.5.1 states that it is implementation-defined whether or not the stack is unwound before std::terminate() is called).
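From the caller's point of view, a noexcept function behaves roughly like one that catches everything and calls std::terminate(). The difference lies in the unwinding: the hand-written `catch (...)` version below forces a handler to exist and therefore forces unwinding up to it, while the noexcept version leaves unwinding before std::terminate() implementation-defined. A sketch with hypothetical names:

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical potentially throwing helper.
int parse_positive(int x) {
    if (x <= 0) throw std::invalid_argument("not positive");
    return x;
}

// If an exception tries to leave this function, std::terminate() is called.
// Whether the stack is unwound first is implementation-defined.
int via_noexcept(int x) noexcept {
    return parse_positive(x);
}

// A similar contract written out by hand. Because catch (...) is a real handler,
// the stack between the throw and this try block must be unwound before
// std::terminate() runs, which constrains the compiler more.
int via_handler(int x) noexcept {
    try {
        return parse_positive(x);
    } catch (...) {
        std::terminate();
    }
}
```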
GCC really does seem not to unwind the stack for noexcept: Demo
Clang, however, still unwinds: Demo
(You can comment out f_noexcept() and uncomment f_emptythrow() in the demos to see that for throw() both GCC and clang unwind the stack.)
Solution 3:
Take the following example:
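```cpp
#include <stdio.h>

int fun(int a) {
    int res;
    try
    {
        res = a * 11;
        if (res == 33)
            throw 20;
    }
    catch (int e)
    {
        const char *msg = "error";  // string literals are const in C++11
        printf("%s", msg);          // never pass non-literal text as the format string
    }
    return res;
}

int main(int argc, char** argv) {
    return fun(argc);
}
```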
The data passed as input isn't foreseeable from the compiler's perspective, so even with -O3 optimizations no assumption can be made that would allow it to elide the call or the exception machinery completely.
In LLVM IR, the fun function roughly translates to:
```llvm
define i32 @_Z3funi(i32 %a) #0 {
entry:
  %mul = mul nsw i32 %a, 11                      ; the actual processing
  %cmp = icmp eq i32 %mul, 33
  br i1 %cmp, label %if.then, label %try.cont    ; jump to if.then if res == 33

if.then:                                         ; lots of stuff happens here...
  %exception = tail call i8* @__cxa_allocate_exception(i64 4) #3
  %0 = bitcast i8* %exception to i32*
  store i32 20, i32* %0, align 4, !tbaa !1
  invoke void @__cxa_throw(i8* %exception, i8* bitcast (i8** @_ZTIi to i8*), i8* null) #4
          to label %unreachable unwind label %lpad

lpad:
  %1 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*)
          catch i8* bitcast (i8** @_ZTIi to i8*)
  ...                                            ; ...also here

invoke.cont:
  ...                                            ; ...and here
  br label %try.cont

try.cont:                                        ; this is where the normal flow goes
  ret i32 %mul

eh.resume:
  resume { i8*, i32 } %1

unreachable:
  unreachable
}
```
As you can see, the code path, though straightforward for normal control flow (no exceptions), now consists of several basic blocks and branches in the same function.
It is true that at runtime almost no cost is incurred, since you pay for what you use (if you don't throw, nothing extra happens), but having multiple branches might hurt your performance as well, e.g.
- branch prediction becomes harder
- register pressure might increase substantially
- [others]
Moreover, you certainly can't run pass-through-branch optimizations between the normal control flow and the landing pads/exception entry points.
Exceptions are a complex mechanism, and noexcept greatly simplifies the compiler's job, even in the case of zero-cost EH.
Edit: in the specific case of the noexcept specifier, if the compiler can't 'prove' that your code doesn't throw, it sets up exception handling that calls std::terminate (with implementation-dependent details). In both cases (the code provably doesn't throw, or the compiler can't prove that it doesn't) the mechanics involved are simpler and the compiler is less constrained. Besides, noexcept isn't used only for optimization reasons; it's also an important semantic indication.
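A well-known place where that semantic indication changes behavior is the standard library: when std::vector reallocates, it relocates existing elements with the move constructor only if that constructor is noexcept (or the type can't be copied); otherwise it copies them to keep the strong exception guarantee (this is what std::move_if_noexcept expresses). A sketch that counts copies and moves; `Item`, `Counts` and `grow_vector` are hypothetical names:

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical instrumentation shared by all instances of one Item type.
struct Counts { int copies = 0; int moves = 0; };

template <bool NoexceptMove>
struct Item {
    static Counts counts;
    Item() = default;
    Item(const Item&) { ++counts.copies; }
    Item(Item&&) noexcept(NoexceptMove) { ++counts.moves; }
};

template <bool NoexceptMove>
Counts Item<NoexceptMove>::counts;

static_assert(std::is_nothrow_move_constructible<Item<true>>::value, "noexcept move");
static_assert(!std::is_nothrow_move_constructible<Item<false>>::value, "potentially throwing move");

// Appends n elements; each reallocation relocates the elements already stored.
template <bool NoexceptMove>
Counts grow_vector(int n) {
    Item<NoexceptMove>::counts = Counts{};
    std::vector<Item<NoexceptMove>> v;
    for (int i = 0; i < n; ++i) v.emplace_back();
    return Item<NoexceptMove>::counts;
}
```

With a noexcept move constructor, `grow_vector<true>(10)` reports zero copies; with a potentially throwing one, `grow_vector<false>(10)` falls back to copying on reallocation.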
Solution 4:
I just made a benchmark to measure the performance effect of adding a 'noexcept' specifier in various test cases: https://github.com/N-Dekker/noexcept_benchmark
It has a specific test case that could take advantage of the possibility of skipping stack unwinding with 'noexcept':
```cpp
void recursive_func(recursion_data& data) noexcept // or no 'noexcept'!
{
    if (--data.number_of_func_calls_to_do > 0)
    {
        noexcept_benchmark::throw_exception_if(data.volatile_false);
        object_class stack_object(data.object_counter);
        recursive_func(data);
    }
}
```
https://github.com/N-Dekker/noexcept_benchmark/blob/v03/lib/stack_unwinding_test.cpp#L48
Looking at the benchmark results, it appears that both VS2017 x64 and GCC 5.4.0 yield a significant performance gain from adding 'noexcept', in this specific test case.
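For readers who want to experiment without the repository, here is a self-contained sketch of the same test-case shape. The types and the `throw_exception_if` helper are hypothetical stand-ins written for illustration, not the benchmark's actual code; a `bool` template parameter selects whether `recursive_func` is noexcept, so both variants come from one definition:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for the benchmark's types.
struct recursion_data {
    int number_of_func_calls_to_do;
    int object_counter;
    volatile bool volatile_false;  // always false, but the optimizer can't assume that
};

struct object_class {
    int& counter;
    explicit object_class(int& c) : counter(c) { ++counter; }
    ~object_class() { --counter; }  // must run during unwinding, unless noexcept lets it be skipped
};

inline void throw_exception_if(bool condition) {
    if (condition) throw std::runtime_error("unexpected");
}

// IsNoexcept selects the variant being compared.
template <bool IsNoexcept>
void recursive_func(recursion_data& data) noexcept(IsNoexcept) {
    if (--data.number_of_func_calls_to_do > 0) {
        throw_exception_if(data.volatile_false);
        object_class stack_object(data.object_counter);
        recursive_func<IsNoexcept>(data);
    }
}
```

Timing many calls of `recursive_func<true>` against `recursive_func<false>` (for example with `std::chrono::steady_clock`) gives the kind of comparison the benchmark reports; the numbers depend on compiler, flags, and platform.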