C++ scope guard with zero overhead
In C++ we can ensure foo is called when we exit a scope by putting foo() in the destructor of a local object. That's what I think of when I hear "scope guard." There are plenty of generic implementations.
I'm wondering, just for fun, whether it's possible to achieve the behavior of a scope guard with zero overhead compared to just writing foo() at every exit point.
Zero overhead, I think:
{
    try {
        do_something();
    } catch (...) {
        foo();
        throw;
    }
    foo();
}
Overhead of at least 1 byte to give the scope guard an address:
{
    scope_guard<foo> sg;
    do_something();
}
Do compilers optimize away giving sg an address?
A slightly more complicated case:
{
    Bar bar;
    try {
        do_something();
    } catch (...) {
        foo(bar);
        throw;
    }
    foo(bar);
}
versus
{
    Bar bar;
    scope_guard<[&]{foo(bar);}> sg;
    do_something();
}
The lifetime of bar entirely contains the lifetime of sg and its held lambda (destructors are called in reverse order of construction), but the lambda held by sg still has to hold a reference to bar. For example, int x; auto l = [&]{return x;}; gives sizeof(l) == 8 on my 64-bit system.
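The size observation above can be checked with a small sketch. The names EmptyGuard, ref_capture_lambda_size and captureless_lambda_size are hypothetical helpers introduced only for this illustration. The standard leaves the layout of a closure type unspecified, so the exact numbers are whatever your compiler chooses; mainstream compilers typically store one pointer per by-reference capture.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a guard with no data members.
// sizeof of any complete object type is at least 1.
struct EmptyGuard {
    ~EmptyGuard() {}
};

// Size of a closure that captures one int by reference.
// The value is implementation-specific (commonly sizeof(void*)).
inline std::size_t ref_capture_lambda_size() {
    int x = 0;
    auto l = [&] { return x; };
    (void)l;  // only the type's size matters here
    return sizeof(l);
}

// Size of a closure that captures nothing (commonly 1).
inline std::size_t captureless_lambda_size() {
    auto l = [] { return 0; };
    (void)l;
    return sizeof(l);
}
```

Calling the two functions from main and printing the results shows how much state a capturing guard has to carry compared with a captureless one on a given toolchain.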
Is there maybe some template metaprogramming magic that achieves the scope_guard sugar without any overhead?
Solution 1:
If by overhead you mean how much space the scope-guard variable occupies, then zero overhead is possible when the function object is a compile-time value. I've written a small snippet to illustrate this:
Try it online!
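#include <iostream>

// The callable is a template argument, so the class needs no data members.
template <auto F>
class ScopeGuard {
public:
    ~ScopeGuard() { F(); }
};

void Cleanup() {
    std::cout << "Cleanup func..." << std::endl;
}

int main() {
    // Note: subtracting the addresses of two unrelated objects is not
    // defined by the standard; it is used here only as an informal probe
    // of the stack layout.
    {
        char a = 0;
        ScopeGuard<&Cleanup> sg;
        char b = 0;
        std::cout << "Stack difference "
                  << int(&a - &b - sizeof(char)) << std::endl;
    }
    {
        // A captureless constexpr lambda as a template argument needs C++20.
        auto constexpr f = []{
            std::cout << "Cleanup lambda..." << std::endl; };
        char a = 0;
        ScopeGuard<f> sg;
        char b = 0;
        std::cout << "Stack difference "
                  << int(&a - &b - sizeof(char)) << std::endl;
    }
}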
Output:
Stack difference 0
Cleanup func...
Stack difference 0
Cleanup lambda...
The code above doesn't put even a single byte on the stack for the guard. An object of a class with no data members has sizeof 1, but as long as its address is never needed the compiler does not have to reserve storage for it, and optimizing compilers routinely omit it. If you do take a pointer to such an object, the compiler is obliged to give it a distinct address, which costs at least 1 byte. In your case you don't take the address of the scope guard.
You can confirm that not a single byte is occupied by following the Try it online! link above the code, which shows Clang's assembly output.
To have no data members at all, the scope guard class should only use a compile-time function object, such as a pointer to a global function or a lambda without captures. Both kinds are used in my code above.
The code above also prints the difference between the addresses of the char variables declared before and after the scope guard, to show that the guard actually occupies 0 bytes.
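As a complementary check that does not rely on the stack layout, here is a minimal sketch that asserts at compile time that such a guard type has no data members. StaticGuard, CountingCleanup, cleanup_calls and run_guarded_scope are hypothetical names used only for this illustration, and the template parameter is spelled as an explicit function pointer so that the sketch does not depend on `template <auto F>`.

```cpp
#include <cassert>
#include <type_traits>

// Counter used only to observe that the cleanup ran (hypothetical helper).
static int cleanup_calls = 0;
void CountingCleanup() { ++cleanup_calls; }

// The cleanup function is part of the type, so the guard stores nothing.
template <void (*F)()>
struct StaticGuard {
    ~StaticGuard() { F(); }
};

// Compile-time evidence that the guard carries no state of its own.
static_assert(std::is_empty<StaticGuard<&CountingCleanup>>::value,
              "guard must have no non-static data members");

// Runs one guarded scope and reports how many times the cleanup fired.
int run_guarded_scope() {
    cleanup_calls = 0;
    {
        StaticGuard<&CountingCleanup> sg;
        (void)sg;  // silence unused-variable warnings
    }
    return cleanup_calls;
}
```

An empty type is what allows the optimizer to drop the object entirely; whether it actually does so still depends on the compiler and the optimization level.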
Let's go a bit further and allow function objects that are not compile-time values.
For this we again create a class with no data members, but now store all function objects inside one shared vector with thread-local storage.
Again, since the class has no data members and we never take a pointer to the scope guard object, the compiler does not need to reserve a single byte for it on the stack.
Instead, a single shared vector is allocated on the heap. This way you can trade stack storage for heap storage if you're short on stack memory.
A shared vector also lets us use as little memory as possible, because it only needs as many elements as there are nested blocks using a scope guard at the same time. If all scope guards sit sequentially in different blocks, the vector holds just 1 element at a time, so only a few bytes of memory serve all the scope guards that were used.
Why can the heap memory of a shared vector be more economical than stack storage for the guards? Consider several sequential blocks of guards stored on the stack:
void test() {
    {
        ScopeGuard sg(f0);
    }
    {
        ScopeGuard sg(f1);
    }
    {
        ScopeGuard sg(f2);
    }
}
Here the 3 guards can occupy triple the amount of stack memory, because for each function like test() above the compiler reserves stack space for all the variables used in the function, so for 3 guards it may reserve three times as much. (An optimizing compiler can sometimes reuse one stack slot for variables in disjoint scopes, but that is not guaranteed.)
With a shared vector, the test() function above uses just 1 vector element at a time, so the vector has a size of at most 1 and needs only a single slot to store a function object.
Hence, if you have many non-nested scope guards inside one function, the shared vector can be much more economical.
Below is a code snippet for the shared-vector approach with zero data members and zero stack memory overhead. As a reminder, this approach allows function objects that are not compile-time values, unlike the solution in the first part of my answer.
Try it online!
#include <iostream>
#include <vector>
#include <functional>
#include <utility>  // std::move

class ScopeGuard2 {
public:
    // One vector per thread holds the callables of all live guards.
    static auto & Funcs() {
        thread_local std::vector<std::function<void()>> funcs_;
        return funcs_;
    }
    ScopeGuard2(std::function<void()> f) {
        Funcs().emplace_back(std::move(f));
    }
    // Guards are destroyed in reverse order of construction,
    // so the last element always belongs to this guard.
    ~ScopeGuard2() {
        Funcs().at(Funcs().size() - 1)();
        Funcs().pop_back();
    }
};

void Cleanup() {
    std::cout << "Cleanup func..." << std::endl;
}

int main() {
    {
        ScopeGuard2 sg(&Cleanup);
    }
    {
        auto volatile x = 123;
        auto const f = [&]{
            std::cout << "Cleanup lambda... x = "
                      << x << std::endl;
        };
        ScopeGuard2 sg(f);
    }
}
Output:
Cleanup func...
Cleanup lambda... x = 123
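To make the claim about sequential versus nested blocks concrete, here is a minimal sketch that records the largest size the shared vector reaches. TrackedGuard, MaxDepth, max_depth_sequential and max_depth_nested are hypothetical names added for this illustration, not part of the code above. As one possible design choice, this variant moves the callable out of the vector before invoking it, so the vector stays consistent even if the callable registers further guards.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

class TrackedGuard {
public:
    // Per-thread storage for the callables of all live guards.
    static std::vector<std::function<void()>>& Funcs() {
        thread_local std::vector<std::function<void()>> funcs;
        return funcs;
    }
    // Largest number of simultaneously live guards seen so far.
    static std::size_t& MaxDepth() {
        thread_local std::size_t max_depth = 0;
        return max_depth;
    }
    explicit TrackedGuard(std::function<void()> f) {
        Funcs().emplace_back(std::move(f));
        MaxDepth() = std::max(MaxDepth(), Funcs().size());
    }
    ~TrackedGuard() {
        // Take the callable out first, then run it.
        std::function<void()> f = std::move(Funcs().back());
        Funcs().pop_back();
        f();
    }
    TrackedGuard(const TrackedGuard&) = delete;
    TrackedGuard& operator=(const TrackedGuard&) = delete;
};

// Three guards in sequential blocks: only one is alive at a time.
std::size_t max_depth_sequential() {
    TrackedGuard::MaxDepth() = 0;
    int calls = 0;
    { TrackedGuard sg([&] { ++calls; }); }
    { TrackedGuard sg([&] { ++calls; }); }
    { TrackedGuard sg([&] { ++calls; }); }
    return TrackedGuard::MaxDepth();
}

// Two nested guards: both are alive inside the inner block.
std::size_t max_depth_nested() {
    TrackedGuard::MaxDepth() = 0;
    int calls = 0;
    {
        TrackedGuard outer([&] { ++calls; });
        {
            TrackedGuard inner([&] { ++calls; });
        }
    }
    return TrackedGuard::MaxDepth();
}
```

The vector grows only with nesting depth, not with the number of guards written in a function. Note that std::function may itself allocate for larger callables, so the space saved on the stack is paid for elsewhere.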
Solution 2:
It's not exactly clear what you mean by 'zero overhead' here.
Do compilers optimize away giving sg an address?
Most likely, modern mainstream compilers will do it when run in optimizing modes. Unfortunately, that's as definite as it gets. It depends on the environment and has to be tested before it can be relied upon.
If the question is whether there is a guaranteed way to avoid <anything> in the resulting assembly, the answer is no. As @Peter said in the comments, the compiler is allowed to do anything that produces an equivalent result. It may never call foo() at all, even if you write it there verbatim, when it can prove that nothing in the observable program behavior will change.