Capturing a reference by reference in a C++11 lambda

The code is guaranteed to work.

Before we delve into the standard's wording: it's the C++ committee's intent that this code works. However, the wording as it stands was believed to be insufficiently clear on this (and indeed, bugfixes made to the standard post-C++14 broke the delicate arrangement that made it work), so CWG issue 2011 was raised to clarify matters, and is making its way through the committee now. As far as I know, no implementation gets this wrong.


I'd like to clarify a couple of things, because Ben Voigt's answer contains some factual errors that are creating some confusion:

  1. "Scope" is a static, lexical notion in C++, that describes a region of the program source code in which unqualified name lookup associates a particular name with a declaration. It has nothing to do with lifetime. See [basic.scope.declarative]/1.
  2. The "reaching scope" rules for lambdas are, likewise, a syntactic property that determines when capture is permitted. For example:

    void f(int n) {
      struct A {
        void g() { // reaching scope of lambda starts here
          [&] { int k = n; }; // ill-formed: n cannot be captured here
          // ...
        }
      };
    }

    n is in scope here, but the reaching scope of the lambda does not include it, so it cannot be captured. Put another way, the reaching scope of the lambda is how far "up" it can reach and capture variables -- it can reach up to the enclosing (non-lambda) function and its parameters, but it can't reach outside that and capture declarations that appear outside.
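As a compilable counterpart to the fragment above, here is a minimal sketch of what the reaching scope does allow. The names reaching_scope_demo, add, twice and the parameter m are made up for illustration; the point is only that a lambda can capture from its innermost enclosing function, not from a function further out.

```cpp
// Hypothetical example: which variables a lambda may capture.
int reaching_scope_demo(int n) {
    // This lambda sits directly in reaching_scope_demo, so n is within
    // its reaching scope and can be captured.
    auto add = [&](int k) { return n + k; };

    struct A {
        int g(int m) {
            // The reaching scope of this lambda starts at g: it may capture
            // g's parameter m, but not n from the enclosing function.
            auto twice = [&] { return 2 * m; };
            return twice();
        }
    };

    // n is passed to g as an ordinary argument instead of being captured.
    return add(A{}.g(n));
}
```

Calling reaching_scope_demo(3) doubles 3 inside A::g and then adds n, giving 9.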

So the notion of "reaching scope" is irrelevant to this question. The entity being captured is make_function's parameter x, which is within the reaching scope of the lambda.


OK, so let's look at the standard's wording on this issue. Per [expr.prim.lambda]/17, only id-expressions referring to entities captured by copy are transformed into a member access on the lambda closure type; id-expressions referring to entities captured by reference are left alone, and still denote the same entity they would have denoted in the enclosing scope.

This immediately seems bad: the reference x's lifetime has ended, so how can we refer to it? Well, it turns out that there is almost (see below) no way to refer to a reference outside its lifetime (you can either see a declaration of it, in which case it's in scope and thus presumably OK to use, or it's a class member, in which case the class itself must be within its lifetime for the member access expression to be valid). As a result, the standard did not have any prohibitions on using a reference outside its lifetime until very recently.

The lambda wording took advantage of the fact that there is no penalty for using a reference outside its lifetime, and so didn't need to give any explicit rules for what access to an entity captured by reference means -- it just means you use that entity; if it's a reference, the name denotes its initializer. And that's how this was guaranteed to work up until very recently (including in C++11 and C++14).
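For concreteness, the pattern being discussed looks roughly like the sketch below. It is reconstructed from the names used in the answers here (make_function, its parameter x, and a variable i in the caller). Returning the value instead of printing it, and the helper call_after_return, are my own changes so the result can be checked.

```cpp
#include <functional>

// x is a reference parameter; [&] captures it by reference.
std::function<int()> make_function(int& x) {
    return [&] { return x; };
}

// Hypothetical helper: calls the lambda after make_function has returned.
int call_after_return() {
    int i = 3;
    auto f = make_function(i);
    i = 5;        // modify the referent after the lambda was created
    return f();   // on the reading above, x denotes i, which is still alive
}
```

On the reading described above, call_after_return() yields 5, because the name x inside the lambda denotes i rather than a separate object belonging to make_function.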

However, it's not quite true that you can't mention a reference outside its lifetime; in particular, you can reference it from within its own initializer, from the initializer of a class member earlier than the reference, or if it is a namespace-scope variable and you access it from another global that is initialized before it is. CWG issue 2012 was introduced to fix that oversight, but it inadvertently broke the specification for lambda capture by reference of references. We should get this regression fixed before C++17 ships; I've filed a National Body comment to make sure it's suitably prioritized.


TL;DR: The code in the question is not guaranteed by the Standard, and there are reasonable implementations of lambdas which cause it to break. Assume it is non-portable and instead use

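#include <functional>
#include <iostream>

std::function<void()> make_function(int& x)
{
    const auto px = &x; // address of the object x refers to
    return [/* = */ px]{ std::cout << *px << std::endl; }; // px is captured by value
}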

Beginning in C++14, you can do away with explicit use of a pointer using an initialized capture, which forces a new reference variable to be created for the lambda, instead of reusing the one in the enclosing scope:

std::function<void()> make_function(int& x)
{
    return [&x = x]{ std::cout << x << std::endl; };
}

At first glance, it seems that the code in the question should be safe, but the wording of the Standard causes a bit of a problem:

A lambda-expression whose smallest enclosing scope is a block scope (3.3.3) is a local lambda expression; any other lambda-expression shall not have a capture-default or simple-capture in its lambda-introducer. The reaching scope of a local lambda expression is the set of enclosing scopes up to and including the innermost enclosing function and its parameters.

...

All such implicitly captured entities shall be declared within the reaching scope of the lambda expression.

...

[ Note: If an entity is implicitly or explicitly captured by reference, invoking the function call operator of the corresponding lambda-expression after the lifetime of the entity has ended is likely to result in undefined behavior. — end note ]

What we expect to happen is that x, as used inside make_function, refers to i in main() (since that is what references do), and the entity i is captured by reference. Since that entity still lives at the time of the lambda call, everything is good.

But! "implicitly captured entities" must be "within the reaching scope of the lambda expression", and i in main() is not in the reaching scope. :( Unless the parameter x counts as "declared within the reaching scope" even though the entity i itself is outside the reaching scope.

What this sounds like is that, unlike any other place in C++, a reference-to-reference is created, and the lifetime of a reference has meaning.

Definitely something I would like to see the Standard clarify.

In the meantime, the variant shown in the TL;DR section is definitely safe because the pointer is captured by value (stored inside the lambda object itself), and it is a valid pointer to an object which lasts through the call of the lambda. I would also expect that capturing by reference actually ends up storing a pointer anyway, so there should be no runtime penalty for doing this.
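A minimal sketch of why the pointer variant is safe, written so the result can be checked rather than printed. The names make_function_ptr and pointer_capture_demo are hypothetical; the only assumption is that the object x refers to outlives the call to the returned function.

```cpp
#include <functional>

std::function<int()> make_function_ptr(int& x) {
    const auto px = &x;            // address of the referent, not of the parameter
    return [px] { return *px; };   // the pointer is copied into the closure object
}

// Hypothetical caller: the referent i outlives the call to f.
int pointer_capture_demo() {
    int i = 3;
    auto f = make_function_ptr(i);
    i = 7;        // change the referent after the lambda was created
    return f();   // reads i through the stored pointer
}
```

Nothing in the closure refers back to make_function_ptr's parameter, so it does not matter what happens to that parameter once the function returns.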


On closer inspection, we can also imagine how it could break. Remember that on x86, in the final machine code, both local variables and function parameters are accessed using EBP-relative addressing. Parameters have a positive offset, while locals are negative. (Other architectures have different register names but many work in the same way.) Anyway, this means that capture-by-reference can be implemented by capturing only the value of EBP. Then locals and parameters alike can again be found via relative addressing. And in fact I believe I've heard of lambda implementations (in languages which had lambdas long before C++) doing exactly this: capturing the "stack frame" where the lambda was defined.

What this implies is that when make_function returns and its stack frame goes away, so does all ability to access locals AND parameters, even those which are references.
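To make the difference between the two strategies concrete, here is a hand-written model of them. This is only an illustration, not how any particular compiler lays out closures: Frame, ClosureByReferent, ClosureByFrame and frame_capture_demo are all hypothetical names, and Frame merely stands in for the part of make_function's stack frame that holds x.

```cpp
// Hypothetical stand-in for make_function's stack frame: it holds the
// reference parameter x.
struct Frame {
    int& x;
};

// Strategy A: the closure remembers the object that x refers to.
struct ClosureByReferent {
    int* target;
    int operator()() const { return *target; }
};

// Strategy B: the closure remembers only where the frame is and finds x
// through it on every call.
struct ClosureByFrame {
    const Frame* frame;
    int operator()() const { return frame->x; }
};

int frame_capture_demo() {
    int i = 4;
    Frame frame{i};                  // models being "inside" make_function
    ClosureByReferent a{&frame.x};   // &frame.x is the address of i
    ClosureByFrame b{&frame};        // depends on frame staying alive
    i = 9;
    // Both closures agree while the frame is alive. Once the frame is gone,
    // only strategy A would still be valid to call.
    return a() + b();
}
```

The demo deliberately calls both closures only while frame is still alive; calling a ClosureByFrame after its frame has been destroyed is exactly the failure mode described above.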

And the Standard contains the following rule, likely specifically to enable this approach:

It is unspecified whether additional unnamed non-static data members are declared in the closure type for entities captured by reference.

Conclusion: The code in the question is not guaranteed by the Standard, and there are reasonable implementations of lambdas which cause it to break. Assume it is non-portable.