Is this a known pitfall of C++11 for loops?
Is this correct and Widely appreciated?
Yes, your understanding of things is correct.
Which part of the above is the "bad" part, that should be avoided?
The bad part is taking an l-value reference to a temporary returned from a function, and binding it to an r-value reference. It is just as bad as this:
auto &&t = Vector{1., 2., 3.}.normalize();
The temporary Vector{1., 2., 3.}
's lifetime cannot be extended because the compiler has no idea that the return value from normalize
references it.
Would the language be improved by changing the definition of the range-based for loop such that temporaries constructed in the for-expression exist for the duration of the loop?
That would be highly inconsistent with how C++ works.
Would it prevent certain gotchas made by people using chained expressions on temporaries or various lazy-evaluation methods for expressions? Yes. But it would also be require special-case compiler code, as well as be confusing as to why it doesn't work with other expression constructs.
A much more reasonable solution would be some way to inform the compiler that the return value of a function is always a reference to this
, and therefore if the return value is bound to a temporary-extending construct, then it would extend the correct temporary. That's a language-level solution though.
Presently (if the compiler supports it), you can make it so that normalize
cannot be called on a temporary:
struct Vector {
double x, y, z;
// ...
Vector &normalize() & {
double s = 1./sqrt(x*x+y*y+z*z);
x *= s; y *= s; z *= s;
return *this;
}
Vector &normalize() && = delete;
};
This will cause Vector{1., 2., 3.}.normalize()
to give a compile error, while v.normalize()
will work fine. Obviously you won't be able to do correct things like this:
Vector t = Vector{1., 2., 3.}.normalize();
But you also won't be able to do incorrect things.
Alternatively, as suggested in the comments, you can make the rvalue reference version return a value rather than a reference:
struct Vector {
double x, y, z;
// ...
Vector &normalize() & {
double s = 1./sqrt(x*x+y*y+z*z);
x *= s; y *= s; z *= s;
return *this;
}
Vector normalize() && {
Vector ret = *this;
ret.normalize();
return ret;
}
};
If Vector
was a type with actual resources to move, you could use Vector ret = std::move(*this);
instead. Named return value optimization makes this reasonably optimal in terms of performance.
for (double x : Vector{1., 2., 3.}.normalize()) { ... }
That is not a limitation of the language, but a problem with your code. The expression Vector{1., 2., 3.}
creates a temporary, but the normalize
function returns an lvalue-reference. Because the expression is an lvalue, the compiler assumes that the object will be alive, but because it is a reference to a temporary, the object dies after the full expression is evaluated, so you are left with a dangling reference.
Now, if you change your design to return a new object by value rather than a reference to the current object, then there would be no issue and the code would work as expected.
IMHO, the second example is already flawed. That the modifying operators return *this
is convenient in the way you mentioned: it allows chaining of modifiers. It can be used for simply handing on the result of the modification, but doing this is error-prone because it can easily be overlooked. If I see something like
Vector v{1., 2., 3.};
auto foo = somefunction1(v, 17);
auto bar = somefunction2(true, v, 2, foo);
auto baz = somefunction3(bar.quun(v), 93.2, v.qwarv(foo));
I wouldn't automatically suspect that the functions modify v
as a side-effect. Of course, they could, but it would be confusing. So if I was to write something like this, I would make sure that v
stays constant. For your example, I would add free functions
auto normalized(Vector v) -> Vector {return v.normalize();}
auto negated(Vector v) -> Vector {return v.negate();}
and then write the loops
for( double x : negated(normalized(v)) ) { ... }
and
for( double x : normalized(Vector{1., 2., 3}) ) { ... }
That is IMO better readable, and it's safer. Of course, it requires an extra copy, however for heap-allocated data this could likely be done in a cheap C++11 move operation.