Can the use of C++11's 'auto' improve performance?
I can see why the auto
type in C++11 improves correctness and maintainability. I've read that it can also improve performance (Almost Always Auto by Herb Sutter), but I miss a good explanation.
- How can
auto
improve performance? - Can anyone give an example?
auto
can aid performance by avoiding silent implicit conversions. An example I find compelling is the following.
std::map<Key, Val> m;
// ...
for (std::pair<Key, Val> const& item : m) {
// do stuff
}
See the bug? Here we are, thinking we're elegantly taking every item in the map by const reference and using the new range-for expression to make our intent clear, but actually we're copying every element. This is because std::map<Key, Val>::value_type
is std::pair<const Key, Val>
, not std::pair<Key, Val>
. Thus, when we (implicitly) have:
std::pair<Key, Val> const& item = *iter;
Instead of taking a reference to an existing object and leaving it at that, we have to do a type conversion. You are allowed to take a const reference to an object (or temporary) of a different type as long as there is an implicit conversion available, e.g.:
int const& i = 2.0; // perfectly OK
The type conversion is an allowed implicit conversion for the same reason you can convert a const Key
to a Key
, but we have to construct a temporary of the new type in order to allow for that. Thus, effectively our loop does:
std::pair<Key, Val> __tmp = *iter; // construct a temporary of the correct type
std::pair<Key, Val> const& item = __tmp; // then, take a reference to it
(Of course, there isn't actually a __tmp
object, it's just there for illustration, in reality the unnamed temporary is just bound to item
for its lifetime).
Just changing to:
for (auto const& item : m) {
// do stuff
}
just saved us a ton of copies - now the referenced type matches the initializer type, so no temporary or conversion is necessary, we can just do a direct reference.
Because auto
deduces the type of the initializing expression, there is no type conversion involved. Combined with templated algorithms, this means that you can get a more direct computation than if you were to make up a type yourself – especially when you are dealing with expressions whose type you cannot name!
A typical example comes from (ab)using std::function
:
std::function<bool(T, T)> cmp1 = std::bind(f, _2, 10, _1); // bad
auto cmp2 = std::bind(f, _2, 10, _1); // good
auto cmp3 = [](T a, T b){ return f(b, 10, a); }; // also good
std::stable_partition(begin(x), end(x), cmp?);
With cmp2
and cmp3
, the entire algorithm can inline the comparison call, whereas if you construct a std::function
object, not only can the call not be inlined, but you also have to go through the polymorphic lookup in the type-erased interior of the function wrapper.
Another variant on this theme is that you can say:
auto && f = MakeAThing();
This is always a reference, bound to the value of the function call expression, and never constructs any additional objects. If you didn't know the returned value's type, you might be forced to construct a new object (perhaps as a temporary) via something like T && f = MakeAThing()
. (Moreover, auto &&
even works when the return type is not movable and the return value is a prvalue.)