Can the use of C++11's 'auto' improve performance?

I can see why the auto type in C++11 improves correctness and maintainability. I've read that it can also improve performance (Almost Always Auto by Herb Sutter), but I miss a good explanation.

  • How can auto improve performance?
  • Can anyone give an example?

auto can aid performance by avoiding silent implicit conversions. An example I find compelling is the following.

std::map<Key, Val> m;
// ...

for (std::pair<Key, Val> const& item : m) {
    // do stuff
}

See the bug? Here we are, thinking we're elegantly taking every item in the map by const reference and using the new range-for expression to make our intent clear, but actually we're copying every element. This is because std::map<Key, Val>::value_type is std::pair<const Key, Val>, not std::pair<Key, Val>. Thus, when we (implicitly) have:

std::pair<Key, Val> const& item = *iter;

Instead of taking a reference to an existing object and leaving it at that, we have to do a type conversion. You are allowed to take a const reference to an object (or temporary) of a different type as long as there is an implicit conversion available, e.g.:

int const& i = 2.0; // perfectly OK

The type conversion is an allowed implicit conversion for the same reason you can convert a const Key to a Key, but we have to construct a temporary of the new type in order to allow for that. Thus, effectively our loop does:

std::pair<Key, Val> __tmp = *iter;       // construct a temporary of the correct type
std::pair<Key, Val> const& item = __tmp; // then, take a reference to it

(Of course, there isn't actually a __tmp object, it's just there for illustration, in reality the unnamed temporary is just bound to item for its lifetime).

Just changing to:

for (auto const& item : m) {
    // do stuff
}

just saved us a ton of copies - now the referenced type matches the initializer type, so no temporary or conversion is necessary, we can just do a direct reference.


Because auto deduces the type of the initializing expression, there is no type conversion involved. Combined with templated algorithms, this means that you can get a more direct computation than if you were to make up a type yourself – especially when you are dealing with expressions whose type you cannot name!

A typical example comes from (ab)using std::function:

std::function<bool(T, T)> cmp1 = std::bind(f, _2, 10, _1);  // bad
auto cmp2 = std::bind(f, _2, 10, _1);                       // good
auto cmp3 = [](T a, T b){ return f(b, 10, a); };            // also good

std::stable_partition(begin(x), end(x), cmp?);

With cmp2 and cmp3, the entire algorithm can inline the comparison call, whereas if you construct a std::function object, not only can the call not be inlined, but you also have to go through the polymorphic lookup in the type-erased interior of the function wrapper.

Another variant on this theme is that you can say:

auto && f = MakeAThing();

This is always a reference, bound to the value of the function call expression, and never constructs any additional objects. If you didn't know the returned value's type, you might be forced to construct a new object (perhaps as a temporary) via something like T && f = MakeAThing(). (Moreover, auto && even works when the return type is not movable and the return value is a prvalue.)