When is overloading pass by reference (l-value and r-value) preferred to pass-by-value?

For types whose copy assignment operator can recycle resources, swapping with a copy is almost never the best way to implement the copy assignment operator. For example, look at std::vector:

This class manages a dynamically sized buffer and maintains both a capacity (the maximum length the buffer can hold) and a size (the current length). If the vector copy assignment operator is implemented with swap, then no matter what, a new buffer is allocated whenever rhs.size() != 0.

However, if lhs.capacity() >= rhs.size(), no new buffer need be allocated at all. One can simply assign/construct the elements from rhs to lhs. When the element type is trivially copyable, this may boil down to nothing but memcpy. This can be much, much faster than allocating and deallocating a buffer.
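A minimal sketch of the recycling idea, using a hypothetical IntBuffer class (this is not std::vector's actual implementation): the copy assignment allocates only when capacity() < rhs.size(), and because int is trivially copyable, copying the elements is a single memcpy into whatever buffer is already there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical minimal buffer of ints, used only to illustrate a
// capacity-aware copy assignment that recycles its existing allocation.
class IntBuffer
{
    int*        data_     = nullptr;
    std::size_t size_     = 0;
    std::size_t capacity_ = 0;
public:
    explicit IntBuffer(std::size_t n)
        : data_(n ? new int[n]() : nullptr), size_(n), capacity_(n) {}

    IntBuffer(const IntBuffer& rhs)
        : data_(rhs.size_ ? new int[rhs.size_] : nullptr),
          size_(rhs.size_), capacity_(rhs.size_)
    {
        if (size_)
            std::memcpy(data_, rhs.data_, size_ * sizeof(int));
    }

    ~IntBuffer() { delete[] data_; }

    IntBuffer& operator=(const IntBuffer& rhs)
    {
        if (this != &rhs)   // memcpy onto itself would be undefined behavior
        {
            if (capacity_ < rhs.size_)
            {
                // Not enough room: only now do we pay for an allocation.
                int* p = new int[rhs.size_];
                delete[] data_;
                data_ = p;
                capacity_ = rhs.size_;
            }
            // Enough room (or freshly allocated): just copy the bytes.
            if (rhs.size_)
                std::memcpy(data_, rhs.data_, rhs.size_ * sizeof(int));
            size_ = rhs.size_;
        }
        return *this;
    }

    const int*  data()     const { return data_; }
    std::size_t size()     const { return size_; }
    std::size_t capacity() const { return capacity_; }
    int&        operator[](std::size_t i)       { return data_[i]; }
    int         operator[](std::size_t i) const { return data_[i]; }
};

// Usage: assigning a short buffer into a roomy one reuses the allocation.
inline void int_buffer_demo()
{
    IntBuffer roomy(1000);
    IntBuffer small(10);
    small[9] = 42;
    const int* before = roomy.data();
    roomy = small;                     // fits: no allocation, just memcpy
    assert(roomy.data() == before);
    assert(roomy.size() == 10 && roomy[9] == 42);
}
```

A copy/swap version of the same operator would allocate a fresh rhs.size() buffer on every non-empty assignment, even when the existing 1000-int buffer could hold the data.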

Same issue for std::string.

Same issue for MyType when MyType has data members that are std::vector and/or std::string.
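For MyType, the implicitly generated copy assignment simply calls each member's copy assignment, so it inherits vector's buffer recycling for free. The hedged sketch below contrasts it with a hypothetical MyTypeSwap that uses copy/swap. The check that the defaulted version keeps the same buffer reflects how libc++ and libstdc++ implement vector's copy assignment; it is not an exact requirement spelled out in the standard.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Copy assignment is implicit: member-by-member vector/string copy assignment.
struct MyType
{
    std::vector<int> numbers;
    std::string      name;
};

// Hypothetical copy/swap variant of the same type, for comparison only.
struct MyTypeSwap
{
    std::vector<int> numbers;
    std::string      name;

    MyTypeSwap() = default;
    MyTypeSwap(const MyTypeSwap&) = default;
    MyTypeSwap& operator=(MyTypeSwap rhs)   // a full copy is made here, always
    {
        numbers.swap(rhs.numbers);
        name.swap(rhs.name);
        return *this;
    }
};

inline void my_type_demo()
{
    MyType a, b;
    a.numbers.assign(1000, 0);
    b.numbers.assign(100, 7);
    b.name = "rhs";
    const int* before = a.numbers.data();
    a = b;                                 // vector reuses a's 1000-int buffer
    assert(a.numbers.data() == before);   // holds on libc++ and libstdc++

    MyTypeSwap c, d;
    c.numbers.assign(1000, 0);
    d.numbers.assign(100, 7);
    const int* old = c.numbers.data();
    c = d;                                 // copies into a fresh buffer first
    assert(c.numbers.data() != old);
    assert(c.numbers.size() == 100 && c.numbers[0] == 7);
}
```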

There are only two situations in which you should consider implementing copy assignment with swap:

  1. You know that the swap method (including the obligatory copy construction when the rhs is an lvalue) will not be terribly inefficient.

  2. You know that you will always need the copy assignment operator to have the strong exception safety guarantee.

If you're not sure about 2 (in other words, you think the copy assignment operator might only sometimes need the strong exception safety guarantee), don't implement assignment in terms of swap. It is easy for your clients to achieve the same guarantee if you provide one of:

  1. A noexcept swap.
  2. A noexcept move assignment operator.

For example:

#include <utility>  // std::swap

template <class T>
T&
strong_assign(T& x, T y)
{
    using std::swap;
    swap(x, y);
    return x;
}

or:

#include <utility>  // std::move

template <class T>
T&
strong_assign(T& x, T y)
{
    x = std::move(y);
    return x;
}
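To see the strong guarantee in action, here is a hedged sketch that reuses the swap-based strong_assign with a hypothetical element type, Fragile, whose copy constructor can be rigged to throw. Because the parameter y is copy-constructed before x is touched, a throwing copy leaves x exactly as it was, and the swap itself cannot throw.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>
#include <vector>

// The swap-based strong_assign shown above.
template <class T>
T&
strong_assign(T& x, T y)
{
    using std::swap;
    swap(x, y);
    return x;
}

// Hypothetical element type whose copy constructor can be told to fail.
struct Fragile
{
    static int copies_until_throw;   // negative means "never throw"
    int value;

    explicit Fragile(int v) : value(v) {}
    Fragile(const Fragile& other) : value(other.value)
    {
        if (copies_until_throw == 0)
            throw std::runtime_error("copy failed");
        if (copies_until_throw > 0)
            --copies_until_throw;
    }
    Fragile& operator=(const Fragile&) = default;
};
int Fragile::copies_until_throw = -1;

inline void strong_assign_demo()
{
    std::vector<Fragile> target{Fragile(1), Fragile(2)};
    std::vector<Fragile> source{Fragile(7), Fragile(8), Fragile(9)};

    Fragile::copies_until_throw = 2;    // the third element copy will throw
    bool threw = false;
    try {
        strong_assign(target, source);  // throws while building parameter y
    } catch (const std::runtime_error&) {
        threw = true;
    }
    Fragile::copies_until_throw = -1;

    assert(threw);
    assert(target.size() == 2);         // target untouched: strong guarantee
    assert(target[0].value == 1 && target[1].value == 2);
}
```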

Now, there will be some types where implementing copy assignment with swap makes sense. However, these types will be the exception, not the rule.
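For the common case, a sketch of what the advice above looks like in a class: the hypothetical Widget keeps the fast memberwise copy assignment (basic guarantee only) but offers the two noexcept building blocks, so either strong_assign variant works for clients who need the strong guarantee. The names Widget and values() are made up for illustration; the static_asserts only check the noexcept declarations.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

class Widget
{
    std::vector<int> v_;
public:
    Widget() = default;
    explicit Widget(std::vector<int> v) : v_(std::move(v)) {}

    // Copy assignment stays the defaulted, buffer-recycling one.
    Widget(const Widget&) = default;
    Widget& operator=(const Widget&) = default;

    // Building block 2: noexcept move assignment (implemented by swapping,
    // so `other` ends up holding our old contents).
    Widget(Widget&& other) noexcept : v_(std::move(other.v_)) {}
    Widget& operator=(Widget&& other) noexcept
    {
        v_.swap(other.v_);
        return *this;
    }

    // Building block 1: noexcept swap, found by ADL.
    friend void swap(Widget& a, Widget& b) noexcept { a.v_.swap(b.v_); }

    const std::vector<int>& values() const { return v_; }
};

static_assert(std::is_nothrow_move_assignable<Widget>::value,
              "Widget's move assignment is declared noexcept");
static_assert(noexcept(swap(std::declval<Widget&>(), std::declval<Widget&>())),
              "Widget's swap is declared noexcept");

inline void widget_demo()
{
    Widget a(std::vector<int>{1, 2, 3});
    Widget b(std::vector<int>{4, 5});
    swap(a, b);
    assert(a.values().size() == 2 && b.values().size() == 3);
    a = std::move(b);
    assert(a.values().size() == 3);
}
```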

On:

void push_back(const value_type& val);
void push_back(value_type&& val);

Imagine vector<big_legacy_type> where:

class big_legacy_type
{
public:
    big_legacy_type(const big_legacy_type&);  // expensive
    // no move members ...
};

If we had only:

void push_back(value_type val);

Then pushing back an lvalue big_legacy_type into a vector would require two copies instead of one, even when capacity was sufficient. That would be a disaster, performance-wise.
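The copy count can be checked with a small sketch. counted_legacy is a hypothetical stand-in for big_legacy_type that counts its copy constructions; overloaded_sink and by_value_sink are also hypothetical wrappers around a std::vector whose capacity is reserved up front, so reallocation copies do not muddy the count. Because counted_legacy declares a copy constructor and no move members, even std::move(val) ends up calling the copy constructor.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical copy-counting stand-in for big_legacy_type.
struct counted_legacy
{
    static int copies;
    std::string payload;

    explicit counted_legacy(std::string s) : payload(std::move(s)) {}
    counted_legacy(const counted_legacy& other) : payload(other.payload) { ++copies; }
    counted_legacy& operator=(const counted_legacy&) = default;
    // no move members: rvalues bind to the copy constructor
};
int counted_legacy::copies = 0;

struct overloaded_sink
{
    std::vector<counted_legacy> items;
    void push_back(const counted_legacy& val) { items.push_back(val); }
    void push_back(counted_legacy&& val)      { items.push_back(std::move(val)); }
};

struct by_value_sink
{
    std::vector<counted_legacy> items;
    void push_back(counted_legacy val)        { items.push_back(std::move(val)); }
};

inline void push_back_demo()
{
    counted_legacy x("payload");

    overloaded_sink a;
    a.items.reserve(1);                // capacity is sufficient
    counted_legacy::copies = 0;
    a.push_back(x);                    // const& overload: one copy into the vector
    assert(counted_legacy::copies == 1);

    by_value_sink b;
    b.items.reserve(1);
    counted_legacy::copies = 0;
    b.push_back(x);                    // copy into the parameter, then another copy
    assert(counted_legacy::copies == 2);
}
```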

Update

Here is a HelloWorld that you should be able to run on any C++11-conforming platform:

#include <vector>
#include <random>
#include <chrono>
#include <iostream>

class X
{
    std::vector<int> v_;
public:
    explicit X(unsigned s) : v_(s) {}

#if SLOW_DOWN
    X(const X&) = default;
    X(X&&) = default;
    X& operator=(X x)
    {
        v_.swap(x.v_);
        return *this;
    }
#endif
};

std::mt19937_64 eng;
std::uniform_int_distribution<unsigned> size(0, 1000);

std::chrono::high_resolution_clock::duration
test(X& x, const X& y)
{
    auto t0 = std::chrono::high_resolution_clock::now();
    x = y;
    auto t1 = std::chrono::high_resolution_clock::now();
    return t1-t0;
}

int
main()
{
    const int N = 1000000;
    typedef std::chrono::duration<double, std::nano> nano;
    nano ns(0);
    for (int i = 0; i < N; ++i)
    {
        X x1(size(eng));
        X x2(size(eng));
        ns += test(x1, x2);
    }
    ns /= N;
    std::cout << ns.count() << "ns\n";
}

I've coded X's copy assignment operator two ways:

  1. Implicitly, which is equivalent to calling vector's copy assignment operator.
  2. With the copy/swap idiom, under the suggestively named macro SLOW_DOWN. I thought about naming it SLEEP_FOR_AWHILE, but this way is actually much worse than sleep statements if you're on a battery-powered device.

The test constructs randomly sized vector<int>s with between 0 and 1000 elements and assigns them a million times. It times each assignment, sums the times, then computes the average time in floating-point nanoseconds and prints it. If two consecutive calls to your high-resolution clock don't return values less than 100 nanoseconds apart, you may want to increase the length of the vectors.

Here are my results:

$ clang++ -std=c++11 -stdlib=libc++ -O3 test.cpp
$ a.out
428.348ns
$ a.out
438.5ns
$ a.out
431.465ns
$ clang++ -std=c++11 -stdlib=libc++ -O3 -DSLOW_DOWN test.cpp
$ a.out
617.045ns
$ a.out
616.964ns
$ a.out
618.808ns

I'm seeing a 43% performance hit for the copy/swap idiom with this simple test. YMMV.

In the above test, the lhs has sufficient capacity about half the time, on average. If we take this to either extreme:

  1. lhs has sufficient capacity all of the time.
  2. lhs has sufficient capacity none of the time.

then the performance advantage of the default copy assignment over the copy/swap idiom varies from about 560% to 0%. The copy/swap idiom is never faster, and can be dramatically slower (for this test).
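The timings depend on your machine, but the allocation counts behind the two extremes can be shown deterministically. This sketch uses a hypothetical counting_allocator (not a standard component) to count calls to allocate() for plain copy assignment versus copy/swap. The zero-allocation result in the roomy case reflects how libc++ and libstdc++ implement vector's copy assignment; the standard does not pin down that exact count, and debug builds of some libraries may allocate extra bookkeeping objects.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Global counter for the hypothetical allocator below.
static std::size_t allocation_count = 0;

// Hypothetical minimal C++11 allocator that counts calls to allocate().
template <class T>
struct counting_allocator
{
    using value_type = T;

    counting_allocator() = default;
    template <class U>
    counting_allocator(const counting_allocator<U>&) noexcept {}

    T* allocate(std::size_t n)
    {
        ++allocation_count;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const counting_allocator<T>&, const counting_allocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const counting_allocator<T>&, const counting_allocator<U>&) { return false; }

using counted_vector = std::vector<int, counting_allocator<int>>;

// Allocations performed by vector's own copy assignment.
inline std::size_t allocations_for_copy_assign(counted_vector& lhs, const counted_vector& rhs)
{
    allocation_count = 0;
    lhs = rhs;
    return allocation_count;
}

// Allocations performed by the copy/swap idiom.
inline std::size_t allocations_for_copy_swap(counted_vector& lhs, const counted_vector& rhs)
{
    allocation_count = 0;
    counted_vector tmp(rhs);   // the copy always allocates when rhs is non-empty
    lhs.swap(tmp);
    return allocation_count;
}

inline void allocation_demo()
{
    counted_vector rhs(500);

    counted_vector roomy(1000);    // extreme 1: lhs has enough capacity
    assert(allocations_for_copy_assign(roomy, rhs) == 0);
    assert(allocations_for_copy_swap(roomy, rhs) == 1);

    counted_vector cramped(10);    // extreme 2: lhs never has enough capacity
    assert(allocations_for_copy_assign(cramped, rhs) == 1);
    counted_vector cramped2(10);
    assert(allocations_for_copy_swap(cramped2, rhs) == 1);
}
```

In the first extreme, copy/swap pays for an allocation (and a deallocation) that the default copy assignment avoids entirely; in the second, both pay once, which is consistent with the advantage shrinking toward 0%.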

Want Speed? Measure.