Performance cost of passing by value vs. by reference or by pointer?

Let's consider an object foo (which may be an int, a double, a custom struct, a class, whatever). My understanding is that passing foo by reference to a function (or just passing a pointer to foo) leads to higher performance since we avoid making a local copy (which could be expensive if foo is large).

However, from the answer here it seems that pointers on a 64-bit system can be expected in practice to have a size of 8 bytes, regardless of what's being pointed to. On my system, a float is 4 bytes. Does that mean that if foo is of type float, then it is more efficient to just pass foo by value rather than pass a pointer to it (assuming no other constraints that would make using one more efficient than the other inside the function)?
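
For reference, here is how I checked the sizes on my system (the numbers are implementation-specific, so they may differ elsewhere):

#include <iostream>

int main()
{
    // Both sizes are implementation-defined; on a typical 64-bit
    // system a pointer is 8 bytes while a float is only 4.
    std::cout << "sizeof(float)  = " << sizeof(float) << '\n';
    std::cout << "sizeof(float*) = " << sizeof(float*) << '\n';
}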


Solution 1:

It depends on what you mean by "cost", and on the properties of the host system (hardware, operating system) with respect to the operations involved.

If your cost measure is memory usage, then the calculation of cost is obvious - add up the sizes of whatever is being copied.

If your measure is execution speed (or "efficiency") then the game is different. Hardware (and operating systems and compilers) tends to be optimised for copying and operating on things of particular sizes, by virtue of dedicated circuitry (machine registers, and how they are used).

It is common, for example, for a machine to have an architecture (machine registers, memory architecture, etc.) that results in a "sweet spot": copying variables of one particular size is most "efficient", but copying larger OR SMALLER variables is less so. Larger variables will cost more to copy, because there may be a need to do multiple copies of smaller chunks. Smaller ones may also cost more, because the compiler needs to copy the smaller value into a larger variable (or register), do the operations on it, then copy the value back.

Examples with floating point include some Cray supercomputers, which natively supported double-precision floating point (double in C++), while all operations on single precision (float in C++) were emulated in software. Some older 32-bit x86 CPUs also worked internally with 32-bit integers, and operations on 16-bit integers required more clock cycles due to translation to/from 32 bits (this is not true of more modern 32-bit and 64-bit x86 processors, which allow copying 16-bit integers to/from 32-bit registers, and operating on them, with fewer such penalties).
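
This "copy into a larger register, operate, copy back" pattern is even baked into the C++ language itself: arithmetic on integer types narrower than int is defined in terms of promotion to a wider type, with the result narrowed again on store. A minimal sketch:

#include <cstdint>

std::uint16_t add16(std::uint16_t a, std::uint16_t b)
{
    // a and b are first promoted to int (typically 32 bits), the
    // addition happens at that width, and the result is narrowed
    // back to 16 bits by the cast on return.
    return static_cast<std::uint16_t>(a + b);
}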

It is a bit of a no-brainer that copying a very large structure by value will be less efficient than creating and copying its address. But, because of factors like the above, the cross-over point between "best to copy something of that size by value" and "best to pass its address" is less clear.

Pointers and references tend to be implemented in a similar manner (e.g. pass by reference can be implemented in the same way as passing a pointer) but that is not guaranteed.

The only way to be sure is to measure it. And realise that the measurements will vary between systems.
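
As a rough illustration of such a measurement, here is a minimal sketch (the struct Big, the loop count, and the function bodies are arbitrary illustrative choices; a naive loop like this is easily distorted or deleted entirely by the optimizer, so treat the numbers with suspicion):

#include <chrono>
#include <iostream>

struct Big { double data[64]; };   // deliberately larger than any register

double sum_by_value(Big b)      { return b.data[0] + b.data[63]; }
double sum_by_ref(const Big& b) { return b.data[0] + b.data[63]; }

int main()
{
    Big big{};
    volatile double sink = 0;   // discourage the optimizer from deleting the loops

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < 1000000; ++i) sink = sink + sum_by_value(big);
    auto t1 = std::chrono::steady_clock::now();
    for (int i = 0; i < 1000000; ++i) sink = sink + sum_by_ref(big);
    auto t2 = std::chrono::steady_clock::now();

    std::cout << "by value: " << std::chrono::duration<double>(t1 - t0).count() << " s\n";
    std::cout << "by ref:   " << std::chrono::duration<double>(t2 - t1).count() << " s\n";
}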

Solution 2:

There is one thing nobody mentioned.

There is a GCC optimization called IPA SRA that replaces "pass by reference" with "pass by value" automatically: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html (-fipa-sra)

This is most likely done for scalar types (e.g. int, double) that do not have non-default copy semantics and can fit into CPU registers.

This makes

void f(const int& x);

probably as fast (and as space-efficient) as

void f(int x);

So with this optimization enabled, using references for small types should be as fast as passing them by value.

On the other hand, passing (for example) a std::string by value cannot be optimized to by-reference speed, as custom copy semantics are involved.
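
For instance (a minimal sketch; the function names are illustrative):

#include <string>

// Each call copies the string: the copy constructor may allocate,
// so the compiler cannot turn this into a by-reference call.
std::size_t len_by_value(std::string s)      { return s.size(); }

// No copy is made; only an address is passed.
std::size_t len_by_ref(const std::string& s) { return s.size(); }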

From what I understand, using pass by reference for everything should never be slower than manually picking what to pass by value and what to pass by reference.

This is especially useful for templates:

template<class T>
void f(const T&)
{
    // Something
}

is always optimal
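
For instance, one could exercise it like this (a sketch; whether the int instantiation actually collapses to pass-by-value depends on the compiler and optimization level):

#include <string>

template<class T>
void f(const T&)
{
    // Something
}

int main()
{
    int i = 42;
    std::string s = "hello";
    f(i);   // may be optimized to pass the int by value (e.g. via IPA SRA)
    f(s);   // no copy of the string is made either way
}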

Solution 3:

You must test any given scenario where performance is absolutely critical, but be very careful about trying to force the compiler to generate code in a specific way.

The compiler's optimizer is allowed to re-write your code in any way it chooses as long as the final result is provably the same, which can lead to some very nice optimizations.

Consider that passing a float by value requires making a copy of the float, but under the right conditions, passing a float by reference could allow the original float to stay in a CPU floating-point register, with that register treated as the "reference" parameter to the function. By contrast, if you pass a copy, the compiler has to find a place to store the copy in order to preserve the contents of the register, or worse, it may not be able to use a register at all because the original must be preserved (this is especially true in recursive functions!).
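
A minimal pair to experiment with (the function bodies are arbitrary; comparing the generated assembly, e.g. in Compiler Explorer, shows how the compiler actually treats each signature at a given optimization level):

// Compare the code generated for these two; which is cheaper
// depends on the surrounding call site and optimization level.
float scale_by_value(float x)      { return x * 2.0f; }
float scale_by_ref(const float& x) { return x * 2.0f; }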

This difference also matters when the reference is passed to a function that could be inlined, where the reference may reduce the cost of inlining: the compiler no longer has to materialize a separate copy to guarantee that modifications to the parameter cannot affect the original.

The more a language allows you to focus on describing what you want done rather than how you want it done, the more the compiler is able to find creative ways of doing the hard work for you. In C++ especially, it is generally best not to worry about performance, and instead focus on describing what you want as clearly and simply as possible. By trying to describe how you want the work done, you will just as often prevent the compiler from doing its job of optimizing your code for you.