Pass by value faster than pass by reference
I wrote a simple program in C++ to compare the performance of two approaches: pass by value and pass by reference. Pass by value actually performed better than pass by reference.
The conclusion should be that passing by value requires fewer clock cycles (instructions).
I would be really glad if someone could explain in detail why pass by value requires fewer clock cycles.
#include <ctime>
#include <iostream>
using namespace std;

void function(int *ptr);
void function2(int val);

int main() {
    int nmbr = 5;
    clock_t start, stop;
    start = clock();
    // Note: nmbr overflows int after 13 multiplications (undefined behavior).
    for (long i = 0; i < 1000000000; i++) {
        function(&nmbr);    // swap with the line below to time the other variant
        //function2(nmbr);
    }
    stop = clock();
    // Elapsed time in clock ticks; divide by CLOCKS_PER_SEC for seconds.
    cout << "time: " << stop - start;
    return 0;
}

/**
 * pass by reference (through a pointer)
 */
void function(int *ptr) {
    *ptr *= 5;
}

/**
 * pass by value
 */
void function2(int val) {
    val *= 5;
}
Solution 1:
A good way to find out why there are any differences is to check the disassembly. Here are the results I got on my machine with Visual Studio 2012.
With optimization flags, both versions of the program generate the same code for main:
009D1270 57 push edi
009D1271 FF 15 D4 30 9D 00 call dword ptr ds:[9D30D4h]
009D1277 8B F8 mov edi,eax
009D1279 FF 15 D4 30 9D 00 call dword ptr ds:[9D30D4h]
009D127F 8B 0D 48 30 9D 00 mov ecx,dword ptr ds:[9D3048h]
009D1285 2B C7 sub eax,edi
009D1287 50 push eax
009D1288 E8 A3 04 00 00 call std::operator<<<std::char_traits<char> > (09D1730h)
009D128D 8B C8 mov ecx,eax
009D128F FF 15 2C 30 9D 00 call dword ptr ds:[9D302Ch]
009D1295 33 C0 xor eax,eax
009D1297 5F pop edi
009D1298 C3 ret
This is basically equivalent to:
int main ()
{
clock_t start, stop ;
start = clock () ;
stop = clock () ;
cout << "time: " << stop - start ;
return 0 ;
}
Without optimization flags, you will probably get different results.
function (no optimizations):
00114890 55 push ebp
00114891 8B EC mov ebp,esp
00114893 81 EC C0 00 00 00 sub esp,0C0h
00114899 53 push ebx
0011489A 56 push esi
0011489B 57 push edi
0011489C 8D BD 40 FF FF FF lea edi,[ebp-0C0h]
001148A2 B9 30 00 00 00 mov ecx,30h
001148A7 B8 CC CC CC CC mov eax,0CCCCCCCCh
001148AC F3 AB rep stos dword ptr es:[edi]
001148AE 8B 45 08 mov eax,dword ptr [ptr]
001148B1 8B 08 mov ecx,dword ptr [eax]
001148B3 6B C9 05 imul ecx,ecx,5
001148B6 8B 55 08 mov edx,dword ptr [ptr]
001148B9 89 0A mov dword ptr [edx],ecx
001148BB 5F pop edi
001148BC 5E pop esi
001148BD 5B pop ebx
001148BE 8B E5 mov esp,ebp
001148C0 5D pop ebp
001148C1 C3 ret
function2 (no optimizations)
00FF4850 55 push ebp
00FF4851 8B EC mov ebp,esp
00FF4853 81 EC C0 00 00 00 sub esp,0C0h
00FF4859 53 push ebx
00FF485A 56 push esi
00FF485B 57 push edi
00FF485C 8D BD 40 FF FF FF lea edi,[ebp-0C0h]
00FF4862 B9 30 00 00 00 mov ecx,30h
00FF4867 B8 CC CC CC CC mov eax,0CCCCCCCCh
00FF486C F3 AB rep stos dword ptr es:[edi]
00FF486E 8B 45 08 mov eax,dword ptr [val]
00FF4871 6B C0 05 imul eax,eax,5
00FF4874 89 45 08 mov dword ptr [val],eax
00FF4877 5F pop edi
00FF4878 5E pop esi
00FF4879 5B pop ebx
00FF487A 8B E5 mov esp,ebp
00FF487C 5D pop ebp
00FF487D C3 ret
Why is pass by value faster (in the no optimization case)?
Well, function() has two extra mov operations. Let's take a look at the first extra mov operation:
001148AE 8B 45 08 mov eax,dword ptr [ptr]
001148B1 8B 08 mov ecx,dword ptr [eax]
001148B3 6B C9 05 imul ecx,ecx,5
Here we are dereferencing the pointer. In function2(), we already have the value, so we avoid this step. We first load the pointer (the address it holds) into register eax. Then we load the value it points to into register ecx. Finally, we multiply the value by five.
Let's look at the second extra mov operation:
001148B3 6B C9 05 imul ecx,ecx,5
001148B6 8B 55 08 mov edx,dword ptr [ptr]
001148B9 89 0A mov dword ptr [edx],ecx
Now we are moving backwards. We have just finished multiplying the value by 5, so we reload the pointer into edx and store the result back at the address it holds.
Because function2() does not have to deal with referencing and dereferencing a pointer, it gets to skip these two extra mov operations.
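If you want a benchmark that survives optimization flags, the work has to stay observable. Here is a minimal sketch of one way to do that, not a definitive harness: the names times5_ptr, times5_val, Result, time_ptr and time_val are made up for illustration, the by-value variant returns its product (unlike function2 in the question), and unsigned arithmetic is used so the repeated multiplication wraps instead of overflowing.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// By-pointer variant: reads and writes the caller's object.
void times5_ptr(std::uint32_t* p) { *p *= 5u; }

// By-value variant: returns the product so the work is not dead code.
std::uint32_t times5_val(std::uint32_t v) { return v * 5u; }

struct Result {
    std::uint32_t value;    // final product, returned so the loop stays observable
    long long nanoseconds;  // elapsed time measured with a monotonic clock
};

// Times `iterations` calls of the by-pointer variant.
Result time_ptr(long iterations) {
    std::uint32_t n = 1u;
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) times5_ptr(&n);
    const auto stop = std::chrono::steady_clock::now();
    return {n, std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count()};
}

// Times `iterations` calls of the by-value variant.
Result time_val(long iterations) {
    std::uint32_t n = 1u;
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) n = times5_val(n);
    const auto stop = std::chrono::steady_clock::now();
    return {n, std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count()};
}

// Sanity check: both variants must agree before any timing is meaningful.
void check_equivalence() {
    const Result a = time_ptr(10);
    const Result b = time_val(10);
    assert(a.value == b.value);
}
```

Call time_ptr and time_val from main with the iteration count you care about and print the nanoseconds fields. In an optimized build the compiler may still inline both calls and emit the same loop for each, which would be consistent with the optimized result above.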
Solution 2:
Overhead with passing by reference:
- each access needs a dereference, i.e., there is one more memory read
Overhead with passing by value:
- the value needs to be copied onto the stack or into registers
For small objects, such as an integer, passing by value will be faster. For bigger objects (for example a large structure), the copying would create too much overhead so passing by reference will be faster.
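To make the small-versus-large trade-off concrete, here is a rough sketch. The type Big and the functions sum_by_value, sum_by_ref and times5 are invented for this example, and whether the copy is actually made depends on the compiler (an inlined call may elide it).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical large type: 1024 ints.
struct Big {
    std::array<int, 1024> data{};
};

// By value: the caller's object is copied into `b` before the loop runs
// (unless the optimizer inlines the call and elides the copy).
long long sum_by_value(Big b) {
    long long total = 0;
    for (int x : b.data) total += x;
    return total;
}

// By const reference: only an address is passed, and each element
// is read through it.
long long sum_by_ref(const Big& b) {
    long long total = 0;
    for (int x : b.data) total += x;
    return total;
}

// Small type: an int is cheap to copy, so by value is the natural choice.
int times5(int v) { return v * 5; }

// The copy made by sum_by_value is far larger than one address.
static_assert(sizeof(Big) >= 1024 * sizeof(int), "Big holds at least 1024 ints");
static_assert(sizeof(Big) > sizeof(void*), "copying Big moves more bytes than passing an address");

// Both signatures compute the same answer; only the passing cost differs.
void check_same_answer() {
    Big big;
    big.data.fill(2);
    assert(sum_by_value(big) == sum_by_ref(big));
}
```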
Solution 3:
Imagine you walk into a function and you're supposed to come in with an int value. The code in the function wants to do stuff with that int value.
Pass by value is like walking into the function and when someone asks for the int foo value, you just give it to them.
Pass by reference is walking into the function with the address of the int foo value. Now whenever someone needs the value of foo they have to go and look it up. Everyone's gonna complain about having to dereference foo all the freaking time. I've been in this function for 2 milliseconds now and I must have looked up foo a thousand times! Why didn't you just give me the value in the first place? Why didn't you pass by value?
This analogy helped me see why passing by value is often the fastest choice.
Solution 4:
Some reasoning first: on most popular machines, an integer is 32 bits and a pointer is 32 or 64 bits.
So that is how much information you have to pass.
To multiply an integer you have to:
Multiply it.
To multiply an integer pointed by a pointer you have to:
Dereference the pointer. Multiply it.
Hope it's clear enough :)
Now to some more specific stuff:
As has been pointed out, your by-value function does nothing with the result, while the by-pointer one actually saves the result in memory. Why are you so unfair to the poor pointer? :( (just kidding)
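A small sketch of that asymmetry, using the two functions from the question plus a made-up function2_fair that hands its result back (that third function is my assumption about what a fairer comparison could look like, not something from the question):

```cpp
#include <cassert>

// From the question: the product is stored through the pointer.
void function(int* ptr) { *ptr *= 5; }

// From the question: the product only changes the local copy and is discarded.
void function2(int val) { val *= 5; }

// Hypothetical fairer by-value variant: the caller receives the product.
int function2_fair(int val) { return val * 5; }

void check_visibility() {
    int n = 5;
    function2(n);
    assert(n == 5);   // the caller never sees the by-value result
    function(&n);
    assert(n == 25);  // the by-pointer result is written back
    assert(function2_fair(5) == 25);
}
```

Since function2 has no visible effect, a compiler is free to drop its body entirely, which makes the comparison lopsided before any hardware cost is even considered.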
It's hard to say how valid your benchmark is, since compilers come packed with all kinds of optimizations. (Of course you can control how much freedom the compiler has, but you haven't provided any info on that.)
And finally (and probably most important), pointers, values, and references do not have a speed associated with them. Who knows, you may find a machine that is faster with pointers and has a hard time with values, or the opposite. Okay, okay, there are some patterns in hardware and we make all these assumptions; the most widely accepted seems to be:
Pass simple objects by value and more complex ones by reference (or pointer) (but then again, what's complex? What's simple? It changes with time as hardware follows)
So recently I sense the standard opinion is becoming: pass by value and trust the compiler. And that's cool. Compilers are backed by years of expert development and angry users demanding that they always get better.
Solution 5:
When you pass by value, you are telling the compiler to make a copy of the entity you are passing by value.
When you are passing by reference, you are telling the compiler that it must use the actual memory that the reference is pointing to. The compiler does not know if you are doing this in an attempt to optimize, or because the referenced value might be changing in some other thread (for example). It has to use that area of memory.
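One concrete case where the compiler really does have to go back to that memory is aliasing: two references may name the same object. A minimal sketch, with function names made up for illustration:

```cpp
#include <cassert>

// `b` is a reference, so it may refer to the same object as `a`.
// The compiler must re-read `b` after the first write to `a`.
void add_twice_ref(int& a, const int& b) {
    a += b;
    a += b;
}

// `b` is a private copy, so writes to `a` cannot change it and
// the value can stay in a register.
void add_twice_val(int& a, int b) {
    a += b;
    a += b;
}

void check_aliasing() {
    int x = 1;
    add_twice_ref(x, x);  // b aliases a: 1 + 1 = 2, then 2 + 2 = 4
    assert(x == 4);

    int y = 1;
    add_twice_val(y, y);  // b is a copy holding 1: 1 + 1 + 1 = 3
    assert(y == 3);
}
```

The two functions give different answers when the arguments alias, so the by-reference version cannot be compiled as if b were a plain value.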
Passing by reference means the processor has to access that specific memory block. That may or may not be the most efficient process, depending on the state of the registers. When you pass by value, memory on the stack (or a register) can be used, which increases the chance of hitting cache (much faster) memory.
Finally, depending on the architecture of your machine and the type you are passing, a reference may actually be larger than the value you are copying. Copying a 32-bit integer involves copying less than passing a reference on a 64-bit machine.
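You can check the sizes on your own platform with something like the sketch below. It uses sizeof(int*) as a stand-in for the cost of passing a reference, on the assumption (common, but not required by the standard) that a reference parameter is implemented as an address; sizeof applied to a reference type just reports the size of the referenced type. The names here are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>

constexpr std::size_t int_bytes = sizeof(int);       // bytes copied when passing an int by value
constexpr std::size_t pointer_bytes = sizeof(int*);  // bytes in an address (proxy for a reference)

// sizeof on a reference type yields the size of the referenced type,
// so it cannot be used to measure the reference itself.
static_assert(sizeof(int&) == sizeof(int), "sizeof(T&) is sizeof(T)");

// True when copying the int moves no more bytes than passing its address.
bool int_copy_is_not_larger() { return int_bytes <= pointer_bytes; }

void print_sizes() {
    assert(int_bytes > 0 && pointer_bytes > 0);
    std::cout << "int: " << int_bytes << " bytes, pointer: "
              << pointer_bytes << " bytes\n";
}
```

On typical 64-bit desktop platforms this is expected to report a 4-byte int and an 8-byte pointer, but run it rather than take my word for it.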
So passing by reference should only be done when you need a reference (to mutate the value, or because the value might be mutated elsewhere), or when copying the referenced object is more expensive than dereferencing the necessary memory.
While that last point is non-trivial, a good rule of thumb is to do what Java does: pass fundamental types by value, and complex types by (const) reference.