C++ - passing references to std::shared_ptr or boost::shared_ptr

If I have a function that needs to work with a shared_ptr, wouldn't it be more efficient to pass it a reference to it (so to avoid copying the shared_ptr object)? What are the possible bad side effects? I envision two possible cases:

1) inside the function a copy is made of the argument, like in

ClassA::take_copy_of_sp(boost::shared_ptr<foo> &sp)  
{  
     ...  
     m_sp_member=sp; //This will copy the object, incrementing refcount  
     ...  
}  

2) inside the function the argument is only used, like in

Class::only_work_with_sp(boost::shared_ptr<foo> &sp) //Again, no copy here  
{    
    ...  
    sp->do_something();  
    ...  
}  

I can't see in both cases a good reason to pass the boost::shared_ptr<foo> by value instead of by reference. Passing by value would only "temporarily" increment the reference count due to the copying, and then decrement it when exiting the function scope. Am I overlooking something?

Just to clarify, after reading several answers: I perfectly agree on the premature-optimization concerns, and I always try to first-profile-then-work-on-the-hotspots. My question was more from a purely technical code-point-of-view, if you know what I mean.


Solution 1:

I found myself disagreeing with the highest-voted answer, so I went looking for expert opinons and here they are. From http://channel9.msdn.com/Shows/Going+Deep/C-and-Beyond-2011-Scott-Andrei-and-Herb-Ask-Us-Anything

Herb Sutter: "when you pass shared_ptrs, copies are expensive"

Scott Meyers: "There's nothing special about shared_ptr when it comes to whether you pass it by value, or pass it by reference. Use exactly the same analysis you use for any other user defined type. People seem to have this perception that shared_ptr somehow solves all management problems, and that because it's small, it's necessarily inexpensive to pass by value. It has to be copied, and there is a cost associated with that... it's expensive to pass it by value, so if I can get away with it with proper semantics in my program, I'm gonna pass it by reference to const or reference instead"

Herb Sutter: "always pass them by reference to const, and very occasionally maybe because you know what you called might modify the thing you got a reference from, maybe then you might pass by value... if you copy them as parameters, oh my goodness you almost never need to bump that reference count because it's being held alive anyway, and you should be passing it by reference, so please do that"

Update: Herb has expanded on this here: http://herbsutter.com/2013/06/05/gotw-91-solution-smart-pointer-parameters/, although the moral of the story is that you shouldn't be passing shared_ptrs at all "unless you want to use or manipulate the smart pointer itself, such as to share or transfer ownership."

Solution 2:

The point of a distinct shared_ptr instance is to guarantee (as far as possible) that as long as this shared_ptr is in scope, the object it points to will still exist, because its reference count will be at least 1.

Class::only_work_with_sp(boost::shared_ptr<foo> sp)
{
    // sp points to an object that cannot be destroyed during this function
}

So by using a reference to a shared_ptr, you disable that guarantee. So in your second case:

Class::only_work_with_sp(boost::shared_ptr<foo> &sp) //Again, no copy here  
{    
    ...  
    sp->do_something();  
    ...  
}

How do you know that sp->do_something() will not blow up due to a null pointer?

It all depends what is in those '...' sections of the code. What if you call something during the first '...' that has the side-effect (somewhere in another part of the code) of clearing a shared_ptr to that same object? And what if it happens to be the only remaining distinct shared_ptr to that object? Bye bye object, just where you're about to try and use it.

So there are two ways to answer that question:

  1. Examine the source of your entire program very carefully until you are sure the object won't die during the function body.

  2. Change the parameter back to be a distinct object instead of a reference.

General bit of advice that applies here: don't bother making risky changes to your code for the sake of performance until you've timed your product in a realistic situation in a profiler and conclusively measured that the change you want to make will make a significant difference to performance.

Update for commenter JQ

Here's a contrived example. It's deliberately simple, so the mistake will be obvious. In real examples, the mistake is not so obvious because it is hidden in layers of real detail.

We have a function that will send a message somewhere. It may be a large message so rather than using a std::string that likely gets copied as it is passed around to multiple places, we use a shared_ptr to a string:

void send_message(std::shared_ptr<std::string> msg)
{
    std::cout << (*msg.get()) << std::endl;
}

(We just "send" it to the console for this example).

Now we want to add a facility to remember the previous message. We want the following behaviour: a variable must exist that contains the most recently sent message, but while a message is currently being sent then there must be no previous message (the variable should be reset before sending). So we declare the new variable:

std::shared_ptr<std::string> previous_message;

Then we amend our function according to the rules we specified:

void send_message(std::shared_ptr<std::string> msg)
{
    previous_message = 0;
    std::cout << *msg << std::endl;
    previous_message = msg;
}

So, before we start sending we discard the current previous message, and then after the send is complete we can store the new previous message. All good. Here's some test code:

send_message(std::shared_ptr<std::string>(new std::string("Hi")));
send_message(previous_message);

And as expected, this prints Hi! twice.

Now along comes Mr Maintainer, who looks at the code and thinks: Hey, that parameter to send_message is a shared_ptr:

void send_message(std::shared_ptr<std::string> msg)

Obviously that can be changed to:

void send_message(const std::shared_ptr<std::string> &msg)

Think of the performance enhancement this will bring! (Never mind that we're about to send a typically large message over some channel, so the performance enhancement will be so small as to be unmeasureable).

But the real problem is that now the test code will exhibit undefined behaviour (in Visual C++ 2010 debug builds, it crashes).

Mr Maintainer is surprised by this, but adds a defensive check to send_message in an attempt to stop the problem happening:

void send_message(const std::shared_ptr<std::string> &msg)
{
    if (msg == 0)
        return;

But of course it still goes ahead and crashes, because msg is never null when send_message is called.

As I say, with all the code so close together in a trivial example, it's easy to find the mistake. But in real programs, with more complex relationships between mutable objects that hold pointers to each other, it is easy to make the mistake, and hard to construct the necessary test cases to detect the mistake.

The easy solution, where you want a function to be able to rely on a shared_ptr continuing to be non-null throughout, is for the function to allocate its own true shared_ptr, rather than relying on a reference to an existing shared_ptr.

The downside is that copied a shared_ptr is not free: even "lock-free" implementations have to use an interlocked operation to honour threading guarantees. So there may be situations where a program can be significantly sped up by changing a shared_ptr into a shared_ptr &. But it this is not a change that can be safely made to all programs. It changes the logical meaning of the program.

Note that a similar bug would occur if we used std::string throughout instead of std::shared_ptr<std::string>, and instead of:

previous_message = 0;

to clear the message, we said:

previous_message.clear();

Then the symptom would be the accidental sending of an empty message, instead of undefined behaviour. The cost of an extra copy of a very large string may be a lot more significant than the cost of copying a shared_ptr, so the trade-off may be different.

Solution 3:

I would advise against this practice unless you and the other programmers you work with really, really know what you are all doing.

First, you have no idea how the interface to your class might evolve and you want to prevent other programmers from doing bad things. Passing a shared_ptr by reference isn't something a programmer should expect to see, because it isn't idiomatic, and that makes it easy to use it incorrectly. Program defensively: make the interface hard to use incorrectly. Passing by reference is just going to invite problems later on.

Second, don't optimize until you know this particular class is going to be a problem. Profile first, and then if your program really needs the boost given by passing by reference, then maybe. Otherwise, don't sweat the small stuff (i.e. the extra N instructions it takes to pass by value) instead worry about design, data structures, algorithms, and long-term maintainability.

Solution 4:

Yes, taking a reference is fine there. You don't intend to give the method shared ownership; it only wants to work with it. You could take a reference for the first case too, since you copy it anyway. But for first case, it takes ownership. There is this trick to still copy it only once:

void ClassA::take_copy_of_sp(boost::shared_ptr<foo> sp) {
    m_sp_member.swap(sp);
}

You should also copy when you return it (i.e not return a reference). Because your class doesn't know what the client is doing with it (it could store a pointer to it and then big bang happens). If it later turns out it's a bottleneck (first profile!), then you can still return a reference.


Edit: Of course, as others point out, this only is true if you know your code and know that you don't reset the passed shared pointer in some way. If in doubt, just pass by value.