The cost of passing by shared_ptr

I use std::tr1::shared_ptr extensively throughout my application. This includes passing objects in as function arguments. Consider the following:

class Dataset {...}

void f( shared_ptr< Dataset const > pds ) {...}
void g( shared_ptr< Dataset const > pds ) {...}
...

While passing a dataset object around via shared_ptr guarantees its existence inside f and g, the functions may be called millions of times, which causes a lot of shared_ptr objects to be created and destroyed. Here's a snippet of the flat gprof profile from a recent run:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
  9.74    295.39    35.12 2451177304     0.00     0.00  std::tr1::__shared_count::__shared_count(std::tr1::__shared_count const&)
  8.03    324.34    28.95 2451252116     0.00     0.00  std::tr1::__shared_count::~__shared_count()

So roughly 18% of the runtime (9.74% + 8.03%) was spent on reference counting with shared_ptr objects. Is this normal?

A large portion of my application is single-threaded, and I was thinking about rewriting some of the functions as

void f( const Dataset& ds ) {...}

and replacing the calls

shared_ptr< Dataset > pds( new Dataset(...) );
f( pds );

with

f( *pds );

in places where I know for sure the object will not get destroyed while the flow of the program is inside f(). But before I run off to change a bunch of function signatures and calls, I wanted to know what the typical performance hit of passing by shared_ptr is. It seems like shared_ptr should not be passed by value to functions that get called very often.
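To make the difference between the two signatures concrete, here is a minimal sketch. It assumes C++11 `std::shared_ptr` rather than `std::tr1::shared_ptr`, and the `Dataset` struct and both function names are hypothetical stand-ins, not the real code from my application:

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the real Dataset class.
struct Dataset {
    int size;
};

// By value: the parameter is a new shared_ptr, so the reference count is
// incremented on entry and decremented again when the function returns.
long use_count_by_value(std::shared_ptr<const Dataset> pds) {
    assert(pds && "caller must pass a non-null pointer");
    return pds.use_count();  // counts the caller's pointer plus this copy
}

// By reference to the object: no shared_ptr is created, so the reference
// count is never touched. The caller must keep the object alive.
int size_by_ref(const Dataset& ds) {
    return ds.size;
}
```

Calling the first function with a pointer that has a single owner reports a count of 2 inside the function and 1 again afterwards; the second function is called as `size_by_ref(*pds)` and leaves the count alone.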

Any input would be appreciated. Thanks for reading.

-Artem

Update: After changing a handful of functions to accept const Dataset&, the new profile looks like this:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
  0.15    241.62     0.37 24981902     0.00     0.00  std::tr1::__shared_count::~__shared_count()
  0.12    241.91     0.30 28342376     0.00     0.00  std::tr1::__shared_count::__shared_count(std::tr1::__shared_count const&)

I'm a little puzzled by the number of destructor calls being smaller than the number of copy constructor calls, but overall I'm very pleased with the decrease in the associated run-time. Thanks to all for their advice.


Always pass your shared_ptr by const reference:

void f(const shared_ptr<Dataset const>& pds) {...} 
void g(const shared_ptr<Dataset const>& pds) {...} 

Edit: Regarding the safety issues mentioned by others:

  • When shared_ptr is used heavily throughout an application, passing it by value can take up a tremendous share of the runtime (I've seen it exceed 50%).
  • Use const T& instead of const shared_ptr<T const>& when the argument must not be null.
  • When performance is an issue, const shared_ptr<T const>& is safer than const T*.
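One caveat with the const-reference signature is worth checking: the count is left alone only when the argument already has exactly the parameter's type. If the caller holds a `shared_ptr<Dataset>` and the parameter is `const shared_ptr<Dataset const>&`, a temporary `shared_ptr<Dataset const>` has to be created for the call, which increments and decrements the count just like passing by value. A minimal sketch, assuming C++11 `std::shared_ptr` and a hypothetical `Dataset` and function name:

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the Dataset class from the question.
struct Dataset {
    int size;
};

// Reports the reference count seen inside the function.
long use_count_by_const_ref(const std::shared_ptr<const Dataset>& pds) {
    assert(pds && "caller must pass a non-null pointer");
    return pds.use_count();
}
```

With a `std::shared_ptr<const Dataset>` argument this reports the caller's count unchanged; with a `std::shared_ptr<Dataset>` argument it reports one more, because the reference binds to a converted temporary.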

You need shared_ptr only when passing it to functions or objects that keep it for future use. For example, a class may keep a shared_ptr for use in a worker thread. For simple synchronous calls, a plain pointer or reference is quite enough. shared_ptr should not completely replace plain pointers.
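As a rough sketch of that split, assuming C++11 `std::shared_ptr` and hypothetical names (`Dataset`, `Worker`, `size_now`) that are not from the question's code:

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the Dataset class from the question.
struct Dataset {
    int size;
};

// Keeps the dataset for later use, so it has to share ownership.
class Worker {
public:
    explicit Worker(std::shared_ptr<const Dataset> ds) : ds_(std::move(ds)) {
        assert(ds_ && "Worker needs a non-null dataset");
    }
    // May run long after the caller has dropped its own pointer.
    int size_later() const { return ds_->size; }

private:
    std::shared_ptr<const Dataset> ds_;
};

// Uses the dataset only for the duration of the call, so a plain
// reference is enough and no reference counting happens.
int size_now(const Dataset& ds) {
    return ds.size;
}
```

Here the one copy made when constructing `Worker` is the point of using shared_ptr: the object stays alive for the worker even after the caller resets its pointer. The synchronous `size_now` pays nothing for that guarantee because it does not need it.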


If you're not using make_shared, could you give that a go? By placing the reference count and the object in the same area of memory, you may see a performance gain from better cache locality. Worth a try anyway.
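A minimal sketch of the two ways to create the pointer, assuming C++11 `std::make_shared` (`boost::make_shared` is the equivalent for older compilers) and a hypothetical `Dataset` with a one-argument constructor:

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the Dataset class from the question.
struct Dataset {
    explicit Dataset(int n) : size(n) {}
    int size;
};

// Two allocations: one for the Dataset, one for the control block
// that holds the reference count.
std::shared_ptr<Dataset> create_with_new(int n) {
    return std::shared_ptr<Dataset>(new Dataset(n));
}

// Typically a single allocation holding both the Dataset and the
// control block, so the count sits next to the object in memory.
std::shared_ptr<Dataset> create_with_make_shared(int n) {
    return std::make_shared<Dataset>(n);
}

// Small helper so the two results can be compared.
int dataset_size(const std::shared_ptr<Dataset>& p) {
    assert(p && "expected a non-null pointer");
    return p->size;
}
```

Note that this only changes how the object is allocated. It does not reduce the number of shared_ptr copies and destructions that show up in the profile.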