The cost of passing by shared_ptr
I use std::tr1::shared_ptr extensively throughout my application. This includes passing objects in as function arguments. Consider the following:
class Dataset {...}
void f( shared_ptr< Dataset const > pds ) {...}
void g( shared_ptr< Dataset const > pds ) {...}
...
While passing a dataset object around via shared_ptr guarantees its existence inside f and g, the functions may be called millions of times, which causes a lot of shared_ptr objects being created and destroyed. Here's a snippet of the flat gprof profile from a recent run:
Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 9.74 295.39 35.12 2451177304 0.00 0.00 std::tr1::__shared_count::__shared_count(std::tr1::__shared_count const&) 8.03 324.34 28.95 2451252116 0.00 0.00 std::tr1::__shared_count::~__shared_count()
So, ~17% of the runtime was spent on reference counting with shared_ptr objects. Is this normal?
A large portion of my application is single-threaded and I was thinking about re-writing some of the functions as
void f( const Dataset& ds ) {...}
and replacing the calls
shared_ptr< Dataset > pds( new Dataset(...) );
f( pds );
with
f( *pds );
in places where I know for sure the object will not get destroyed while the flow of the program is inside f(). But before I run off to change a bunch of function signatures / calls, I wanted to know what the typical performance hit of passing by shared_ptr was. Seems like shared_ptr should not be used for functions that get called very often.
Any input would be appreciated. Thanks for reading.
-Artem
Update: After changing a handful of functions to accept const Dataset&
, the new profile looks like this:
Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 0.15 241.62 0.37 24981902 0.00 0.00 std::tr1::__shared_count::~__shared_count() 0.12 241.91 0.30 28342376 0.00 0.00 std::tr1::__shared_count::__shared_count(std::tr1::__shared_count const&)
I'm a little puzzled by the number of destructor calls being smaller than the number of copy constructor calls, but overall I'm very pleased with the decrease in the associated run-time. Thanks to all for their advice.
Always pass your shared_ptr
by const reference:
void f(const shared_ptr<Dataset const>& pds) {...}
void g(const shared_ptr<Dataset const>& pds) {...}
Edit: Regarding the safety issues mentioned by others:
- When using
shared_ptr
heavily throughout an application, passing by value will take up a tremendous amount of time (I've seen it go 50+%). - Use
const T&
instead ofconst shared_ptr<T const>&
when the argument shall not be null. - Using
const shared_ptr<T const>&
is safer thanconst T*
when performance is an issue.
You need shared_ptr only to pass it to functions/objects which keep it for future use. For example, some class may keep shared_ptr for using in an worker thread. For simple synchronous calls it's quite enough to use plain pointer or reference. shared_ptr should not replace using plain pointers completely.
If you're not using make_shared, could you give that a go? By locating the reference count and the object in the same area of memory you may see a performance gain associated with cache coherency. Worth a try anyway.