How do I pass a unique_ptr argument to a constructor or a function?

I'm new to move semantics in C++11 and I don't know very well how to handle unique_ptr parameters in constructors or functions. Consider this class referencing itself:

#include <memory>

class Base
{
  public:

    typedef unique_ptr<Base> UPtr;

    Base(){}
    Base(Base::UPtr n):next(std::move(n)){}

    virtual ~Base(){}

    void setNext(Base::UPtr n)
    {
      next = std::move(n);
    }

  protected :

    Base::UPtr next;

};

Is this how I should write functions taking unique_ptr arguments?

And do I need to use std::move in the calling code?

Base::UPtr b1;
Base::UPtr b2(new Base());

b1->setNext(b2); //should I write b1->setNext(std::move(b2)); instead?

Solution 1:

Here are the possible ways to take a unique pointer as an argument, as well as their associated meaning.

(A) By Value

Base(std::unique_ptr<Base> n)
  : next(std::move(n)) {}

In order for the user to call this, they must do one of the following:

Base newBase(std::move(nextBase));
Base fromTemp(std::unique_ptr<Base>(new Base(...));

To take a unique pointer by value means that you are transferring ownership of the pointer to the function/object/etc in question. After newBase is constructed, nextBase is guaranteed to be empty. You don't own the object, and you don't even have a pointer to it anymore. It's gone.

This is ensured because we take the parameter by value. std::move doesn't actually move anything; it's just a fancy cast. std::move(nextBase) returns a Base&& that is an r-value reference to nextBase. That's all it does.

Because Base::Base(std::unique_ptr<Base> n) takes its argument by value rather than by r-value reference, C++ will automatically construct a temporary for us. It creates a std::unique_ptr<Base> from the Base&& that we gave the function via std::move(nextBase). It is the construction of this temporary that actually moves the value from nextBase into the function argument n.

(B) By non-const l-value reference

Base(std::unique_ptr<Base> &n)
  : next(std::move(n)) {}

This has to be called on an actual l-value (a named variable). It cannot be called with a temporary like this:

Base newBase(std::unique_ptr<Base>(new Base)); //Illegal in this case.

The meaning of this is the same as the meaning of any other use of non-const references: the function may or may not claim ownership of the pointer. Given this code:

Base newBase(nextBase);

There is no guarantee that nextBase is empty. It may be empty; it may not. It really depends on what Base::Base(std::unique_ptr<Base> &n) wants to do. Because of that, it's not very evident just from the function signature what's going to happen; you have to read the implementation (or associated documentation).

Because of that, I wouldn't suggest this as an interface.

(C) By const l-value reference

Base(std::unique_ptr<Base> const &n);

I don't show an implementation, because you cannot move from a const&. By passing a const&, you are saying that the function can access the Base via the pointer, but it cannot store it anywhere. It cannot claim ownership of it.

This can be useful. Not necessarily for your specific case, but it's always good to be able to hand someone a pointer and know that they cannot (without breaking rules of C++, like no casting away const) claim ownership of it. They can't store it. They can pass it to others, but those others have to abide by the same rules.

(D) By r-value reference

Base(std::unique_ptr<Base> &&n)
  : next(std::move(n)) {}

This is more or less identical to the "by non-const l-value reference" case. The differences are two things.

  1. You can pass a temporary:

    Base newBase(std::unique_ptr<Base>(new Base)); //legal now..
    
  2. You must use std::move when passing non-temporary arguments.

The latter is really the problem. If you see this line:

Base newBase(std::move(nextBase));

You have a reasonable expectation that, after this line completes, nextBase should be empty. It should have been moved from. After all, you have that std::move sitting there, telling you that movement has occurred.

The problem is that it hasn't. It is not guaranteed to have been moved from. It may have been moved from, but you will only know by looking at the source code. You cannot tell just from the function signature.

Recommendations

  • (A) By Value: If you mean for a function to claim ownership of a unique_ptr, take it by value.
  • (C) By const l-value reference: If you mean for a function to simply use the unique_ptr for the duration of that function's execution, take it by const&. Alternatively, pass a & or const& to the actual type pointed to, rather than using a unique_ptr.
  • (D) By r-value reference: If a function may or may not claim ownership (depending on internal code paths), then take it by &&. But I strongly advise against doing this whenever possible.

How to manipulate unique_ptr

You cannot copy a unique_ptr. You can only move it. The proper way to do this is with the std::move standard library function.

If you take a unique_ptr by value, you can move from it freely. But movement doesn't actually happen because of std::move. Take the following statement:

std::unique_ptr<Base> newPtr(std::move(oldPtr));

This is really two statements:

std::unique_ptr<Base> &&temporary = std::move(oldPtr);
std::unique_ptr<Base> newPtr(temporary);

(note: The above code does not technically compile, since non-temporary r-value references are not actually r-values. It is here for demo purposes only).

The temporary is just an r-value reference to oldPtr. It is in the constructor of newPtr where the movement happens. unique_ptr's move constructor (a constructor that takes a && to itself) is what does the actual movement.

If you have a unique_ptr value and you want to store it somewhere, you must use std::move to do the storage.

Solution 2:

Let me try to state the different viable modes of passing pointers around to objects whose memory is managed by an instance of the std::unique_ptr class template; it also applies to the the older std::auto_ptr class template (which I believe allows all uses that unique pointer does, but for which in addition modifiable lvalues will be accepted where rvalues are expected, without having to invoke std::move), and to some extent also to std::shared_ptr.

As a concrete example for the discussion I will consider the following simple list type

struct node;
typedef std::unique_ptr<node> list;
struct node { int entry; list next; }

Instances of such list (which cannot be allowed to share parts with other instances or be circular) are entirely owned by whoever holds the initial list pointer. If client code knows that the list it stores will never be empty, it may also choose to store the first node directly rather than a list. No destructor for node needs to be defined: since the destructors for its fields are automatically called, the whole list will be recursively deleted by the smart pointer destructor once the lifetime of initial pointer or node ends.

This recursive type gives the occasion to discuss some cases that are less visible in the case of a smart pointer to plain data. Also the functions themselves occasionally provide (recursively) an example of client code as well. The typedef for list is of course biased towards unique_ptr, but the definition could be changed to use auto_ptr or shared_ptr instead without much need to change to what is said below (notably concerning exception safety being assured without the need to write destructors).

Modes of passing smart pointers around

Mode 0: pass a pointer or reference argument instead of a smart pointer

If your function is not concerned with ownership, this is the preferred method: don't make it take a smart pointer at all. In this case your function does not need to worry who owns the object pointed to, or by what means that ownership is managed, so passing a raw pointer is both perfectly safe, and the most flexible form, since regardless of ownership a client can always produce a raw pointer (either by calling the get method or from the address-of operator &).

For instance the function to compute the length of such list, should not be give a list argument, but a raw pointer:

size_t length(const node* p)
{ size_t l=0; for ( ; p!=nullptr; p=p->next.get()) ++l; return l; }

A client that holds a variable list head can call this function as length(head.get()), while a client that has chosen instead to store a node n representing a non-empty list can call length(&n).

If the pointer is guaranteed to be non null (which is not the case here since lists may be empty) one might prefer to pass a reference rather than a pointer. It might be a pointer/reference to non-const if the function needs to update the contents of the node(s), without adding or removing any of them (the latter would involve ownership).

An interesting case that falls in the mode 0 category is making a (deep) copy of the list; while a function doing this must of course transfer ownership of the copy it creates, it is not concerned with the ownership of the list it is copying. So it could be defined as follows:

list copy(const node* p)
{ return list( p==nullptr ? nullptr : new node{p->entry,copy(p->next.get())} ); }

This code merits a close look, both for the question as to why it compiles at all (the result of the recursive call to copy in the initialiser list binds to the rvalue reference argument in the move constructor of unique_ptr<node>, a.k.a. list, when initialising the next field of the generated node), and for the question as to why it is exception-safe (if during the recursive allocation process memory runs out and some call of new throws std::bad_alloc, then at that time a pointer to the partly constructed list is held anonymously in a temporary of type list created for the initialiser list, and its destructor will clean up that partial list). By the way one should resist the temptation to replace (as I initially did) the second nullptr by p, which after all is known to be null at that point: one cannot construct a smart pointer from a (raw) pointer to constant, even when it is known to be null.

Mode 1: pass a smart pointer by value

A function that takes a smart pointer value as argument takes possession of the object pointed to right away: the smart pointer that the caller held (whether in a named variable or an anonymous temporary) is copied into the argument value at function entrance and the caller's pointer has become null (in the case of a temporary the copy might have been elided, but in any case the caller has lost access to the pointed to object). I would like to call this mode call by cash: caller pays up front for the service called, and can have no illusions about ownership after the call. To make this clear, the language rules require the caller to wrap the argument in std::move if the smart pointer is held in a variable (technically, if the argument is an lvalue); in this case (but not for mode 3 below) this function does what its name suggests, namely move the value from the variable to a temporary, leaving the variable null.

For cases where the called function unconditionally takes ownership of (pilfers) the pointed-to object, this mode used with std::unique_ptr or std::auto_ptr is a good way of passing a pointer together with its ownership, which avoids any risk of memory leaks. Nonetheless I think that there are only very few situations where mode 3 below is not to be preferred (ever so slightly) over mode 1. For this reason I shall provide no usage examples of this mode. (But see the reversed example of mode 3 below, where it is remarked that mode 1 would do at least as well.) If the function takes more arguments than just this pointer, it may happen that there is in addition a technical reason to avoid mode 1 (with std::unique_ptr or std::auto_ptr): since an actual move operation takes place while passing a pointer variable p by the expression std::move(p), it cannot be assumed that p holds a useful value while evaluating the other arguments (the order of evaluation being unspecified), which could lead to subtle errors; by contrast, using mode 3 assures that no move from p takes place before the function call, so other arguments can safely access a value through p.

When used with std::shared_ptr, this mode is interesting in that with a single function definition it allows the caller to choose whether to keep a sharing copy of the pointer for itself while creating a new sharing copy to be used by the function (this happens when an lvalue argument is provided; the copy constructor for shared pointers used at the call increases the reference count), or to just give the function a copy of the pointer without retaining one or touching the reference count (this happens when a rvalue argument is provided, possibly an lvalue wrapped in a call of std::move). For instance

void f(std::shared_ptr<X> x) // call by shared cash
{ container.insert(std::move(x)); } // store shared pointer in container

void client()
{ std::shared_ptr<X> p = std::make_shared<X>(args);
  f(p); // lvalue argument; store pointer in container but keep a copy
  f(std::make_shared<X>(args)); // prvalue argument; fresh pointer is just stored away
  f(std::move(p)); // xvalue argument; p is transferred to container and left null
}

The same could be achieved by separately defining void f(const std::shared_ptr<X>& x) (for the lvalue case) and void f(std::shared_ptr<X>&& x) (for the rvalue case), with function bodies differing only in that the first version invokes copy semantics (using copy construction/assignment when using x) but the second version move semantics (writing std::move(x) instead, as in the example code). So for shared pointers, mode 1 can be useful to avoid some code duplication.

Mode 2: pass a smart pointer by (modifiable) lvalue reference

Here the function just requires having a modifiable reference to the smart pointer, but gives no indication of what it will do with it. I would like to call this method call by card: caller ensures payment by giving a credit card number. The reference can be used to take ownership of the pointed-to object, but it does not have to. This mode requires providing a modifiable lvalue argument, corresponding to the fact that the desired effect of the function may include leaving a useful value in the argument variable. A caller with an rvalue expression that it wishes to pass to such a function would be forced to store it in a named variable to be able to make the call, since the language only provides implicit conversion to a constant lvalue reference (referring to a temporary) from an rvalue. (Unlike the opposite situation handled by std::move, a cast from Y&& to Y&, with Y the smart pointer type, is not possible; nonetheless this conversion could be obtained by a simple template function if really desired; see https://stackoverflow.com/a/24868376/1436796). For the case where the called function intends to unconditionally take ownership of the object, stealing from the argument, the obligation to provide an lvalue argument is giving the wrong signal: the variable will have no useful value after the call. Therefore mode 3, which gives identical possibilities inside our function but asks callers to provide an rvalue, should be preferred for such usage.

However there is a valid use case for mode 2, namely functions that may modify the pointer, or the object pointed to in a way that involves ownership. For instance, a function that prefixes a node to a list provides an example of such use:

void prepend (int x, list& l) { l = list( new node{ x, std::move(l)} ); }

Clearly it would be undesirable here to force callers to use std::move, since their smart pointer still owns a well defined and non-empty list after the call, though a different one than before.

Again it is interesting to observe what happens if the prepend call fails for lack of free memory. Then the new call will throw std::bad_alloc; at this point in time, since no node could be allocated, it is certain that the passed rvalue reference (mode 3) from std::move(l) cannot yet have been pilfered, as that would be done to construct the next field of the node that failed to be allocated. So the original smart pointer l still holds the original list when the error is thrown; that list will either be properly destroyed by the smart pointer destructor, or in case l should survive thanks to a sufficiently early catch clause, it will still hold the original list.

That was a constructive example; with a wink to this question one can also give the more destructive example of removing the first node containing a given value, if any:

void remove_first(int x, list& l)
{ list* p = &l;
  while ((*p).get()!=nullptr and (*p)->entry!=x)
    p = &(*p)->next;
  if ((*p).get()!=nullptr)
    (*p).reset((*p)->next.release()); // or equivalent: *p = std::move((*p)->next); 
}

Again the correctness is quite subtle here. Notably, in the final statement the pointer (*p)->next held inside the node to be removed is unlinked (by release, which returns the pointer but makes the original null) before reset (implicitly) destroys that node (when it destroys the old value held by p), ensuring that one and only one node is destroyed at that time. (In the alternative form mentioned in the comment, this timing would be left to the internals of the implementation of the move-assignment operator of the std::unique_ptr instance list; the standard says 20.7.1.2.3;2 that this operator should act "as if by calling reset(u.release())", whence the timing should be safe here too.)

Note that prepend and remove_first cannot be called by clients who store a local node variable for an always non-empty list, and rightly so since the implementations given could not work for such cases.

Mode 3: pass a smart pointer by (modifiable) rvalue reference

This is the preferred mode to use when simply taking ownership of the pointer. I would like to call this method call by check: caller must accept relinquishing ownership, as if providing cash, by signing the check, but the actual withdrawal is postponed until the called function actually pilfers the pointer (exactly as it would when using mode 2). The "signing of the check" concretely means callers have to wrap an argument in std::move (as in mode 1) if it is an lvalue (if it is an rvalue, the "giving up ownership" part is obvious and requires no separate code).

Note that technically mode 3 behaves exactly as mode 2, so the called function does not have to assume ownership; however I would insist that if there is any uncertainty about ownership transfer (in normal usage), mode 2 should be preferred to mode 3, so that using mode 3 is implicitly a signal to callers that they are giving up ownership. One might retort that only mode 1 argument passing really signals forced loss of ownership to callers. But if a client has any doubts about intentions of the called function, she is supposed to know the specifications of the function being called, which should remove any doubt.

It is surprisingly difficult to find a typical example involving our list type that uses mode 3 argument passing. Moving a list b to the end of another list a is a typical example; however a (which survives and holds the result of the operation) is better passed using mode 2:

void append (list& a, list&& b)
{ list* p=&a;
  while ((*p).get()!=nullptr) // find end of list a
    p=&(*p)->next;
  *p = std::move(b); // attach b; the variable b relinquishes ownership here
}

A pure example of mode 3 argument passing is the following that takes a list (and its ownership), and returns a list containing the identical nodes in reverse order.

list reversed (list&& l) noexcept // pilfering reversal of list
{ list p(l.release()); // move list into temporary for traversal
  list result(nullptr);
  while (p.get()!=nullptr)
  { // permute: result --> p->next --> p --> (cycle to result)
    result.swap(p->next);
    result.swap(p);
  }
  return result;
}

This function might be called as in l = reversed(std::move(l)); to reverse the list into itself, but the reversed list can also be used differently.

Here the argument is immediately moved to a local variable for efficiency (one could have used the parameter l directly in the place of p, but then accessing it each time would involve an extra level of indirection); hence the difference with mode 1 argument passing is minimal. In fact using that mode, the argument could have served directly as local variable, thus avoiding that initial move; this is just an instance of the general principle that if an argument passed by reference only serves to initialise a local variable, one might just as well pass it by value instead and use the parameter as local variable.

Using mode 3 appears to be advocated by the standard, as witnessed by the fact that all provided library functions that transfer ownership of smart pointers using mode 3. A particular convincing case in point is the constructor std::shared_ptr<T>(auto_ptr<T>&& p). That constructor used (in std::tr1) to take a modifiable lvalue reference (just like the auto_ptr<T>& copy constructor), and could therefore be called with an auto_ptr<T> lvalue p as in std::shared_ptr<T> q(p), after which p has been reset to null. Due to the change from mode 2 to 3 in argument passing, this old code must now be rewritten to std::shared_ptr<T> q(std::move(p)) and will then continue to work. I understand that the committee did not like the mode 2 here, but they had the option of changing to mode 1, by defining std::shared_ptr<T>(auto_ptr<T> p) instead, they could have ensured that old code works without modification, because (unlike unique-pointers) auto-pointers can be silently dereferenced to a value (the pointer object itself being reset to null in the process). Apparently the committee so much preferred advocating mode 3 over mode 1, that they chose to actively break existing code rather than to use mode 1 even for an already deprecated usage.

When to prefer mode 3 over mode 1

Mode 1 is perfectly usable in many cases, and might be preferred over mode 3 in cases where assuming ownership would otherwise takes the form of moving the smart pointer to a local variable as in the reversed example above. However, I can see two reasons to prefer mode 3 in the more general case:

  • It is slightly more efficient to pass a reference than to create a temporary and nix the old pointer (handling cash is somewhat laborious); in some scenarios the pointer may be passed several times unchanged to another function before it is actually pilfered. Such passing will generally require writing std::move (unless mode 2 is used), but note that this is just a cast that does not actually do anything (in particular no dereferencing), so it has zero cost attached.

  • Should it be conceivable that anything throws an exception between the start of the function call and the point where it (or some contained call) actually moves the pointed-to object into another data structure (and this exception is not already caught inside the function itself), then when using mode 1, the object referred to by the smart pointer will be destroyed before a catch clause can handle the exception (because the function parameter was destructed during stack unwinding), but not so when using mode 3. The latter gives the caller has the option to recover the data of the object in such cases (by catching the exception). Note that mode 1 here does not cause a memory leak, but may lead to an unrecoverable loss of data for the program, which might be undesirable as well.

Returning a smart pointer: always by value

To conclude a word about returning a smart pointer, presumably pointing to an object created for use by the caller. This is not really a case comparable with passing pointers into functions, but for completeness I would like to insist that in such cases always return by value (and don't use std::move in the return statement). Nobody wants to get a reference to a pointer that probably has just been nixed.