What are move semantics in Rust?
In Rust, there are two possibilities to take a reference
Borrow, i.e., take a reference but don't allow mutating the reference destination. The
&
operator borrows ownership from a value.Borrow mutably, i.e., take a reference to mutate the destination. The
&mut
operator mutably borrows ownership from a value.
The Rust documentation about borrowing rules says:
First, any borrow must last for a scope no greater than that of the owner. Second, you may have one or the other of these two kinds of borrows, but not both at the same time:
- one or more references (
&T
) to a resource,- exactly one mutable reference (
&mut T
).
I believe that taking a reference is creating a pointer to the value and accessing the value by the pointer. This could be optimized away by the compiler if there is a simpler equivalent implementation.
However, I don't understand what move means and how it is implemented.
For types implementing the Copy
trait it means copying e.g. by assigning the struct member-wise from the source, or a memcpy()
. For small structs or for primitives this copy is efficient.
And for move?
This question is not a duplicate of What are move semantics? because Rust and C++ are different languages and move semantics are different between the two.
Solution 1:
Semantics
Rust implements what is known as an Affine Type System:
Affine types are a version of linear types imposing weaker constraints, corresponding to affine logic. An affine resource can only be used once, while a linear one must be used once.
Types that are not Copy
, and are thus moved, are Affine Types: you may use them either once or never, nothing else.
Rust qualifies this as a transfer of ownership in its Ownership-centric view of the world (*).
(*) Some of the people working on Rust are much more qualified than I am in CS, and they knowingly implemented an Affine Type System; however contrary to Haskell which exposes the math-y/cs-y concepts, Rust tends to expose more pragmatic concepts.
Note: it could be argued that Affine Types returned from a function tagged with #[must_use]
are actually Linear Types from my reading.
Implementation
It depends. Please keep in mind than Rust is a language built for speed, and there are numerous optimizations passes at play here which will depend on the compiler used (rustc + LLVM, in our case).
Within a function body (playground):
fn main() {
let s = "Hello, World!".to_string();
let t = s;
println!("{}", t);
}
If you check the LLVM IR (in Debug), you'll see:
%_5 = alloca %"alloc::string::String", align 8
%t = alloca %"alloc::string::String", align 8
%s = alloca %"alloc::string::String", align 8
%0 = bitcast %"alloc::string::String"* %s to i8*
%1 = bitcast %"alloc::string::String"* %_5 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %1, i8* %0, i64 24, i32 8, i1 false)
%2 = bitcast %"alloc::string::String"* %_5 to i8*
%3 = bitcast %"alloc::string::String"* %t to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %2, i64 24, i32 8, i1 false)
Underneath the covers, rustc invokes a memcpy
from the result of "Hello, World!".to_string()
to s
and then to t
. While it might seem inefficient, checking the same IR in Release mode you will realize that LLVM has completely elided the copies (realizing that s
was unused).
The same situation occurs when calling a function: in theory you "move" the object into the function stack frame, however in practice if the object is large the rustc compiler might switch to passing a pointer instead.
Another situation is returning from a function, but even then the compiler might apply "return value optimization" and build directly in the caller's stack frame -- that is, the caller passes a pointer into which to write the return value, which is used without intermediary storage.
The ownership/borrowing constraints of Rust enable optimizations that are difficult to reach in C++ (which also has RVO but cannot apply it in as many cases).
So, the digest version:
- moving large objects is inefficient, but there are a number of optimizations at play that might elide the move altogether
- moving involves a
memcpy
ofstd::mem::size_of::<T>()
bytes, so moving a largeString
is efficient because it only copies a couple bytes whatever the size of the allocated buffer they hold onto
Solution 2:
When you move an item, you are transferring ownership of that item. That's a key component of Rust.
Let's say I had a struct, and then I assign the struct from one variable to another. By default, this will be a move, and I've transferred ownership. The compiler will track this change of ownership and prevent me from using the old variable any more:
pub struct Foo {
value: u8,
}
fn main() {
let foo = Foo { value: 42 };
let bar = foo;
println!("{}", foo.value); // error: use of moved value: `foo.value`
println!("{}", bar.value);
}
how it is implemented.
Conceptually, moving something doesn't need to do anything. In the example above, there wouldn't be a reason to actually allocate space somewhere and then move the allocated data when I assign to a different variable. I don't actually know what the compiler does, and it probably changes based on the level of optimization.
For practical purposes though, you can think that when you move something, the bits representing that item are duplicated as if via memcpy
. This helps explain what happens when you pass a variable to a function that consumes it, or when you return a value from a function (again, the optimizer can do other things to make it efficient, this is just conceptually):
// Ownership is transferred from the caller to the callee
fn do_something_with_foo(foo: Foo) {}
// Ownership is transferred from the callee to the caller
fn make_a_foo() -> Foo { Foo { value: 42 } }
"But wait!", you say, "memcpy
only comes into play with types implementing Copy
!". This is mostly true, but the big difference is that when a type implements Copy
, both the source and the destination are valid to use after the copy!
One way of thinking of move semantics is the same as copy semantics, but with the added restriction that the thing being moved from is no longer a valid item to use.
However, it's often easier to think of it the other way: The most basic thing that you can do is to move / give ownership away, and the ability to copy something is an additional privilege. That's the way that Rust models it.
This is a tough question for me! After using Rust for a while the move semantics are natural. Let me know what parts I've left out or explained poorly.
Solution 3:
Rust's move
keyword always bothers me so, I decided to write my understanding which I obtained after discussion with my colleagues.
I hope this might help someone.
let x = 1;
In the above statement, x is a variable whose value is 1. Now,
let y = || println!("y is a variable whose value is a closure");
So, move
keyword is used to transfer the ownership of a variable to the closure.
In the below example, without move
, x
is not owned by the closure. Hence x
is not owned by y
and available for further use.
let x = 1;
let y = || println!("this is a closure that prints x = {}". x);
On the other hand, in this next below case, the x
is owned by the closure. x
is owned by y
and not available for further use.
let x = 1;
let y = move || println!("this is a closure that prints x = {}". x);
By owning
I mean containing as a member variable
. The example cases above are in the same situation as the following two cases. We can also assume the below explanation as to how the Rust compiler expands the above cases.
The formar (without move
; i.e. no transfer of ownership),
struct ClosureObject {
x: &u32
}
let x = 1;
let y = ClosureObject {
x: &x
};
The later (with move
; i.e. transfer of ownership),
struct ClosureObject {
x: u32
}
let x = 1;
let y = ClosureObject {
x: x
};
Solution 4:
Please let me answer my own question. I had trouble, but by asking a question here I did Rubber Duck Problem Solving. Now I understand:
A move is a transfer of ownership of the value.
For example the assignment let x = a;
transfers ownership: At first a
owned the value. After the let
it's x
who owns the value. Rust forbids to use a
thereafter.
In fact, if you do println!("a: {:?}", a);
after the let
the Rust compiler says:
error: use of moved value: `a`
println!("a: {:?}", a);
^
Complete example:
#[derive(Debug)]
struct Example { member: i32 }
fn main() {
let a = Example { member: 42 }; // A struct is moved
let x = a;
println!("a: {:?}", a);
println!("x: {:?}", x);
}
And what does this move mean?
It seems that the concept comes from C++11. A document about C++ move semantics says:
From a client code point of view, choosing move instead of copy means that you don't care what happens to the state of the source.
Aha. C++11 does not care what happens with source. So in this vein, Rust is free to decide to forbid to use the source after a move.
And how it is implemented?
I don't know. But I can imagine that Rust does literally nothing. x
is just a different name for the same value. Names usually are compiled away (except of course debugging symbols). So it's the same machine code whether the binding has the name a
or x
.
It seems C++ does the same in copy constructor elision.
Doing nothing is the most efficient possible.