What are the differences between Rust's `String` and `str`?

String is the dynamic heap string type, like Vec: use it when you need to own or modify your string data.

str is an immutable¹ sequence of UTF-8 bytes of dynamic length somewhere in memory. Since the size is unknown, one can only handle it behind a pointer. This means that str most commonly² appears as &str: a reference to some UTF-8 data, normally called a "string slice" or just a "slice". A slice is just a view onto some data, and that data can be anywhere, e.g.

  • In static storage: a string literal "foo" is a &'static str. The data is hardcoded into the executable and loaded into memory when the program runs.

  • Inside a heap allocated String: String dereferences to a &str view of the String's data.

  • On the stack: e.g. the following creates a stack-allocated byte array, and then gets a view of that data as a &str:

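    use std::str;
    
    fn main() {
        // A by-value array lives in main's stack frame (a `&[...]` literal could be promoted to static memory)
        let x: [u8; 3] = [b'a', b'b', b'c'];
        // Borrow those bytes as a &str after checking they are valid UTF-8
        let stack_str: &str = str::from_utf8(&x).unwrap();
    }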
    

In summary, use String if you need owned string data (like passing strings to other threads, or building them at runtime), and use &str if you only need a view of a string.

This is identical to the relationship between a vector Vec<T> and a slice &[T], and is similar to the relationship between by-value T and by-reference &T for general types.
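As a minimal sketch of that relationship (the function names count_vowels and sum are hypothetical, chosen only for illustration): a function that only reads text can take &str and then accepts a literal, a borrowed String, or a sub-slice, in the same way a function taking &[T] accepts a borrowed Vec<T> through deref coercion.

```rust
// Hypothetical helper: it only reads the text, so it borrows a &str
fn count_vowels(s: &str) -> usize {
    s.chars().filter(|c| "aeiouAEIOU".contains(*c)).count()
}

// The same idea for slices: &[T] accepts a borrowed Vec<T>
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let literal: &'static str = "static text";     // lives in the binary
    let owned: String = String::from("heap text"); // owns a heap buffer

    println!("{}", count_vowels(literal));     // a &'static str works directly
    println!("{}", count_vowels(&owned));      // &String coerces to &str
    println!("{}", count_vowels(&owned[..4])); // a slice of the String ("heap")

    let v: Vec<i32> = vec![1, 2, 3];
    println!("{}", sum(&v)); // &Vec<i32> coerces to &[i32]
}
```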


¹ A str is fixed-length; you cannot write bytes beyond the end, or leave trailing invalid bytes. Since UTF-8 is a variable-width encoding, this effectively makes strs immutable in most cases. In general, mutation requires writing more or fewer bytes than there were before (e.g. replacing an a (1 byte) with an ä (2+ bytes) would require making more room in the str). There are specific methods that can modify a &mut str in place, mostly those that handle only ASCII characters, like make_ascii_uppercase.

² Dynamically sized types allow things like Rc<str> for a sequence of reference-counted UTF-8 bytes since Rust 1.2. Rust 1.21 allows easily creating these types.
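A small sketch of footnote 2, assuming Rust 1.21 or later for the From<&str> conversion into Rc<str>; the helper share is hypothetical:

```rust
use std::rc::Rc;

// Hypothetical helper: turn borrowed text into a shared, reference-counted str
fn share(text: &str) -> Rc<str> {
    // From<&str> for Rc<str> copies the bytes into one new allocation
    Rc::from(text)
}

fn main() {
    let a: Rc<str> = share("shared text");
    let b = Rc::clone(&a); // bumps the count; the text is not copied again
    assert_eq!(Rc::strong_count(&a), 2);
    assert_eq!(&*b, "shared text");

    // Box<str> is the owned, non-shared counterpart
    let boxed: Box<str> = Box::from("boxed text");
    assert_eq!(boxed.len(), 10);
}
```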


I have a C++ background and I found it very useful to think about String and &str in C++ terms:

  • A Rust String is like a std::string; it owns the memory and does the dirty job of managing memory.
  • A Rust &str is like a char* (but a little more sophisticated: it also carries the length, much like C++17's std::string_view); it points us to the beginning of a chunk in the same way you can get a pointer to the contents of std::string.

Is either of them going to disappear? I do not think so. They serve two different purposes:

String keeps the buffer and is very practical to use. &str is lightweight and should be used to "look" into strings. You can search, split, and parse without needing to allocate new memory (replacing text, by contrast, produces a new String).
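A sketch of such allocation-free parsing, assuming Rust 1.52+ for str::split_once; parse_pair and the "key = value" format are hypothetical:

```rust
// Hypothetical helper: parse a "key = value" pair without allocating new strings.
// The &str it hands back is a view into the original input.
fn parse_pair(line: &str) -> Option<(&str, i32)> {
    let (key, value) = line.split_once('=')?;        // two sub-slices of `line`
    let number = value.trim().parse::<i32>().ok()?; // parse reads the slice directly
    Some((key.trim(), number))
}

fn main() {
    let input = String::from("width = 80");
    if let Some((key, value)) = parse_pair(&input) {
        println!("{} -> {}", key, value);
    }
}
```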

A &str can look inside a String just as it can point to a string literal. The following code copies the literal string into memory managed by the String:

let a: String = "hello rust".into();

The following code uses the literal itself without copying it (read-only, though):

let a: &str = "hello rust";
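To see the difference, one can compare addresses: the converted String points to a fresh heap buffer, while the literal points into the binary. copy_into_string is a hypothetical helper:

```rust
// Hypothetical helper: .into() allocates a heap buffer and copies the bytes into it
fn copy_into_string(s: &str) -> String {
    s.into()
}

fn main() {
    let literal: &str = "hello rust";
    let owned: String = copy_into_string(literal);

    assert_eq!(literal, owned);                   // same text...
    assert_ne!(literal.as_ptr(), owned.as_ptr()); // ...stored at a different address
}
```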

str, mostly used as &str, is a string slice; &str is a reference to a UTF-8 byte array.

String is what used to be ~str, a growable, owned UTF-8 byte array.


They are actually completely different. First off, a str is nothing but a type-level thing; it can only be reasoned about at the type level because it's a so-called dynamically sized type (DST). The size a str takes up cannot be known at compile time and depends on runtime information, so it cannot be stored in a variable: the compiler needs to know the size of each variable at compile time. A str is conceptually just a row of u8 bytes with the guarantee that it forms valid UTF-8. How large is the row? No one knows until runtime, hence it can't be stored in a variable.

The interesting thing is that a &str, or any other pointer to a str such as Box<str>, does exist at runtime. This is a so-called "fat pointer": a pointer with extra information (in this case the size of the thing it points at), so it is twice as large as a normal pointer. In fact, a &str is quite close to a String (but not to a &String). A &str is two words: one pointer to the first byte of a str and one number that says how many bytes long the str is.
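The two words can be pulled apart and put back together with as_ptr and len, which shows that nothing else is needed to describe a &str. This is an illustration only (round_trip is hypothetical); real code would simply copy the &str:

```rust
use std::{slice, str};

// Hypothetical helper: split a &str into its two words and rebuild it
fn round_trip(s: &str) -> &str {
    let ptr: *const u8 = s.as_ptr(); // word 1: address of the first byte
    let len: usize = s.len();        // word 2: length in bytes
    // SAFETY: ptr and len come from a valid &str borrowed for the same lifetime
    unsafe { str::from_utf8_unchecked(slice::from_raw_parts(ptr, len)) }
}

fn main() {
    let text = String::from("fat pointer");
    let view = round_trip(&text[4..]);
    assert_eq!(view, "pointer");
}
```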

Contrary to what is often said, a str does not need to be immutable. If you can get a &mut str as an exclusive pointer to the str, you can mutate it. All the safe functions that mutate it guarantee that the UTF-8 constraint is upheld; violating it would be undefined behaviour, because the library assumes the constraint holds and does not check it.
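For example, make_ascii_uppercase mutates a str in place through &mut str; it is safe because changing the case of ASCII letters never changes the byte length. The helper shout is hypothetical:

```rust
// Hypothetical helper: uppercase ASCII letters in place through a &mut str
fn shout(s: &mut str) {
    s.make_ascii_uppercase();
}

fn main() {
    let mut owned = String::from("hello, w\u{f6}rld"); // "hello, wörld"
    // as_mut_str borrows the String's buffer as a &mut str
    shout(owned.as_mut_str());
    // Non-ASCII characters such as 'ö' are left untouched
    assert_eq!(owned, "HELLO, W\u{f6}RLD");
}
```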

So what is a String? That's three words: two are the same as for &str, plus a third word holding the capacity of the str buffer it manages, i.e. how much it can hold before it is full and has to reallocate. When allocated, that buffer is always on the heap (a str in general is not necessarily on the heap). The String basically owns a str, as they say; it controls it and can resize and reallocate it when it sees fit. So a String is, as said, closer to a &str than to a str.

Another thing is a Box<str>. Its runtime representation is the same as a &str, but unlike the &str it owns the str. It cannot resize it, because it does not know its capacity, so a Box<str> can basically be seen as a fixed-length String that cannot be resized (you can always convert it into a String if you want to resize it).
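The word counts above can be checked with std::mem::size_of. These sizes hold on current compilers and targets, though String's exact layout is not a documented guarantee; grow is a hypothetical helper showing the Box<str> to String round trip:

```rust
use std::mem::size_of;

// Hypothetical helper: a Box<str> cannot grow, so convert it back into a String first
fn grow(fixed: Box<str>, extra: &str) -> String {
    let mut s = fixed.into_string(); // reuses the same allocation, no copy
    s.push_str(extra);
    s
}

fn main() {
    let word = size_of::<usize>();
    // &str and Box<str> are fat pointers: address + length
    assert_eq!(size_of::<&str>(), 2 * word);
    assert_eq!(size_of::<Box<str>>(), 2 * word);
    // String adds a capacity word (true in practice, not a layout guarantee)
    assert_eq!(size_of::<String>(), 3 * word);
    // &String is a thin pointer to the String struct itself
    assert_eq!(size_of::<&String>(), word);

    let fixed: Box<str> = String::from("fixed").into_boxed_str();
    assert_eq!(grow(fixed, " no more"), "fixed no more");
}
```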

A very similar relationship exists between [T] and Vec<T> except there is no UTF-8 constraint and it can hold any type whose size is not dynamic.

The use of str at the type level is mostly to create generic abstractions over &str; it exists at the type level so that traits can be written conveniently. In theory, str as a type did not need to exist, only &str, but that would mean writing a lot of extra code that can now be generic.

&str is very useful for having multiple different substrings of a String without having to copy. As said, a String owns the str on the heap that it manages; if you could only create a substring of a String as a new String, it would have to be copied, because everything in Rust can only have one single owner to guarantee memory safety. So, for instance, you can slice a string:

let string: String   = "a string".to_string();
let substring1: &str = &string[1..3];
let substring2: &str = &string[2..4];

We have two different substring strs of the same string. string is the one that owns the actual full str buffer on the heap and the &str substrings are just fat pointers to that buffer on the heap.
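One way to confirm that no copy happens is to compare addresses. The helper offset_in below is hypothetical; it reports where a slice starts inside its parent buffer:

```rust
// Hypothetical helper: byte offset of `part` inside `whole`, if it points into it
fn offset_in(whole: &str, part: &str) -> Option<usize> {
    let start = whole.as_ptr() as usize;
    let p = part.as_ptr() as usize;
    if p >= start && p + part.len() <= start + whole.len() {
        Some(p - start)
    } else {
        None
    }
}

fn main() {
    let string: String = "a string".to_string();
    let substring1: &str = &string[1..3];
    let substring2: &str = &string[2..4];
    // Both slices point into the String's heap buffer; nothing was copied
    assert_eq!(offset_in(&string, substring1), Some(1));
    assert_eq!(offset_in(&string, substring2), Some(2));
}
```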


It is str that is analogous to String, not the reference to it, &str.

A string literal is the most familiar example of a str: basically pre-allocated text:

"Hello World"

This text has to be stored somewhere, so it is stored in the data section of the executable file, along with the program's machine code, as a sequence of bytes ([u8]). Because text can be of any length, str is dynamically sized: its size is not part of the type and is known only at run time:

+----+-----+-----+-----+-----+----+----+-----+-----+-----+-----+
|  H |  e  |  l  |  l  |  o  |    |  W |  o  |  r  |  l  |  d  |
+----+-----+-----+-----+-----+----+----+-----+-----+-----+-----+

+----+-----+-----+-----+-----+----+----+-----+-----+-----+-----+
| 72 | 101 | 108 | 108 | 111 | 32 | 87 | 111 | 114 | 108 | 100 |
+----+-----+-----+-----+-----+----+----+-----+-----+-----+-----+
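The same bytes can be read back from a literal with as_bytes, which returns a &[u8] view of the str without copying. bytes_of is a hypothetical wrapper:

```rust
// Hypothetical helper: the raw bytes behind a str, as in the table above
fn bytes_of(s: &str) -> &[u8] {
    s.as_bytes() // no copy: a &[u8] view of the same memory
}

fn main() {
    let expected: [u8; 11] = [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100];
    assert_eq!(bytes_of("Hello World"), &expected[..]);
    println!("{:?}", bytes_of("Hello World"));
}
```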

We need a way to access a stored text and that is where the slice comes in.

A slice, [T], is a view into a block of memory. Because its size is not known at compile time, it is always accessed behind a pointer, most commonly a reference, &[T] or &mut [T].

Let's explain what being dynamically sized means. Some programming languages, like C, append a zero byte (\0) to the end of their strings and keep a record of the starting address. To determine a string's length, the program has to walk through the raw bytes from the starting position until it finds this zero byte. So a text can be of any length, hence it is dynamically sized.

A C program has to walk through the text to find the size of a string. Rust takes a different approach than looking for \0: it uses a slice. A slice reference stores the address where a str starts and how many bytes it takes. For a string literal, both values are known at compile time, so they are stored in the binary too. This is better than appending a zero byte because the length is calculated in advance during compilation, and reading it later takes constant time instead of a scan.
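A rough sketch of the two approaches, where c_style_len is a hypothetical stand-in for C's strlen operating on a byte slice:

```rust
// Hypothetical helper mimicking C's strlen: scan until the first zero byte.
// Its cost grows with the length of the text (O(n)).
fn c_style_len(bytes: &[u8]) -> usize {
    bytes.iter().position(|&b| b == 0).unwrap_or(bytes.len())
}

fn main() {
    // A C-style string carries an explicit terminator
    let c_like: &[u8] = b"Hello World\0";
    // A Rust &str carries its length inside the fat pointer instead
    let rust: &str = "Hello World";

    assert_eq!(c_style_len(c_like), 11); // found by scanning
    assert_eq!(rust.len(), 11);          // read directly, O(1)
}
```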

So, the "Hello World" expression returns a fat pointer containing both the address of the actual data and its length. This pointer is our handle to the actual data, and this handle is also stored in our program. Now the data is behind a pointer, and the compiler knows the size of that pointer (two words) at compile time.

Since the text is stored in the executable, it is valid for the entire lifetime of the running program, hence it has the 'static lifetime.

So, the type of the "Hello World" expression should reflect these two characteristics, and it does:

let s: &'static str = "Hello World";

You may ask why its type is written as str and not as [u8]. It is because the data is always guaranteed to be a valid UTF-8 sequence. Not all characters are a single byte in UTF-8; some take up to 4 bytes. So [u8] would be inaccurate: it would lose that guarantee.
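The difference between bytes and characters is easy to observe. byte_and_char_counts is a hypothetical helper, and the characters are written as escapes (ä, €, and an emoji) to avoid encoding ambiguity:

```rust
// Hypothetical helper: bytes versus characters for a piece of text
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    assert_eq!(byte_and_char_counts("a"), (1, 1));          // ASCII: 1 byte
    assert_eq!(byte_and_char_counts("\u{e4}"), (2, 1));     // 'ä': 2 bytes
    assert_eq!(byte_and_char_counts("\u{20ac}"), (3, 1));   // '€': 3 bytes
    assert_eq!(byte_and_char_counts("\u{1f600}"), (4, 1));  // emoji: 4 bytes
    // Byte index 1 falls inside 'ä', so slicing there would panic
    assert!(!"\u{e4}".is_char_boundary(1));
}
```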

If you disassemble a compiled Rust program and inspect the executable file, you may see multiple strs stored adjacent to each other in the data section, without any indication of where one starts and the other ends.

The compiler takes it even further. If identical static text is used at multiple locations in your program, the compiler will usually optimize your program by creating a single binary block in the executable's data section, and each slice in your code points to this block (this is an optimization, not something the language guarantees).

For example, the compiler typically creates a single contiguous block with the content "Hello World" for the following code, even though we use three separate "Hello World" literals:

let x: &'static str = "Hello World";
let y: &'static str = "Hello World";
let z: &'static str = "Hello World";

String, on the other hand, is a specialized type that stores its value as a vector of u8. Here is how the String type is defined in the standard library source:

pub struct String {
    vec: Vec<u8>,
}

Being a vector means it is heap-allocated and resizable like any other vector value.

Being specialized means it does not permit arbitrary access to its bytes and enforces checks so that the data is always valid UTF-8. Other than that, it is just a vector.

So a String is a resizable buffer holding UTF-8 text. This buffer is allocated on the heap, so it can grow as needed or requested. We can fill this buffer any way we see fit, as long as it stays valid UTF-8, and we can change its content.
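A short sketch of changing a String through its methods (build is a hypothetical helper); each call keeps the contents valid UTF-8 and grows the buffer if needed:

```rust
// Hypothetical helper: build up a String in place
fn build() -> String {
    let mut s = String::from("Hello");
    s.push(' ');              // append one char
    s.push_str("W\u{f6}rld"); // append a &str ("Wörld"); 'ö' takes 2 bytes
    s.insert(0, '>');         // insert at a byte index that is a char boundary
    s
}

fn main() {
    let s = build();
    assert_eq!(s, ">Hello W\u{f6}rld");
    assert_eq!(s.len(), 13); // 12 chars, but 'ö' is 2 bytes
}
```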

If you look carefully, the vec field is kept private to enforce validity. Since it is private, we cannot create a String instance directly. It is kept private because not every stream of bytes is valid UTF-8, and direct interaction with the underlying bytes could corrupt the string. Instead, we create and modify the bytes through methods, and those methods run the necessary checks. Being private and allowing interaction only through methods is what provides these guarantees.

There are several methods defined on the String type to create a String instance; new is one of them:

pub const fn new() -> String {
  String { vec: Vec::new() }
}

We can use it to create a valid String.

let s = String::new();
println!("{}", s);

Unfortunately, it does not accept an input parameter. So the result is a valid but empty string (String::new does not allocate), which will grow like any other vector when its capacity is not enough to hold the assigned value. But application performance can take a hit, as growing requires reallocation.
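If the final size is known or can be estimated, String::with_capacity reserves the space up front. A sketch, with join_words as a hypothetical helper:

```rust
// Hypothetical helper: join words, reserving the needed space up front so the
// buffer is allocated once instead of growing (and reallocating) repeatedly
fn join_words(words: &[&str]) -> String {
    let needed: usize = words.iter().map(|w| w.len()).sum::<usize>()
        + words.len().saturating_sub(1); // one space between each pair
    let mut out = String::with_capacity(needed);
    for (i, w) in words.iter().enumerate() {
        if i > 0 {
            out.push(' ');
        }
        out.push_str(w);
    }
    out
}

fn main() {
    let empty = String::new();
    assert_eq!(empty.capacity(), 0); // String::new does not allocate

    let joined = join_words(&["grow", "once", "only"]);
    assert_eq!(joined, "grow once only");
    assert!(joined.capacity() >= joined.len());
}
```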

We can fill the underlying vector with initial values from different sources:

From a string literal

let a = "Hello World";
let s = String::from(a);

Please note that a str is still created, and its content is copied to the heap-allocated vector by String::from. If we check the executable binary, we will see raw bytes in the data section with the content "Hello World". This is an important detail some people miss.

From raw parts

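use std::mem::ManuallyDrop;

// Wrap the String so it is never dropped; otherwise its buffer would be freed twice
let mut s = ManuallyDrop::new(String::from("Hello World"));
let ptr = s.as_mut_ptr();
let len = s.len();
let capacity = s.capacity();

// SAFETY: ptr, len and capacity all come from a String that will not be dropped
let s = unsafe { String::from_raw_parts(ptr, len, capacity) };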

From a character

let ch = 'c';
let s = ch.to_string();

From vector of bytes

let hello_world = vec![72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100];
// We know it is a valid UTF-8 sequence, so we can use unwrap
let hello_world = String::from_utf8(hello_world).unwrap();
println!("{}", hello_world); // Hello World

Here is another important detail. A vector can hold any values; there is no guarantee its content is valid UTF-8, so Rust forces us to take this into consideration by returning a Result<String, FromUtf8Error> rather than a String.
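A sketch of both outcomes; decode is a hypothetical wrapper around String::from_utf8, and String::from_utf8_lossy is shown as the non-failing alternative:

```rust
// Hypothetical helper: accept bytes only if they form valid UTF-8
fn decode(bytes: Vec<u8>) -> Result<String, std::string::FromUtf8Error> {
    String::from_utf8(bytes)
}

fn main() {
    // 0xFF can never appear in UTF-8, so this is rejected
    let bad = vec![72, 105, 0xFF];
    match decode(bad) {
        Ok(s) => println!("valid: {}", s),
        Err(e) => {
            println!("invalid: {}", e);
            // The original bytes can be recovered from the error
            assert_eq!(e.into_bytes(), vec![72, 105, 0xFF]);
        }
    }

    // from_utf8_lossy replaces bad sequences with U+FFFD instead of failing
    let lossy = String::from_utf8_lossy(b"Hi\xFF");
    assert_eq!(lossy, "Hi\u{fffd}");
}
```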

From input buffer

use std::io::{self, Read};

fn main() -> io::Result<()> {
    let mut buffer = String::new();
    let stdin = io::stdin();
    let mut handle = stdin.lock();

    handle.read_to_string(&mut buffer)?;
    Ok(())
}

Or from any other type that implements the ToString trait.
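In practice most types get ToString by implementing Display, thanks to the standard library's blanket impl. A sketch with a hypothetical Point type:

```rust
use std::fmt;

// Hypothetical type used only for illustration
struct Point {
    x: i32,
    y: i32,
}

// Implementing Display gives Point a to_string() method for free,
// via the standard library's blanket `impl<T: Display + ?Sized> ToString for T`
impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

fn main() {
    let from_int: String = 42.to_string();
    let from_point: String = Point { x: 1, y: 2 }.to_string();
    assert_eq!(from_int, "42");
    assert_eq!(from_point, "(1, 2)");
}
```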

Since String is a vector under the hood, it will exhibit some vector characteristics:

  • a pointer: The pointer points to an internal buffer that stores the data.
  • length: The length is the number of bytes currently stored in the buffer.
  • capacity: The capacity is the size of the buffer in bytes. So, the length will always be less than or equal to the capacity.
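These three values can be inspected through String's public methods. The helper parts is hypothetical; into_bytes returns the inner Vec<u8> without copying:

```rust
// Hypothetical helper: the three words of a String, read through its public API
fn parts(s: &String) -> (*const u8, usize, usize) {
    (s.as_ptr(), s.len(), s.capacity())
}

fn main() {
    let mut s = String::with_capacity(16);
    s.push_str("abc");
    let (_ptr, len, cap) = parts(&s);
    assert_eq!(len, 3);
    assert!(cap >= 16 && len <= cap);

    // into_bytes hands back the inner Vec<u8>; the buffer is not copied
    let ptr_before = s.as_ptr();
    let bytes: Vec<u8> = s.into_bytes();
    assert_eq!(bytes.as_ptr(), ptr_before);
    assert_eq!(bytes, b"abc");
}
```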

And it delegates some properties and methods to its inner vector:

pub fn capacity(&self) -> usize {
  self.vec.capacity()
}

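Most of the examples use String::from, so people get confused and wonder why we would create a String from another string. The answer is the detail noted above: the literal is a &'static str pointing into the binary, while String::from copies those bytes into a new, owned, growable heap buffer.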
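A typical reason to reach for String::from is building text at run time. greeting below is a hypothetical example: it starts from a literal, copies it into an owned buffer, and then grows it:

```rust
// Hypothetical helper: text that only exists at run time must be owned,
// so the function builds and returns a String rather than a &str
fn greeting(name: &str) -> String {
    let mut s = String::from("Hello, "); // copy the literal into a growable buffer
    s.push_str(name);                    // now it can grow
    s.push('!');
    s
}

fn main() {
    let g = greeting("Rust");
    assert_eq!(g, "Hello, Rust!");
}
```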

It is a long read; I hope it helps.