How to convert *const u8 with length to &str without reallocation?

I am looking for the best way (hopefully zero cost) to achieve this:

fn to_str(str: *const u8, len: usize) -> Option<&str>;

len is the length of a string that may or may not be null-terminated and str is a pointer to that string.

I do not want to take ownership of the string and just need to pass it around as &str.


Solution 1:

A Rust reference such as &str always carries a lifetime, and that lifetime is tied to the value that owns the underlying data, typically a container like String, Vec, or an array. So to get a valid &str you need an owner. You said you don't want to take ownership because you don't want to copy the data, but owning doesn't imply copying; it just means taking exclusive responsibility for mutating and destroying the data.

To own data represented by a pointer coming from C's malloc() without copying the data, you could wrap the pointer:

pub struct MyString {
    data: *const u8,
    length: usize,
}

impl MyString {
    // safety: data must point to at least `length` readable bytes allocated
    // with malloc(); a nul terminator is not needed since the length is explicit
    pub unsafe fn new(data: *const u8, length: usize) -> MyString {
        // Note: no reallocation happens here, we use `str::from_utf8()` only to
        // check whether the pointer contains valid UTF-8.
        // If panic is unacceptable, the constructor can return a `Result`
        if std::str::from_utf8(std::slice::from_raw_parts(data, length)).is_err() {
            panic!("invalid utf-8")
        }
        MyString { data, length }
    }

    pub fn as_str(&self) -> &str {
        unsafe {
            // from_utf8_unchecked is sound because we checked in the constructor
            std::str::from_utf8_unchecked(std::slice::from_raw_parts(self.data, self.length))
        }
    }
}

impl Drop for MyString {
    fn drop(&mut self) {
        unsafe {
            libc::free(self.data as *mut _);
        }
    }
}

This only requires unsafe when constructing the wrapper with MyString::new() because it takes a raw pointer whose validity cannot be checked at compile time. Afterwards the wrapper gives you the &str that you can pass around without any unsafe:

fn main() {
    let raw_str = unsafe { libc::strdup(b"foo\0".as_ptr() as _) as *const u8 };
    let s = unsafe { MyString::new(raw_str, 3) };
    // from here on, it's all-safe code
    let slice = s.as_str();  // now you get a slice to pass around
    assert_eq!(slice, "foo");
}

If you don't want MyString to deallocate the data, then you can simply delete the Drop implementation. In either case, new() has a safety invariant that the data must not be deallocated while MyString is live.
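If nothing on the Rust side should own the buffer at all, the signature from the question can be written almost as-is. The following is a minimal sketch, not the only way to do it: the function has to be unsafe, and the lifetime 'a is chosen by the caller, who thereby promises that the buffer outlives the returned &str. The null check is an assumption about what the C side might hand over.

```rust
use std::slice;
use std::str;

/// Borrows `len` bytes starting at `data` as a `&str` without copying.
/// Returns `None` for a null pointer or for bytes that are not valid UTF-8.
///
/// # Safety
/// `data` must point to `len` initialized, readable bytes that are neither
/// freed nor mutated for as long as the returned `&'a str` is in use.
pub unsafe fn to_str<'a>(data: *const u8, len: usize) -> Option<&'a str> {
    if data.is_null() {
        return None;
    }
    // No allocation: the slice is only a view over the existing memory.
    let bytes = unsafe { slice::from_raw_parts(data, len) };
    // The validation pass is O(len), but it does not copy anything.
    str::from_utf8(bytes).ok()
}
```

Compared with the wrapper, this moves the "must stay alive" invariant from one constructor call to every call site, which is why the wrapper is usually easier to keep sound.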

One final difference between C strings and Rust &str is that Rust strings are guaranteed to be UTF-8, and creating non-UTF-8 strings (which can only be done in unsafe code) constitutes undefined behavior. This is why either MyString::new() or MyString::as_str() needs to verify that the string contains valid UTF-8. Putting the check in new() ensures it runs only once, at construction, rather than on every call to as_str(). You can remove the check, but then new() gets another safety invariant, one that is unlikely to be respected by C code that creates strings.
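If a panic on invalid input is unacceptable, the constructor can report the failure instead. Here is a sketch of that variant; the name try_new is hypothetical, and the Drop implementation is left out so that the snippet compiles without the libc crate.

```rust
use std::slice;
use std::str::{self, Utf8Error};

pub struct MyString {
    data: *const u8,
    length: usize,
}

impl MyString {
    /// Like `new()`, but returns the UTF-8 error instead of panicking.
    ///
    /// # Safety
    /// `data` must point to `length` readable bytes that stay valid and
    /// unmodified while the returned `MyString` is live.
    pub unsafe fn try_new(data: *const u8, length: usize) -> Result<MyString, Utf8Error> {
        let bytes = unsafe { slice::from_raw_parts(data, length) };
        // Validate once here so that as_str() can skip the check.
        str::from_utf8(bytes)?;
        Ok(MyString { data, length })
    }

    pub fn as_str(&self) -> &str {
        // Sound because try_new() already verified the bytes are UTF-8.
        unsafe { str::from_utf8_unchecked(slice::from_raw_parts(self.data, self.length)) }
    }
}
```

Note that on Err no MyString is created, so if you keep the Drop implementation, the buffer is not freed in that case and remains the caller's responsibility.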

To represent arbitrary binary data, you can use &[u8] instead of &str, or use a crate like bstr that gives you "byte strings" with all the conveniences of &str but without requiring UTF-8.
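With only the standard library, the &[u8] route might look like the sketch below. The helper names to_bytes and display_lossy are made up for illustration. String::from_utf8_lossy returns a Cow<str> that borrows when the bytes are already valid UTF-8 and allocates only when it has to substitute U+FFFD for invalid sequences.

```rust
use std::borrow::Cow;
use std::slice;

/// Borrows `len` bytes at `data` as a byte slice; no UTF-8 requirement.
///
/// # Safety
/// `data` must point to `len` initialized, readable bytes that are neither
/// freed nor mutated while the returned slice is in use.
pub unsafe fn to_bytes<'a>(data: *const u8, len: usize) -> Option<&'a [u8]> {
    if data.is_null() {
        return None;
    }
    Some(unsafe { slice::from_raw_parts(data, len) })
}

/// Produces text for display or logging from possibly invalid bytes.
/// Borrows when the input is valid UTF-8, allocates otherwise.
pub fn display_lossy(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}
```

This keeps the conversion itself free of validation, and defers any UTF-8 handling to the point where text is actually needed.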