Are there downsides to using std::string as a buffer?

Solution 1:

Don't use std::string as a buffer.

It is bad practice to use std::string as a buffer, for several reasons (listed in no particular order):

  • std::string was not intended for use as a buffer; you would need to double-check the description of the class to make sure there are no "gotchas" which would prevent certain usage patterns (or make them trigger undefined behavior).
  • As a concrete example: Before C++17, you can't even write through the pointer you get with data() - it's const Tchar *; so your code would cause undefined behavior. (But &(str[0]), &(str.front()), or &(*(str.begin())) would work.)
  • Using std::strings for buffers is confusing to readers of your function's definition, who assume you would be using std::string for, well, strings. In other words, doing so breaks the Principle of Least Astonishment.
  • Worse yet, it's confusing for whoever might use your function - they too may think what you're returning is a string, i.e. valid human-readable text.
  • std::unique_ptr would be fine for your case, or even std::vector. In C++17, you can use std::byte for the element type, too. A more sophisticated option is a class with an SSO-like feature, e.g. Boost's small_vector (thank you, @gast128, for mentioning it).
  • (Minor point:) libstdc++ had to change its ABI for std::string to conform to the C++11 standard, so in some cases (which by now are rather unlikely), you might run into some linkage or runtime issues that you wouldn't with a different type for your buffer.

Also, your code may make two instead of one heap allocations (implementation dependent): Once upon string construction and another when resize()ing. But that in itself is not really a reason to avoid std::string, since you can avoid the double allocation using the construction in @Jarod42's answer.

Solution 2:

You can completely avoid a manual memcpy by calling the appropriate constructor:

std::string receive_data(const Receiver& receiver) {
    return {receiver.data(), receiver.size()};
}

That even handles \0 in a string.

BTW, unless content is actually text, I would prefer std::vector<std::byte> (or equivalent).

Solution 3:

Memcpy-ing to a const char pointer? AFAIK this does no harm as long as we know what we do, but is this good behavior and why?

The current code may have undefined behavior, depending on the C++ version. To avoid undefined behavior in C++14 and below take the address of the first element. It yields a non-const pointer:

buff.resize(size);
memcpy(&buff[0], &receiver[0], size);

I have recently seen a colleague of mine using std::string as a buffer...

That was somewhat common in older code, especially circa C++03. There are several benefits and downsides to using a string like that. Depending on what you are doing with the code, std::vector can be a bit anemic, and you sometimes used a string instead and accepted the extra overhead of char_traits.

For example, std::string is usually a faster container than std::vector on append, and you can't return std::vector from a function. (Or you could not do so in practice in C++98 because C++98 required the vector to be constructed in the function and copied out). Additionally, std::string allowed you to search with a richer assortment of member functions, like find_first_of and find_first_not_of. That was convenient when searching though arrays of bytes.

I think what you really want/need is SGI's Rope class, but it never made it into the STL. It looks like GCC's libstdc++ may provide it.


There a lengthy discussion about this being legal in C++14 and below:

const char* dst_ptr = buff.data();
const char* src_ptr = receiver.data();
memcpy((char*) dst_ptr, src_ptr, size);

I know for certain it is not safe in GCC. I once did something like this in some self tests and it resulted in a segfault:

std::string buff("A");
...

char* ptr = (char*)buff.data();
size_t len = buff.size();

ptr[0] ^= 1;  // tamper with byte
bool tampered = HMAC(key, ptr, len, mac);

GCC put the single byte 'A' in register AL. The high 3-bytes were garbage, so the 32-bit register was 0xXXXXXX41. When I dereferenced at ptr[0], GCC dereferenced a garbage address 0xXXXXXX41.

The two take-aways for me were, don't write half-ass self tests, and don't try to make data() a non-const pointer.

Solution 4:

From C++17, data can return a non const char *.

Draft n4659 declares at [string.accessors]:

const charT* c_str() const noexcept;
const charT* data() const noexcept;
....
charT* data() noexcept;