Are there downsides to using std::string as a buffer?
Solution 1:
Don't use std::string as a buffer.
It is bad practice to use std::string as a buffer, for several reasons (listed in no particular order):
- std::string was not intended for use as a buffer; you would need to double-check the description of the class to make sure there are no "gotchas" which would prevent certain usage patterns (or make them trigger undefined behavior).
  - As a concrete example: before C++17, you can't even write through the pointer you get with data(): it returns const charT*, so your code would cause undefined behavior. (But &(str[0]), &(str.front()), or &(*(str.begin())) would work.)
- Using std::strings for buffers is confusing to readers of your function's definition, who assume you would be using std::string for, well, strings. In other words, doing so breaks the Principle of Least Astonishment.
- Worse yet, it's confusing for whoever might use your function: they too may think what you're returning is a string, i.e. valid human-readable text.
- std::unique_ptr would be fine for your case, or even std::vector. In C++17, you can use std::byte for the element type, too. A more sophisticated option is a class with an SSO-like feature, e.g. Boost's small_vector (thank you, @gast128, for mentioning it).
- (Minor point:) libstdc++ had to change its ABI for std::string to conform to the C++11 standard, so in some cases (which by now are rather unlikely), you might run into some linkage or runtime issues that you wouldn't with a different type for your buffer.
Also, your code may make two heap allocations instead of one (implementation dependent): once upon string construction and another when resize()ing. But that in itself is not really a reason to avoid std::string, since you can avoid the double allocation using the construction in @Jarod42's answer.
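As a rough sketch of the std::vector alternative mentioned above: the Receiver struct and the name receive_bytes are hypothetical stand-ins (the question's real receiver type isn't shown here), and std::vector<char> is used so this compiles before C++17; with C++17 you could swap in std::byte as the element type.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's receiver type (an assumption).
struct Receiver {
    const char* bytes;
    std::size_t length;
    const char* data() const { return bytes; }
    std::size_t size() const { return length; }
};

// Returns the received bytes in a type that reads as "raw buffer", not "text".
std::vector<char> receive_bytes(const Receiver& receiver) {
    // The iterator-range constructor sizes the vector and copies the bytes
    // in one step, so there is no separate resize() followed by memcpy().
    return std::vector<char>(receiver.data(), receiver.data() + receiver.size());
}
```

Callers then see a container of bytes in the signature and are less likely to assume the result is printable text.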
Solution 2:
You can completely avoid a manual memcpy by calling the appropriate constructor:
std::string receive_data(const Receiver& receiver) {
return {receiver.data(), receiver.size()};
}
That even handles \0 in a string.
BTW, unless content is actually text, I would prefer std::vector<std::byte> (or equivalent).
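To illustrate the point about \0, here is a small sketch comparing the (pointer, length) constructor with the pointer-only one. The helper names from_bytes and from_c_string are made up for this example.

```cpp
#include <cstddef>
#include <string>

// Builds a string from a pointer and an explicit length. No strlen() is
// involved, so bytes after an embedded '\0' are kept.
std::string from_bytes(const char* bytes, std::size_t count) {
    return std::string(bytes, count);
}

// Builds a string from a pointer alone. The length is found by scanning
// for the first '\0', so anything after it is dropped.
std::string from_c_string(const char* text) {
    return std::string(text);
}
```

With the five bytes 'a', 'b', '\0', 'c', 'd', the first helper keeps all five while the second stops after "ab".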
Solution 3:
Memcpy-ing to a const char pointer? AFAIK this does no harm as long as we know what we do, but is this good behavior and why?
The current code may have undefined behavior, depending on the C++ version. To avoid undefined behavior in C++14 and below take the address of the first element. It yields a non-const pointer:
buff.resize(size);
memcpy(&buff[0], &receiver[0], size);
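Wrapped up as a complete function, the pattern above might look like the following sketch. The name copy_into_string and the raw-pointer parameter are assumptions made for illustration; it relies on std::string storage being contiguous, which the standard guarantees from C++11 on.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Copies `size` bytes from `src` into a std::string without calling data().
std::string copy_into_string(const char* src, std::size_t size) {
    std::string buff;
    buff.resize(size);
    if (size != 0) {
        // &buff[0] is a plain char*, so writing through it needs no cast.
        std::memcpy(&buff[0], src, size);
    }
    return buff;
}
```

The size check just skips the copy for an empty input, so buff[0] is never touched on an empty string.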
I have recently seen a colleague of mine using std::string as a buffer...
That was somewhat common in older code, especially circa C++03. There are several benefits and downsides to using a string like that. Depending on what you are doing with the code, std::vector can be a bit anemic, and you sometimes used a string instead and accepted the extra overhead of char_traits.
For example, std::string is usually a faster container than std::vector on append, and returning a std::vector from a function was impractical in C++98: there were no move semantics, so the vector constructed in the function could end up being copied out. Additionally, std::string allowed you to search with a richer assortment of member functions, like find_first_of and find_first_not_of. That was convenient when searching through arrays of bytes.
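A minimal sketch of that kind of byte search, assuming a made-up delimiter set (0x00 and 0xFF) and made-up helper names; only the std::string member functions themselves are real.

```cpp
#include <cstddef>
#include <string>

// Returns the index of the first byte in `buffer` that is one of the
// delimiter bytes, or std::string::npos if none is present.
std::size_t first_delimiter(const std::string& buffer) {
    // Hypothetical delimiter set. The (pointer, pos, count) overload is used
    // so that the '\0' byte counts as a member of the set instead of ending it.
    static const char delimiters[] = {'\x00', '\xFF'};
    return buffer.find_first_of(delimiters, 0, sizeof delimiters);
}

// Returns the index of the first non-zero byte, i.e. skips leading zero
// padding, or std::string::npos if every byte is zero.
std::size_t skip_zero_padding(const std::string& buffer) {
    return buffer.find_first_not_of('\0');
}
```

std::vector has no such members; you would reach for the free algorithms in <algorithm> instead.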
I think what you really want/need is SGI's Rope class, but it never made it into the STL. It looks like GCC's libstdc++ may provide it.
There is a lengthy discussion about whether this is legal in C++14 and below:
const char* dst_ptr = buff.data();
const char* src_ptr = receiver.data();
memcpy((char*) dst_ptr, src_ptr, size);
I know for certain it is not safe in GCC. I once did something like this in some self tests and it resulted in a segfault:
std::string buff("A");
...
char* ptr = (char*)buff.data();
size_t len = buff.size();
ptr[0] ^= 1; // tamper with byte
bool tampered = HMAC(key, ptr, len, mac);
GCC put the single byte 'A' in register AL. The high 3 bytes were garbage, so the 32-bit register was 0xXXXXXX41. When I dereferenced at ptr[0], GCC dereferenced a garbage address 0xXXXXXX41.
The two take-aways for me were, don't write half-ass self tests, and don't try to make data() a non-const pointer.
Solution 4:
From C++17, data() can return a non-const char*.
Draft n4659 declares at [string.accessors]:
const charT* c_str() const noexcept;
const charT* data() const noexcept;
....
charT* data() noexcept;
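With that non-const overload, the copy can go through data() directly. A small sketch, assuming a C++17 (or later) compiler; the name copy_with_data and the raw-pointer parameter are illustrative only, and src is assumed to point at at least size readable bytes.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Requires C++17 or later: data() on a non-const std::string returns char*.
std::string copy_with_data(const char* src, std::size_t size) {
    std::string buff(size, '\0');         // sized and zero-filled at construction
    std::memcpy(buff.data(), src, size);  // no cast and no const_cast needed
    return buff;
}
```

On a pre-C++17 compiler the memcpy line fails to compile, because data() only returns a pointer to const there.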