Is string::c_str() no longer null terminated in C++11?
Solution 1:
Strings are now required to use null-terminated buffers internally. Look at the definition of operator[] (21.4.5):
Requires: pos <= size().
Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.
Looking back at c_str (21.4.7.1/1), we see that it is defined in terms of operator[]:
Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
And both c_str and data are required to be O(1), so the implementation is effectively forced to use null-terminated buffers.
Additionally, as David Rodríguez - dribeas points out in the comments, the return value requirement also means that you can use &operator[](0) as a synonym for c_str(), so the terminating null character must lie in the same buffer (since *(p + size()) must be equal to charT()); this also means that even if the terminator is initialised lazily, it's not possible to observe the buffer in the intermediate state.
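To make the quoted requirements concrete, here is a minimal sketch that checks them on a given string. The helper name terminator_shares_buffer is made up for illustration; it is not part of any library, and it only tests the guarantees for the one string you pass in.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a standard function): checks the C++11
// guarantees quoted above for a single string.
bool terminator_shares_buffer(const std::string& s) {
    const char* p = s.c_str();

    // c_str() and data() share one specification, so they return the same pointer.
    if (p != s.data()) return false;

    // operator[] at index size() yields charT(), which is '\0' for std::string.
    if (s[s.size()] != '\0') return false;

    // p + i == &operator[](i) for each i in [0, size()], terminator included.
    for (std::string::size_type i = 0; i <= s.size(); ++i) {
        if (p + i != &s[i]) return false;
    }
    return true;
}
```

Calling it on an empty string, a short string, and a string long enough to be heap-allocated should return true on any conforming C++11 implementation.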
Solution 2:
Well, in fact it is true that the new standard stipulates that .data() and .c_str() are now synonyms. However, it doesn't say that .c_str() is no longer zero-terminated :)
It just means that you can now rely on .data() being zero-terminated as well.
Paper N2668 defines c_str() and data() members of std::basic_string as follows:
const charT* c_str() const; const charT* data() const;
Returns: A pointer to the initial element of an array of length size() + 1 whose first size() elements equal the corresponding elements of the string controlled by *this and whose last element is a null character specified by charT().
Requires: The program shall not alter any of the values stored in the character array.
Note that this does NOT mean that any valid std::string can be treated as a C-string because std::string can contain embedded nulls, which will prematurely end the C-string when used directly as a const char*.
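A small sketch of the embedded-null caveat, assuming a plain std::string. The helper names c_length and make_embedded are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: the length a C API sees, i.e. up to the first '\0'.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}

// Hypothetical helper: builds a 7-character string with a null in the middle.
// The explicit length is required; without it the constructor would stop
// at the first '\0' of the literal.
std::string make_embedded() {
    return std::string("abc\0def", 7);
}
```

Here make_embedded().size() is 7, while c_length(make_embedded()) is 3: the buffer is still null-terminated at index 7, but anything that treats it as a C-string stops at index 3.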
Addendum:
I don't have access to the final published C++11 standard, but it appears that the wording above was indeed dropped somewhere in the revision history of the spec; see, e.g., http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3242.pdf
§ 21.4.7 basic_string string operations [string.ops]
§ 21.4.7.1 basic_string accessors [string.accessors]
const charT* c_str() const noexcept; const charT* data() const noexcept;
- Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
- Complexity: constant time.
- Requires: The program shall not alter any of the values stored in the character array.
Solution 3:
The "history" was that a long time ago when everyone worked in single threads, or at least the threads were workers with their own data, they designed a string class for C++ which made string handling easier than it had been before, and they overloaded operator+ to concatenate strings.
The issue was that users would do something like:
s = s1 + s2 + s3 + s4;
and each concatenation would create a temporary that had to be a complete string with its own buffer.
Therefore someone had the brainwave of "lazy evaluation" such that internally you could store some kind of "rope" with all the strings until someone wanted to read it as a C-string at which point you would change the internal representation to a contiguous buffer.
This solved the problem above but caused a load of other headaches, in particular in the multi-threaded world, where one expects .c_str() to be a read-only operation that changes nothing and therefore needs no locking. Premature internal locking in the class implementation, just in case someone was using it from multiple threads (when there wasn't even a threading standard), was also not a good idea. In fact, doing any of this was more costly than simply copying the buffer each time. The "copy on write" implementation was abandoned for string implementations for the same reason.
Thus making .c_str() a truly immutable operation turned out to be the most sensible thing to do. But could one "rely" on it in a standard that is now thread-aware? The new standard therefore decided to state clearly that you can, and thus the internal representation needs to hold the null terminator.