string c_str() vs. data()

Solution 1:

The documentation is correct. Use c_str() if you want a null terminated string.

If the implementers happend to implement data() in terms of c_str() you don't have to worry, still use data() if you don't need the string to be null terminated, in some implementation it may turn out to perform better than c_str().

strings don't necessarily have to be composed of character data, they could be composed with elements of any type. In those cases data() is more meaningful. c_str() in my opinion is only really useful when the elements of your string are character based.

Extra: In C++11 onwards, both functions are required to be the same. i.e. data is now required to be null-terminated. According to cppreference: "The returned array is null-terminated, that is, data() and c_str() perform the same function."

Solution 2:

In C++11/C++0x, data() and c_str() is no longer different. And thus data() is required to have a null termination at the end as well.

21.4.7.1 basic_string accessors [string.accessors]

const charT* c_str() const noexcept;

const charT* data() const noexcept;

1 Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].


21.4.5 basic_string element access [string.access]

const_reference operator[](size_type pos) const noexcept;

1 Requires: pos <= size(). 2 Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.

Solution 3:

Even know you have seen that they do the same, or that .data() calls .c_str(), it is not correct to assume that this will be the case for other compilers. It is also possible that your compiler will change with a future release.

2 reasons to use std::string:

std::string can be used for both text and arbitrary binary data.

//Example 1
//Plain text:
std::string s1;
s1 = "abc";

//Example 2
//Arbitrary binary data:
std::string s2;
s2.append("a\0b\0b\0", 6);

You should use the .c_str() method when you are using your string as example 1.

You should use the .data() method when you are using your string as example 2. Not because it is dangereous to use .c_str() in these cases, but because it is more explicit that you are working with binary data for others reviewing your code.

Possible pitfall with using .data()

The following code is wrong and could cause a segfault in your program:

std::string s;
s = "abc";   
char sz[512]; 
strcpy(sz, s.data());//This could crash depending on the implementation of .data()

Why is it common for implementers to make .data() and .c_str() do the same thing?

Because it is more efficient to do so. The only way to make .data() return something that is not null terminated, would be to have .c_str() or .data() copy their internal buffer, or to just use 2 buffers. Having a single null terminated buffer always means that you can always use just one internal buffer when implementing std::string.

Solution 4:

It has been answered already, some notes on the purpose: Freedom of implementation.

std::string operations - e.g. iteration, concatenation and element mutation - don't need the zero terminator. Unless you pass the string to a function expecting a zero terminated string, it can be omitted.

This would allow an implementation to have substrings share the actual string data: string::substr could internally hold a reference to shared string data, and the start/end range, avoiding the copy (and additional allocation) of the actual string data. The implementation would defer the copy until you call c_str or modify any of the strings. No copy would ever be made if the sub-strings involved are just read.

(copy-on-write implementation aren't much fun in multithreaded environments, plus the typical memory/allocation savings aren't worth the more complex code today, so it's rarely done).


Similarly, string::data allows a different internal representation, e.g. a rope (linked list of string segments). This can improve insert / replace operations significantly. again, the list of segments would have to be collapsed to a single segment when you call c_str or data.