string data() and c_str() functions

Feb 3, 2011 at 4:14pm
My first question: when a string is defined as string str = "abcd";, is a terminating null character placed at the end of it? Or is that unnecessary because the string already stores its length? I am asking because this website says that string::c_str() and data() each "Returns a pointer to an array of characters with the same content as the string". That sounds as if a new array is created, the original string is copied into it, and a pointer to that array is returned. However, when I printed
the addresses like this: printf("%p %p %p", str.data(), str.c_str(), &str[0])
they were all equal. Hence, I think that no new array is created when data() and c_str() are called; they directly return the original contents of the string object. If that is so, what is the difference between c_str() and data()? Here it says that c_str() puts a terminating null character at the end of the array, but data() does not. From this I understand that, at first, there is no null character at the end of the string I described above. As a result, I concluded that data() and c_str() both return the original string stored in the object, and the only difference is that c_str() appends a null character to the end of the array, which was not there before. Am I right?
Feb 3, 2011 at 4:33pm
The exact implementation of the std::string class is not specified. Most implementations give the same result for .c_str(), .data() and &str[0]. However, there may exist an implementation where the array returned by data() is not followed by a 0. Thus you should use the appropriate function: c_str() when you need a C string, data() when you need an array of chars, and operator[] when you need a single char.
Feb 3, 2011 at 6:55pm
This is what the standard has to say in its '98 version (draft version).

21.3.6 - basic_string string operations [lib.string.ops]

const charT* c_str() const;

-1- Returns: A pointer to the initial element of an array of length size() + 1 whose first size() elements equal the corresponding elements of the string controlled by *this and whose last element is a null character specified by charT().

-2- Requires: The program shall not alter any of the values stored in the array. Nor shall the program treat the returned value as a valid pointer value after any subsequent call to a non-const member function of the class basic_string that designates the same object as this.

const charT* data() const;

-3- Returns: If size() is nonzero, the member returns a pointer to the initial element of an array whose first size() elements equal the corresponding elements of the string controlled by *this. If size() is zero, the member returns a non-null pointer that is copyable and can have zero added to it.

-4- Requires: The program shall not alter any of the values stored in the character array. Nor shall the program treat the returned value as a valid pointer value after any subsequent call to a non-const member function of basic_string that designates the same object as this.

It tells you that the returned contiguous block of memory will have the same contents as the block of memory that the string object uses internally. However, it doesn't guarantee that the two are the same block. Most implementations would reuse the original block if there is space for one more character. If c_str does end up copying the old contents, its worst-case complexity is linear, and there is no guarantee that this won't happen.

I post all of this because the current draft (or almost current: n3126) has very different text:
21.4.7.1 basic_string accessors [string.accessors]
const charT* c_str() const;
const charT* data() const;

1 Returns: a pointer p such that p + i == &operator[](i) for each i in [0,size()].
2 Throws: nothing.
3 Complexity: constant time.
4 Requires: The program shall not alter any of the values stored in the character array.

Now that's what I call a radical change. The two functions no longer have separate specifications. Also, from the pointer arithmetic guarantee you can see that the returned buffer has to be the same one that the string uses internally. There is also a complexity guarantee (constant time), which is understandable given the new semantics. So, how would one produce a C string in this set-up - write the terminating null character manually (through push_back) or what? I don't know.

Sometimes I have the feeling the C++ standard is intentionally vague in order to avoid stepping on someone's toes.

Regards
Feb 3, 2011 at 7:18pm
So, how would one produce a C string in this set-up - write the terminating null character manually (through push_back) or what? I don't know.

That's not necessary, as p + size() is included in the valid range, and operator[](size()) is guaranteed to return a reference to a charT with value charT(), i.e. a null character.
Feb 3, 2011 at 7:52pm
@Athar
Aaaah. You mean this little aspect:
const_reference operator[](size_type pos) const;
reference operator[](size_type pos);
1 Requires: pos <= size().
2 Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.
3 Throws: nothing.

I confess, the default initialized char thing did not occur to me at all.

Regards
Feb 3, 2011 at 10:30pm
Thank you for your answers.

Sometimes I have the feeling the C++ standard is intentionally vague in order to avoid stepping on someone's toes.

Why do you think that, simeonz?
Feb 3, 2011 at 10:52pm
Oh... I don't know. I'm not implying any evil intent or anything. I am grateful to the committee for the work they do with so much dedication. But if you look at the lists of pending issues and the clarification remarks, it becomes obvious that the opinions of the members are not synchronized. People have personal biases and perspectives, and communication suffers. Another issue is that the standard is constantly torn between leaving some things implementation-defined purely in service of the compiler vendors and making them reliably defined in service of the programmers. You need both to bake a cake. :)

Regards
Feb 4, 2011 at 9:33pm
Hmm, OK friend, I think you are right. :)
Feb 5, 2011 at 11:28am
Friends,

First, the STL string is like vector<char> with some extra or modified functions for character strings, so we can store any ASCII character.
Second, as per my understanding and R&D, c_str() guarantees that the string is null-terminated, but data() does not care about the character after the last one. So what the library implementations do is always keep a null character after the last character in an STL string and return the start address of the internal array used to store the characters.
Last edited on Feb 5, 2011 at 11:29am
Feb 5, 2011 at 3:12pm
What happens according to this strategy when you append characters and the space is exhausted?

That is, you have only one character of space left (which you use internally for the NULL), and you need it for the append/push_back. You are right that the libraries try to incur little cost in frequent usage scenarios. If most libraries didn't handle those cases in the "expected" way, the standard probably would not be ratified on this point.

Regards