about string function


Do you think this function is safe?

string getValue()
{
string val = "aaa";
return val;
}

It seems OK, but is val a temporary object whose lifetime is guaranteed only within the function?
I see similar code used in many places, and it causes no problems (as far as I know).

Could someone explain that? If this function is wrong, how should I implement this simple thing in C++?
Thanks

quirkyusername (792)

That's fine. If you were returning a reference to a local object, that would be a problem.

chrisben (99)

Thanks. That is what I assumed, and it is how I have used it. However, I suddenly got confused about WHY it is OK if the lifetime of the variable inside the function ends with the function. Why don't I get unpredictable behavior when I use it? If I use char* instead of string, are there any differences?
Thanks

jsmith (5804)

The function makes a copy of the stack variable to return before it destroys the stack variable.
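A minimal sketch of that point, reusing getValue() from the question. The helper returned_string_outlives_local is a hypothetical name added only for illustration. Depending on the compiler and standard version, the returned object may be copied, moved, or constructed directly in the caller's storage, but in every case the caller ends up with its own string:

```cpp
#include <cassert>
#include <string>

// Same shape as getValue() in the question.
std::string getValue()
{
    std::string val = "aaa"; // local object, destroyed when the function returns
    return val;              // the caller receives its own string object
}

// Hypothetical check: the caller's string stays intact after further calls.
bool returned_string_outlives_local()
{
    std::string s = getValue();
    std::string other = getValue() + "bbb"; // more calls that reuse the stack
    return s == "aaa" && other == "aaabbb";
}
```

Calling returned_string_outlives_local() from main should return true, because s never refers to the dead local val.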

simeonz (490)

chrisben wrote:
Why did not I get unpredictable behavior by using it? if using char* instead of string, are there any differences?

char * is a pointer. A pointer is a value. You return the value of the pointer, not the value of the object it points to. So if the pointed-to object "dies", accessing it becomes a bug. The full story is more complicated, but the simple explanation is that the string class has a copy constructor that creates a new string and internally stores a pointer to the new character data. So the string class does not copy the pointer; it copies the entire character sequence to a new location. Therefore, you don't have to worry about the lifetime of the original character sequence, as you do when you use char *.
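The difference can be sketched as follows. Both function names are hypothetical and exist only for this illustration:

```cpp
#include <cassert>
#include <string>

// Copying a char* copies only the address: both pointers see the same characters.
bool pointer_copy_shares_storage()
{
    char buffer[] = "1234";
    char* p = buffer;
    char* q = p;  // copies the pointer value, not the characters
    q[0] = 'x';   // the change is visible through p as well
    return p == q && p[0] == 'x';
}

// Copying a std::string copies the characters: the two objects are independent.
bool string_copy_is_independent()
{
    std::string a = "1234";
    std::string b = a; // the copy constructor duplicates the character sequence
    b[0] = 'x';        // only b changes
    return a == "1234" && b == "x234";
}
```

Both functions should return true: the pointer copy aliases one buffer, while the string copy leaves the original untouched.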

Regards

chrisben (99)

Thanks. Now let me explore it a bit more:

char* getVal1()
{
char* c = new char[10];
return c;
}

char* getVal2()
{
char* c = "1234";
return c;
}

So is getVal1 correct, while getVal2 may cause unpredictable behavior?
Thanks

ceruleus (218)

Be careful when using

char* c = new char[10];

because of memory issues.
You will need to release that memory at some point, with something like

delete[] c;

(Sorry for the incomplete post, just trying to help.)

chrisben (99)

Thanks for the reminder. I would normally wrap it in a Boost shared_ptr. This sample is just for clarifying the concept.
Thanks

firedraco (6248)

1) Yes, but like ceruleus said, you'll have to remember to delete it.

2) Correct, since the memory you are pointing to get destroyed after you return from the function.

chrisben (99)

Thanks for your time.

simeonz (490)

#include <cstring>
#include <iostream>
#include <string>
#include <boost/shared_array.hpp>

using namespace std;
using namespace boost;

char* getVal1()
{
    char c[] = "1234";
    return c;
}

const char* getVal2()
{
    const char* c = "1234";
    return c;
}

const char* getVal3()
{
    static const char c_init[] = "1234";
    const char* c = c_init;
    return c;
}

char* getVal4()
{
    char* c = "1234";
    return c;
}

char* getVal5()
{
    static const char c_init[] = "1234";
    char* c = const_cast<char*>(c_init);
    return c;
}

char* getVal6()
{
    static char c[] = "1234";
    return c;
}

struct string_struct { char value[10]; };

string_struct getVal7()
{
    string_struct c = {"1234"};
    return c;
}

char* getVal8()
{
    static const char c_init[] = "1234";
    char* c = new char[sizeof(c_init)];
    memcpy(c, c_init, sizeof(c_init));
    return c;
}

shared_array<char> getVal10()
{
    static const char c_init[] = "1234";
    shared_array<char> c(new char[sizeof(c_init)]);
    memcpy(c.get(), c_init, sizeof(c_init));
    return c;
}

string getVal11()
{
    string c("1234");
    return c;
}

int main()
{
    char* c;
    const char* cc;

    //================== CASE 1 ==================
    //This is buggy. You return a pointer to a character array allocated on the stack.
    //c = getVal1();

    //================== CASE 2 ==================
    //This is ok, but you can not modify the string.
    //There is no need to deallocate anything.
    cc = getVal2();

    //================== CASE 3 ==================
    //This is the same as the preceding case.
    //The implementation does explicitly what the compiler does under the hood.
    cc = getVal3();

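    //================== CASE 4 ==================
    //This is ok, but you still can not modify the string,
    //even though the compiler accepts code that does.
    //(Since C++11 this literal-to-char* conversion is itself ill-formed;
    //older compilers accept it, newer ones warn or reject it.)
    //There is no need to deallocate anything.
    c = getVal4();
    //c[0] = 'x'; is undefined behavior.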

    //================== CASE 5 ==================
    //This is the same as the preceding case.
    //The implementation does explicitly what the compiler does under the hood.
    c = getVal5();

    //================== CASE 6 ==================
    //This is ok, and there is no need to deallocate anything.
    c = getVal6();
    //However, when you do this...
    char* c2 = getVal6();
    c[0] = 'x';
    c2[0] = 'y';
    //Now c[0] == c2[0], and both are 'y'.
    //This may not be what the user expects.

    //================== CASE 7 ==================
    //This is ok, and there is no need to deallocate anything.
    string_struct c_struct = getVal7();
    string_struct c2_struct = getVal7();
    c_struct.value[0] = 'x';
    c2_struct.value[0] = 'y';
    //Now c_struct.value[0] == 'x' and c2_struct.value[0] == 'y'
    //as is probably expected by the user.
    //If you want to copy by value, then
    c2_struct = c_struct;
    //Now c_struct.value[0] == 'x' and c2_struct.value[0] == 'x'
    //However...
    //strcpy(c_struct.value, "0123456789");
    //is not ok: it writes 11 bytes (10 digits plus the terminator) into a
    //10-byte array. The size of c_struct.value is fixed a priori.

    //================== CASE 8 ==================
    //This is ok, but you must deallocate the memory manually.
    c = getVal8();
    delete[] c;
    //Without the above line you have a memory leak.
    //delete c; would be a bug (no []).

    //================== CASE 9 ==================
    //Erroneous, deleted

    //================== CASE 10 ==================
    //This is ok, and you don't need to deallocate anything.
    shared_array<char> c_shared = getVal10();
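    shared_array<char> c2_shared = c_shared;
    //No problem here: both handles share one array, freed with the last handle.
    //But if you want to copy by value, you need something like:
    //strcpy(c2_shared.get(), c_shared.get());
    //In general this is a bug, because there is no guarantee that
    //the target buffer has enough space for the source string.
    //(Here both handles even point to the same array, so source and
    //target overlap, which strcpy does not allow.)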

    //================== CASE 11 ==================
    //This is ok, and you don't need to deallocate anything.
    string c_string = getVal11();
    //You can copy by value. Older (pre-C++11) implementations often used the
    //copy-on-write idiom, meaning that the actual duplication
    //is deferred until one of the two strings is modified.
    //C++11 and later do not permit this; the copy is made immediately.
    string c2_string = c_string;
    c_string[0] = 'x';
    //^ With copy-on-write, c_string receives its own duplicate of the
    //shared buffer at this point and the modification is executed.
    c2_string[0] = 'y';
    //Currently c_string[0] == 'x' and c2_string[0] == 'y'.
    c2_string = "12345";
    //^ This is no problem now, because the string class automatically
    //allocates the necessary space, unlike the string_struct solution.
}


chrisben (99)

Hi simeonz,

Thank you very much for putting it all together. It is a wonderful post and clarifies many questions that I did not ask in my original post. I will play with it to better understand this string/char* thing.
Have a good weekend
Chris

PS Sorry for the late reply. I just saw it.

simeonz (490)

Actually, I have been very dumb and gave you some lousy instructions regarding auto_ptr and shared_ptr.

They are not suitable for array types, and that is exactly what I used them for here. I don't know where my head was. You must use shared_array instead of shared_ptr. There is unfortunately no working alternative for auto_ptr, except dynamically allocating a vector and pointing the auto_ptr to it, but this is bad performance-wise.

I will correct the example and remove the auto_ptr case entirely.
Sorry. I seem to be in the habit of giving bad advice recently.
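For readers with a C++11 or later compiler (an assumption; the thread itself uses Boost and auto_ptr), the standard library's std::unique_ptr has an array form that calls delete[] automatically. A minimal sketch, with getValOwned as a hypothetical counterpart to getVal8:

```cpp
#include <cassert>
#include <cstring>
#include <memory>

// Hypothetical counterpart to getVal8: the caller gets sole ownership of the array.
std::unique_ptr<char[]> getValOwned()
{
    static const char c_init[] = "1234";
    std::unique_ptr<char[]> c(new char[sizeof(c_init)]);
    std::memcpy(c.get(), c_init, sizeof(c_init)); // copies the terminating '\0' too
    return c; // ownership moves to the caller; delete[] runs when the owner goes away
}
```

The caller can write through the result (for example p[0] = 'x') and never has to call delete[] by hand. Unlike shared_array, the handle can be moved but not copied.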

chrisben (99)

Hi simeonz,

Is there a section of a book or an online reference that I can read alongside your sample? For example, for getVal4, I think

"since the memory you are pointing to get destroyed after you return from the function"

applies, the same as for getVal1.

Thanks

simeonz (490)

I am not sure which book covers exactly those things. They are mentioned in the literature, but my knowledge of some aspects is built from a variety of sources, including courses I took, books I read quite a while ago, etc.

Here is getVal1

char* getVal1()
{
    char c[] = "1234";
    return c;
}

What the compiler does under the hood is allocate some static storage that holds the initialization string. Then, when the function is called, space for the array c is allocated locally on the stack and filled with a copy of this initialization string.
Conceptually, the compiled code behaves like this:

char* getVal1_compiler_output()
{
    static const char c_init[] = "1234";
    char c[sizeof(c_init)];
    memcpy(c, c_init, sizeof(c_init));
    return c;
}

The problem is part syntax, part semantics. The syntax part is that when the name of an array (in this case c) is used in a context where a pointer is required, the name is converted to a pointer to the first array element. So you return a pointer to the storage allocated for the array c on the stack. This storage is released when the function returns, but the pointer to it is returned anyway. Now you have a dangling pointer to memory that can be overwritten by the next function call. This is the bug: you access memory that may have been reassigned to some other object that lives there now and stores who-knows-what.
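The array-to-pointer conversion can be seen in isolation in the sketch below. It assumes a C++11 compiler, and getVal1Safe and array_decays_to_pointer are hypothetical names for illustration. The first function shows one way to return the characters by value, in the same spirit as the string_struct case in the long example above:

```cpp
#include <array>
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical safe variant of getVal1: the array travels by value inside std::array.
std::array<char, 5> getVal1Safe()
{
    std::array<char, 5> c = {{'1', '2', '3', '4', '\0'}};
    return c; // the whole array is copied out, so nothing dangles
}

// Shows that an array name used where a pointer is needed yields &c[0].
bool array_decays_to_pointer()
{
    char c[] = "1234";
    char* p = c; // array-to-pointer conversion
    static_assert(sizeof(c) == 5, "four characters plus the terminator");
    static_assert(std::is_same<decltype(&c[0]), char*>::value,
                  "the address of the first element is a char*");
    return p == &c[0];
}
```

Inside array_decays_to_pointer the pointer is used only while c is alive, which is fine. The bug in getVal1 comes from letting that same pointer escape the function.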

About getVal2:

const char* getVal2()
{
    const char* c = "1234";
    return c;
}

Again, under the hood, the compiler uses some static memory for the initializing string. But this time the string is not used to fill the contents of an array allocated on the stack. A pointer is simply made to point at it:

const char* getVal3()
{
    static const char c_init[] = "1234";
    const char* c = c_init;
    return c;
}

Character arrays are the objects themselves. Character pointers are just used to refer to those objects. In getVal1, the static array is used as a prototype for all the stack objects: it is their initial filler. In getVal2, the static array is referred to directly through a pointer. It is not a prototype value then; it is simply what the pointer refers to. Because the static array object will never "die" (it has static storage duration and exists until the program terminates), the returned pointer remains valid. The returned pointer is a copy of the local pointer, but this is irrelevant. Once you copy a pointer, the original pointer can be destroyed, so long as the pointed-to object remains alive. This is not the case with getVal1, where the returned pointer refers to the stack object.

EDIT: To summarize the above paragraph: getVal1 returns a pointer to the first element of the array c allocated on the stack. c is destroyed and the pointer is left dangling. getVal2 returns a pointer to the first element of c_init. c_init is not destroyed, and the pointer can be used for as long as needed.
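A small sketch of the valid cases, reusing getVal2 and getVal3 from above. The helper churn_stack is hypothetical; it only puts other data on the stack between the call and the check:

```cpp
#include <cassert>
#include <cstring>

const char* getVal2()
{
    const char* c = "1234"; // points at a literal with static storage duration
    return c;
}

const char* getVal3()
{
    static const char c_init[] = "1234"; // explicit static array
    return c_init;                       // converts to a pointer to c_init[0]
}

// Hypothetical helper: touches some stack memory after the pointers were obtained.
int churn_stack(int n)
{
    volatile char scratch[64];
    for (int i = 0; i < 64; ++i)
        scratch[i] = static_cast<char>(n + i);
    return scratch[n % 64];
}
```

After const char* p = getVal2(); and a call to churn_stack, std::strcmp(p, "1234") should still return 0, because the characters never lived on the stack. The same holds for getVal3, which in addition returns the same address on every call.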

Regards
