Copying Strings

I have 3 questions here on the same topic.

1) Why can't I do the following?

int main() {
    char a[] = "hello";
    char b[] = a;
}

Please understand that I am not asking how to copy a string; I know there is strcpy for that. I am asking why this does not work. What is the theory behind it?


2) how come in the following code I can do "student1.name = n;"?

#include <iostream>
using namespace std;

class MITStudent {
public:
    int studentID;
    char *name;
    MITStudent() {
        studentID = 0;
        name = "";
    }
};

int main() {
    MITStudent student1;
    student1.studentID = 98;
    char n[] = "foo";
    student1.name = n;
    MITStudent student2 = student1;
    student2.name[0] = 'b';
    cout << student1.name; // boo
}

3) How come in the code above when I change student 2's name it changes student 1's name?

I'd appreciate a detailed answer so I can truly understand what's going on in the background.

Thank you.
I think the reason has historical roots. In C, arrays may be initialized by constant expressions. Moreover, early C had no const qualifier, and even when const was introduced it did not have the same semantics as in C++. Keep in mind that even now string literals in C have type char[], not const char[] as in C++.
1) Why can't I do the following?
char b [] = a;

A C-style array cannot be copy-initialized from another C-style array for the same reason it cannot be copy-assigned: the rule deliberately breaks compatibility with B, where such statements were a common programming idiom (array rehoming) that could never work in C.

Note that if you wrap arrays in structs, they become copyable (B had no structs, so there is no way a B programmer would write such code):

struct S {
    char a[6];
};
int main() {
    S a = {"hello"}; 
    S b = a;
}


2) how come in the following code I can do "student1.name = n;"?
3) How come in the code above when I change student 2's name it changes student 1's name?


name is a pointer, that's a whole different (although related) ball game.

The line
student1.name = n;
creates a pointer to n[0] and copies that pointer into name.

The line MITStudent student2 = student1; makes another copy of that pointer, student2.name - it's still pointing at the same n[0].

The line student2.name[0] = 'b'; changes n[0]
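
The three steps above can be checked in a few lines. This is only a minimal sketch: Student is a made-up, stripped-down stand-in for MITStudent, and copy_shares_buffer is a name invented for the illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for MITStudent: only the pointer member matters here.
struct Student {
    char *name;
};

// Returns true if copying a Student leaves both copies pointing at the
// caller's buffer, so a write through one is visible through the other.
bool copy_shares_buffer() {
    char n[] = "foo";            // writable local array: 'f', 'o', 'o', '\0'
    Student s1;
    s1.name = n;                 // n decays to &n[0]; only the address is stored
    Student s2 = s1;             // member-wise copy: copies the pointer, not the chars
    s2.name[0] = 'b';            // writes to n[0]
    assert(s1.name == s2.name);  // both members hold the same address
    return n[0] == 'b' && s1.name[0] == 'b';
}
```

Calling copy_shares_buffer() from main should return true: there is only one character buffer, and three ways to reach it.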
In reference to your response to #1:
- I am still not clear on #1. Can you make your explanation simpler please.
- Also, I've never heard of B, and what is copy-initialized and copy-assigned?
- What is the difference between char a[] = "blah" and char *a = "blah"?

In reference to your response to #2 & #3:
- So what you're saying is since the student's name is a pointer, it points to the address of n. Then when I say MITStudent student2 = student1, since the object's name member is a pointer, I am simply equating the pointers, hence when I change one I change them all. Correct?
I've never heard of B,

That was the language before C. It is extinct, don't worry about it. It's just that the reason the line you asked about doesn't compile is this old B to C migration.

What is the difference between char a[] = "blah" and char *a = "blah"?

One creates a C-style array of five char called "a" and populates it with the characters 'b', 'l', 'a', 'h', and '\0'. This array will be destroyed at the next closing brace.

The other creates a nameless read-only character array of five char at program startup (which is only destroyed at program termination), then creates a pointer called "a" and stores the address of the first character (the character 'b') in it. Incidentally, this is an error in modern C++; the correct syntax is const char *a = "blah";

Note that in C++, you should be using std::string a = "blah"; (but then you wouldn't ever have a chance to learn about B!)
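
For illustration, here is a small sketch of that difference. The function names are invented for the example; the only things taken from the discussion are the two declarations themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Size in bytes of a local array initialized from "blah":
// four letters plus the terminating '\0'.
std::size_t array_size() {
    char a[] = "blah";
    return sizeof(a);
}

// The pointer form stores only an address; the characters stay in the literal.
std::size_t length_via_pointer() {
    const char *p = "blah";
    assert(p[4] == '\0');   // the literal carries its own terminator
    return std::strlen(p);  // strlen does not count the terminating '\0'
}
```

With these definitions, array_size() gives 5 and length_via_pointer() gives 4. sizeof applied to the pointer itself would give the size of a pointer on your platform, not the size of the string.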

since the student's name is a pointer, it points to the address of n.

It points to the address of n[0], the first character of your character array. But yes, otherwise it is about right: when you change the value pointed by one pointer, you can observe the change through another pointer to the same char object.
Great explanation!

I'm still not clear on that first question though, why can't I say "char b [] = a"?

int main() {
    char a[] = "hello";  // This creates 6 characters in memory and names them a
    char b[] = a;        /* Doesn't this also create six characters and copy them?
                            The only reason why it doesn't work that I could think of
                            is that a only points to a[0]. But then wouldn't b at
                            least be equal to 'h'? */
}



Also why do I need the "const" before char *a = "blah"?

I haven't learnt about strings in c++ yet ... the MIT course I'm looking at sticks closer to C first, and yes I wouldn't have known about B :P

Thanks!
char b [] = a; /* Doesn't this also create six characters and copy them? */
It does this with C++ strings, C++ arrays, and C arrays inside structs, and it *would* do that for raw C arrays, except that the creators of C decided to block this specific case for the sake of the B programmers.

why do I need the "const" before char *a = "blah"

Because that 'b' is a const char. A pointer to it is a const char*
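
If you want the compiler to confirm this, a sketch along these lines should work with any C++11 compiler. The function name first_char_address is made up for the example.

```cpp
#include <cassert>
#include <type_traits>

// A string literal is an lvalue of type "array of const char", so decltype
// yields a reference to const char[5] for "blah" (four letters plus '\0').
static_assert(std::is_same<decltype("blah"), const char (&)[5]>::value,
              "\"blah\" has type const char[5]");

// The array decays to a pointer to its first element, which is a const char.
const char *first_char_address() {
    const char *p = "blah";
    static_assert(std::is_same<decltype(*p), const char &>::value,
                  "dereferencing the pointer yields a const char");
    assert(p[0] == 'b');
    return p;
}
```

Both static_asserts are checked at compile time, so the fact that this compiles at all is the demonstration.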

I haven't learnt about strings in c++ yet ... the MIT course I'm looking at sticks closer to C first

Sounds like that "Introduction to C++" nonsense that some undergrads put together for opencourseware: http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-096-introduction-to-c-january-iap-2011/index.htm It was a good exercise for the students that made it, but it's useless as teaching material.
why do I need the "const" before char *a = "blah"

Because whenever you use a literal in your code, it is put somewhere in the RAM of your computer. If you just changed that memory like crazy, something bad would happen. Try this code:
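#include <iostream>
int main()
{
    // Deliberately reads far past the end of the 13-byte literal:
    // this is undefined behavior and may crash instead of printing.
    for (int i = 0; i < 10000; i++) std::cout << "Hello World!"[i];
}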

It will probably have alternating chunks of random data and C-string names of C++ stuff like constructors, destructors, vtables, lambdas, keywords, and other library stuff.
@Cubbi:
1) so are you saying that if a and b were integers this would work? I.e.:
int a [] = {1, 2, 3}; 
char b [] = a; 


2) are you saying b is a constant char because of what @viliml said?

3) That is exactly where I'm learning from, I am very open to hearing of other places I should learn from.

Thanks guys!
1) so are you saying that if a and b were integers this would work?

No, I am saying that if a and b were strings, C++ arrays, vectors, or pretty much anything that's not a C-style array, it would work:

#include <vector>
int main()
{
    std::vector<int> a = {1, 2, 3}; 
    std::vector<int> b = a; 
}

online demo: http://ideone.com/zZK37

#include <array>
int main()
{
    std::array<int, 3> a = {1, 2, 3}; 
    std::array<int, 3> b = a; 
}

online demo: http://ideone.com/ZY68E

#include <string>
int main()
{
    std::string a = "blah"; 
    std::string b = a; 
}

online demo: http://ideone.com/vwMRt

etc.

2) are you saying b is a constant char because of what @viliml said?

Not really. I am saying that b is a constant char because that's what happens when you use double quotes in source code: a read-only array is created. Viliml is pointing out some of the possible repercussions of modifying memory around the array on platforms where the read-only property is not supported at the OS level.

3) That is exactly where I'm learning from, I am very open to hearing of other places I should learn from.

There are a few decent books: "Accelerated C++", "Programming: Principles and Practice using C++", "C++ Primer" (I'd preorder the new edition though, if it's not in stores yet).
1) I think I get it.

2) Okay, I just want to clarify: a constant char means the value stored in the variable cannot be changed, correct? And the reason it cannot be changed is that it is in read-only memory, correct? If that is the case, then wouldn't the second line of code below work, because it is not in read-only memory? Also, I don't understand what happens when you use double quotes.

char *a = "hello";
char b[] = "yello";


3) Thanks!
Assuming your code is inside a function:

The line char *a = "hello"; does this:
1. In the read-only section of the program image, the characters 'h', 'e', 'l', 'l', 'o', '\0' are stored, say, at .rodata offset 0 <- this is what the double quotes do

2. At the point in the program where you wrote that line, a pointer-to-char is created (typically in a CPU register) and the value .rodata+0 is stored in it.

The line char b[] = "yello"; does this:

1. In the read-only section of the program image, the characters 'y', 'e', 'l', 'l', 'o', '\0' are stored, say, at .rodata offset 6

2. At the point in the program where you wrote that line, a 6-character array is allocated on the stack, then a loop is compiled that copies the six characters from read-only memory (.rodata+6 through .rodata+11) to the six locations on the stack.

As a result, a[0] .. a[5] are read-only locations, while b[0] .. b[5] are writable.
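
A rough sketch of the observable consequences, with function names invented for the example. It does not show where the bytes live (that is up to the compiler and platform), only that each array is its own writable copy and that the literal outlives the function.

```cpp
#include <cassert>
#include <cstring>

// Two arrays initialized from the same literal are separate writable copies:
// changing one does not touch the other (or the literal).
bool copies_are_independent() {
    char b1[] = "yello";
    char b2[] = "yello";
    assert(b1 != b2);  // two distinct arrays at two distinct addresses
    b1[0] = 'h';       // modifies only b1's own copy
    return std::strcmp(b1, "hello") == 0 && std::strcmp(b2, "yello") == 0;
}

// Returning a pointer to a literal is fine: the literal has static storage
// duration, so it still exists after the function returns.
const char *literal_address() {
    return "hello";
}
```

Returning the address of b1 instead would be a bug, because that array is gone once copies_are_independent() returns.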
THAT WAS SUCH A GOOD EXPLANATION! THANK YOU SO MUCH!

I have two more questions if you don't mind...

1) Is this loop that copies the data to the stack executed at run time or during compile time?
2) Stack memory is last in first out right. Well wouldn't that be an issue? I mean if I have 5 character arrays and I wanted to access the 3rd one that would be a problem. Am I not understanding stack memory?
Is this loop that copies the data to the stack executed at run time or during compile time?

At run time: the local array has to be populated every time the function that holds that line is called (unless the array is static).
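
Here is a small sketch of that difference between a plain local array and a static one. The names bump_local and bump_static are made up for the example.

```cpp
#include <cassert>

// The local array is copied from the literal again on every call,
// so the increment from a previous call is never visible.
char bump_local() {
    char s[] = "abc";
    assert(s[0] == 'a');  // fresh copy each time
    ++s[0];
    return s[0];
}

// The static array is initialized once and keeps its contents between calls.
char bump_static() {
    static char s[] = "abc";
    ++s[0];
    return s[0];
}
```

Called twice in a row, bump_local() returns 'b' both times, while bump_static() returns 'b' and then 'c'.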

Stack memory is last in first out right. Well wouldn't that be an issue?

It means that local variables are destroyed in the order opposite to their construction. If you declare three arrays in this order: a1, a2, and a3, then at the end of the function a3 is gone first, then a2, then a1. The order doesn't matter unless you deal with class objects whose destructors access other objects. Until the end of the function, you can access all of the arrays. Once the function has ended, all of them are gone.
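
Plain char arrays have no destructors, so to make the order visible this sketch uses a made-up Tracer class that records its tag in a log when it is destroyed. The names Tracer and destruction_order are assumptions for the illustration only.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: appends its tag to a shared log when destroyed.
struct Tracer {
    std::string &log;
    char tag;
    Tracer(std::string &l, char t) : log(l), tag(t) {}
    ~Tracer() { log += tag; }
};

// Builds three locals in the order a1, a2, a3 and reports the order
// in which they were destroyed.
std::string destruction_order() {
    std::string log;
    {
        Tracer a1(log, '1');
        Tracer a2(log, '2');
        Tracer a3(log, '3');
        assert(log.empty());  // all three are alive and usable inside the scope
    }                         // destroyed here: a3, then a2, then a1
    return log;
}
```

destruction_order() returns "321". Inside the braces all three objects can be used in any order; the last-in-first-out rule only governs when they go away.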