From my study of templates I got the impression that push_back cannot simply append a new element to the dynamically allocated array; instead, it allocates a new array one element larger than the old one (the size before the push_back call), copies the old contents over, and then sets the last element to the value being pushed back.
I'm not sure whether the STL actually does it this way for vectors, because if that's the case, push_back is a terrible operation memory-wise. I deal with huge amounts of data, exceeding gigabytes, which means my huge array would be copied millions, if not billions, of times in this scenario. Worst of all, because I'm using vectors, i.e. a contiguous array in memory, I'm also wasting a lot of time making my program hunt for a memory location big enough to hold the whole contiguous array!!!
Have I understood push_back correctly? Should I replace my huge series of push_back calls with a single resize followed by a series of element assignments, or are the two really equivalent in some way?
Thank you for reading, and sorry for the long story :D
Please let me know what you think about this!
push_back:
Adds a new element at the end of the vector, after its current last element. The content of this new element is initialized to a copy of x.
This effectively increases the vector size by one, which causes a reallocation of the internal allocated storage if the vector size was equal to the vector capacity before the call. Reallocations invalidate all previously obtained iterators, references and pointers.
So I guess, if you roughly know how many elements you want to push_back into the vector, you can reserve that amount of space up front.
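For example, a minimal sketch (the count of 1000 is just a placeholder for whatever estimate you have):

#include <vector>

int main()
{
    std::vector<double> v;
    v.reserve(1000);               // one allocation up front for ~1000 elements
    for (int i = 0; i < 1000; ++i)
        v.push_back(i * 0.5);      // no reallocations happen inside this loop
}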
Sounds like std::vector isn't the right data structure for this situation... why not use a linked data structure such as std::list instead? It's slower for random access, but you can easily add more data to it without the overhead of reallocating memory whenever the size needs to grow.
If you know how many elements you are going to append, you should use insert or resize; if you have to add one element at a time, keep using push_back (see the sketch below).
Notice that vectors have both a capacity and a size; the buffer gets copied only when the new size would exceed the capacity.
If you have to perform many insertions, maybe a vector is not the best structure for this.
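A rough sketch of the two approaches, assuming the element count is known (the value 1000 is only for illustration):

#include <cstddef>
#include <vector>

int main()
{
    const std::size_t n = 1000;          // assumed element count

    // resize + assignment: allocates once and value-initializes all n elements
    std::vector<double> a;
    a.resize(n);
    for (std::size_t i = 0; i < n; ++i)
        a[i] = i * 0.5;

    // reserve + push_back: also allocates once, without up-front initialization
    std::vector<double> b;
    b.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        b.push_back(i * 0.5);
}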
1) What does that have to do with templates?
2) Nope, you misunderstood. The vector class allocates a buffer that is larger than its actual size; you can get the size of that buffer with the capacity() method. When you push a new element, it is placed within that buffer and the size is updated. An actual reallocation only happens when the buffer runs full (IIRC vector allocates roughly twice the previous capacity when that happens, though the exact growth factor is implementation-defined). You can also tell the vector manually what capacity you need with the reserve() method.
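Here is a small sketch that makes this visible by printing size versus capacity; the exact growth pattern you see depends on your implementation:

#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    std::vector<double> v;
    std::size_t last = v.capacity();
    for (int i = 0; i < 100; ++i)
    {
        v.push_back(i);
        if (v.capacity() != last)     // capacity changes only when a reallocation happened
        {
            std::cout << "size " << v.size()
                      << " -> capacity " << v.capacity() << '\n';
            last = v.capacity();
        }
    }
}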
@Mathes: Yes, I know how many elements I need to push_back, but push_back is more subtle and a nice way to avoid segmentation faults, since it grows the vector for me instead of my writing past the end of the array. That's why I'm asking about it.
@darkestfright: I need the data contiguous in memory, because after creating it I use a file-stream function to copy it all to the HDD, and it's much faster to write one contiguous block than to chase every element through an iterator as you would with std::list.
@hanst99: It has to do with templates because I only understood what vectors are and how they work after learning about templates. What kind of streaming do you mean? I'm creating a signal that contains something like 500M doubles (and sometimes more) and then run a Fourier transform on it. Could streaming help in such a situation?
So from what you quoted, Mathes and hanst99, I guess the scenario I described in my first post happens only when there is no reserved capacity left for the next element, which is what forces a reallocation. In other words, whenever I deal with huge amounts of data I should use resize (or reserve) rather than plain push_back! I got that right, didn't I? And please tell me if there's a better way to do what I'm doing.
But deque isn't contiguous in memory, meaning that if I want to write the data to a file I'll have to run a loop and write it in chunks rather than as one complete block :-).
Well, you can't have your cake and eat it too. You're going to have to make a choice: do you want fast random access and contiguous memory locality, or fast insertions with no reallocation? Weigh the pros and cons of each data structure and go with the one that offers the most important advantages for your situation.
If you know how many elements you need, then use a std::vector, call vector::reserve before adding any elements, and you're done. If you don't know how many elements, then a linked data structure would be more suitable (imo), because you can add as many elements as you want without ever needing to reallocate. Setting up a loop with an iterator and stepping through the list is orders of magnitude faster than writing gigabytes of data to disc, so the performance difference is moot.
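For instance, a rough sketch of writing a std::list out element by element (the file name and count are made up, and error checking is omitted); the iteration itself is cheap next to the actual disc writes:

#include <cstdio>
#include <list>

int main()
{
    std::list<double> data;
    for (int i = 0; i < 1000; ++i)
        data.push_back(i * 0.5);          // grows without ever reallocating existing nodes

    std::FILE* f = std::fopen("signal.bin", "wb");   // hypothetical output file
    for (std::list<double>::const_iterator it = data.begin(); it != data.end(); ++it)
        std::fwrite(&*it, sizeof(double), 1, f);     // one element per call; stdio buffers the writes
    std::fclose(f);
}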
Get it to work first, optimize later. You'll be surprised how often small optimizations turn out to be a complete waste of time.
Actually it's working now on a small scale, but for huge amounts of data I'm getting memory problems, and the program behaves very slowly towards the end!!
So like I said before, I create the signal (saving the data at every signal step or time sample), Fourier transform it, and then write the result to the HDD. For the Fourier transform I'm using GSL, which takes a C array, so the data has to be contiguous in memory; I have no choice!! And after that I write it all out with a single fwrite(). So I guess I'll take the vector::resize or vector::reserve approach. It makes more sense!!!
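A minimal sketch of that approach (the count and the file name "signal.bin" are made up, the GSL call itself is omitted, and error checking is left out): since a vector's storage is contiguous, &signal[0] can be handed to a C API such as GSL and then written out with a single fwrite.

#include <cstdio>
#include <cstddef>
#include <vector>

int main()
{
    const std::size_t n = 1000;            // stand-in for the real sample count
    std::vector<double> signal;
    signal.reserve(n);                     // single allocation up front
    for (std::size_t i = 0; i < n; ++i)
        signal.push_back(i * 0.5);         // fill the signal, no reallocations

    // &signal[0] points at one contiguous block of n doubles, so it could be
    // passed to a C routine (e.g. a GSL transform) and written in one go.
    std::FILE* f = std::fopen("signal.bin", "wb");
    std::fwrite(&signal[0], sizeof(double), signal.size(), f);
    std::fclose(f);
}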
Actually, by coincidence, I just read about Valgrind yesterday in the book Professional C++ by Solter and Kleper (it's a great book, btw).
I never use new and delete directly; I use auto_ptr to manage the allocated memory. I guess memory problems make sense when I'm creating 500M doubles (about 4 GB on their own) plus the analysis on an 8 GB RAM computer.