vector::push_back vs vector::resize

Mar 26, 2011 at 2:19pm
Hello guys :-),

From my study of templates I learned that push_back cannot simply append a new element to the dynamically allocated array in place; instead it allocates a new array one element bigger than the old one (before the push_back call), copies everything over, and then sets the last element to the value we push_backed.

I'm not sure whether the STL actually does it this way for vectors, because if that's the case, push_back is a terrible operation for memory. I deal with huge amounts of data that exceed gigabytes, which means my huge array would be copied millions, if not billions, of times in this scenario. Worst of all, because I'm using vectors, i.e. a contiguous memory array, I'd also waste a lot of time making my program find a sufficient memory location for the contiguous array!

Have I understood the story of push_back right? Should I replace my huge series of push_back calls with a single resize followed by a series of element assignments? Or are the two really equivalent in some way?

Thank you for reading, and sorry for the long story :D
Please let me know what you think about this!
Mar 26, 2011 at 2:31pm
push_back:
Adds a new element at the end of the vector, after its current last element. The content of this new element is initialized to a copy of x.

This effectively increases the vector size by one, which causes a reallocation of the internal allocated storage if the vector size was equal to the vector capacity before the call. Reallocations invalidate all previously obtained iterators, references and pointers.


So I guess, if you roughly know how many elements you want to push_back into the vector, you could reserve that amount of space up front.
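Something along these lines, roughly (a sketch; the function name, element count and values are just illustrative):

#include <cstddef>
#include <vector>

std::vector<double> build_signal(std::size_t n_samples)
{
    std::vector<double> signal;
    signal.reserve(n_samples);           // one allocation up front; size() is still 0
    for (std::size_t i = 0; i < n_samples; ++i)
        signal.push_back(0.001 * i);     // no reallocation as long as size() stays within capacity()
    return signal;
}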
Mar 26, 2011 at 2:33pm
closed account (1yR4jE8b)
Sounds like std::vector isn't the right data structure for this situation...why not use a linked data structure such as std::list instead? It's slower for random-access but you can easily add more data to it without the overhead of reallocating memory when the size needs to expand.
Mar 26, 2011 at 2:36pm
If you know how many elements you are going to append, you should use insert or resize; if you have to add one element at a time, keep using push_back.
Notice that vectors have a capacity as well as a size: the buffer only gets copied when the new size would exceed the capacity.

If you have to perform many insertions, maybe a vector is not the best structure for this.
Mar 26, 2011 at 2:36pm
1) What does that have to do with templates?
2) Nope, you misunderstood. The vector class allocates a buffer that is larger than the actual size; you can get the size of that buffer with the capacity() method. If you push a new element, it is placed within that buffer and the size is incremented. An actual reallocation only happens when the buffer runs full (IIRC the capacity grows geometrically when that happens, typically by a factor of 1.5 or 2, depending on the implementation). You can also tell vector up front what capacity you need with the reserve() method.
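You can see that behaviour with a small test like this (a rough sketch; the exact growth factor depends on your implementation, and the loop count is arbitrary):

#include <cstddef>
#include <cstdio>
#include <vector>

int main()
{
    std::vector<double> v;
    std::size_t last_capacity = 0;
    for (int i = 0; i < 1000; ++i)
    {
        v.push_back(i);
        if (v.capacity() != last_capacity)    // only print when a reallocation happened
        {
            std::printf("size = %zu  capacity = %zu\n", v.size(), v.capacity());
            last_capacity = v.capacity();
        }
    }
}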

See here for further details:

http://www.cplusplus.com/reference/stl/vector/

PS: There is something wrong with your program if you need to allocate gigabytes of data on the heap. Wouldn't streaming be a better choice here?
Last edited on Mar 26, 2011 at 2:37pm
Mar 26, 2011 at 2:50pm
Thanks guys for your fast answers :-)

@Mathes: yes, I know how many elements I need to push_back, but push_back is more convenient and a nice way to avoid writing out of bounds (segmentation faults). That's why I'm asking about it.

@darkestfright: I need a contiguous block in memory, because after creating the data I use a file-stream function to copy it all to the HDD, so it's much faster to have a contiguous block in memory than to visit every element through an iterator as with std::list.

@hanst99: it has to do with templates because I only really understood what vectors are and how they work after learning templates. What kind of streaming do you mean? I'm creating a signal which contains something like 500M doubles (and sometimes more) and then doing a Fourier transform on it. Could streaming help in such a situation?
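(Rough numbers: 500,000,000 doubles × 8 bytes ≈ 4 GB just for the signal, and while a vector reallocates it briefly needs room for both the old and the new buffer, so growing it by push_back without reserving can need noticeably more than that at the peak.)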

So from what you quoted, Mathes and hanst99, I guess the story I mentioned in my first post happens only when the new size would exceed the already-reserved capacity, which is what forces the reallocation. In other words, whenever I deal with huge amounts of data I should use resize (or reserve) rather than a bare series of push_back calls! I got it right, didn't I? And please tell me if there's a better way to do what I'm doing.
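Something like this is what I have in mind (a sketch; make_signal and compute_sample are just stand-ins for however the values actually get generated):

#include <cstddef>
#include <vector>

double compute_sample(std::size_t i) { return 0.001 * i; }   // stand-in for the real signal

std::vector<double> make_signal(std::size_t n)
{
    // Option A: resize up front (value-initialises everything to 0.0), then assign by index.
    std::vector<double> a;
    a.resize(n);
    for (std::size_t i = 0; i < n; ++i)
        a[i] = compute_sample(i);

    // Option B: reserve up front, then push_back; no reallocation until n is exceeded.
    std::vector<double> b;
    b.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        b.push_back(compute_sample(i));

    return a;   // both a and b end up identical; pick whichever style reads better
}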
Mar 26, 2011 at 3:13pm
You can use a std::deque if you have to add many items at the end.
It has (amortized) constant-time insertion at the ends and it's randomly accessible.
Mar 26, 2011 at 4:18pm
Thanks for the answer Bazzy!

But a deque isn't contiguous in memory, meaning that if I want to write the data to a file I'll have to run a loop and write it in chunks rather than as one complete block :-).
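That's the difference I mean, roughly (a sketch with a plain C FILE handle; the function names are just illustrative):

#include <cstdio>
#include <deque>
#include <vector>

void write_vector(const std::vector<double>& v, std::FILE* f)
{
    // Contiguous storage: the whole buffer goes out in one call.
    if (!v.empty())
        std::fwrite(&v[0], sizeof(double), v.size(), f);   // or v.data() in C++11
}

void write_deque(const std::deque<double>& d, std::FILE* f)
{
    // Not contiguous: has to be written element by element (or copied out first).
    for (std::deque<double>::const_iterator it = d.begin(); it != d.end(); ++it)
    {
        double x = *it;
        std::fwrite(&x, sizeof(double), 1, f);
    }
}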
Mar 26, 2011 at 4:41pm
closed account (1yR4jE8b)
Well, you can't have your cake and eat it too. You're going to have to make a choice: do you want fast random access and contiguous memory, or fast insertions with no reallocation? Weigh the pros and cons of each data structure and go with the one that offers the most important advantages for your situation.

If you know how many elements you need, then use an std::vector, call vector::reserve before adding any elements to it, and you're done. If you don't know how many elements, then a linked data structure would be more suitable (imo), because you can add as many elements as you want without needing to reallocate existing data. And since iterating over the structure with an iterator is orders of magnitude faster than writing gigabytes of data to disc, the performance difference is moot.

Get it to work first, optimize later. You'd be surprised how often small optimizations turn out to be a complete waste of time.
Last edited on Mar 26, 2011 at 4:46pm
Mar 26, 2011 at 7:57pm
Thanks for the answer, buddy :-).

Actually it's working now on the small scale, but for huge amounts of data I'm getting memory problems, and the program becomes very slow towards the end!

So like I said before, I create the signal (saving the data at every signal step or time sample), Fourier transform it, and then write the result to the HDD. For the Fourier transform I'm using GSL, which takes a C array, which means the data has to be contiguous in memory; I have no choice! And even after that I write it all out with a single fwrite(). So I guess I'll take the vector::resize or vector::reserve approach. It makes more sense!
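So the plan looks roughly like this (a sketch; gsl_fft_real_radix2_transform is one of GSL's in-place real FFT routines and needs a power-of-two length, so treat the exact call, the length and the file name as assumptions to swap for whatever the real program uses):

#include <cstddef>
#include <cstdio>
#include <vector>
#include <gsl/gsl_fft_real.h>   // GSL real FFT routines (assumed header for the routine below)

int main()
{
    const std::size_t n = 1 << 20;         // radix-2 routine wants a power-of-two length
    std::vector<double> signal(n);         // one contiguous allocation, no push_back growth

    for (std::size_t i = 0; i < n; ++i)
        signal[i] = 0.001 * i;             // placeholder for the real signal generation

    // The vector's storage is a plain C array, so GSL can transform it in place.
    gsl_fft_real_radix2_transform(&signal[0], 1, n);

    // ...and the whole result goes to disk with a single fwrite().
    if (std::FILE* f = std::fopen("signal.bin", "wb"))
    {
        std::fwrite(&signal[0], sizeof(double), signal.size(), f);
        std::fclose(f);
    }
}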

Thank you all guys :-) I guess I got it!
Mar 26, 2011 at 8:59pm
closed account (1yR4jE8b)
If you're having memory problems, you could try using Valgrind to detect memory leaks and other kinds of memory related errors.

If you're using Visual Studio, I know it has something similar to Valgrind, but I have no idea how to use it.
Mar 27, 2011 at 10:44am
Actually, by coincidence, I just read about Valgrind yesterday in the book Professional C++ by Solter and Kleper (it's a great book, btw).

I never use new and delete directly; I use auto_ptr to manage memory. I guess memory problems make sense when I'm creating 500M doubles, plus doing the analysis, on a computer with 8 GB of RAM.

Thanks for the advice :-).