a basic vector<> class and memove

Nov 11, 2009 at 5:39am
When you erase data from a custom made vector in the middle of the buffer, when is it safe to assume you can memmove instead of looping and doing a operator=() ?
i'm pretty sure it's only POD-structs but i'm not 100% sure.

The question i'm trying to ask is why can't it work on every classes even with a virtual table ?

Why is it NECESSARY to call the copy constructor instead of just moving the data.

What could possibly be in the class that absolutely needs a copy constructor and cannot be moved.

I mean, when you push_back and the vector must be enlarged, they just realloc right ? they don't malloc a new one and copy construct all the elements.
Nov 11, 2009 at 6:34am
What could possibly be in the class that absolutely needs a copy constructor and cannot be moved.


Lots of things. Reference counts and allocated buffers come to mind first. But there can be other things, like pointers to member buffers, iterators, etc.

Consider the following:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
class A
{
public:
  // default ctor
  A() : buf( new int[20] ) { }
  
  // default dtor
  ~A() { delete[] buf; }

  // copy ctor
  A(const A& a) : buf(new int[20])
  { *this = a; }

  // assignment operator
  A& operator = (const A& a)
  { /*... copy a.buf contents to this->buf here -- too lazy to write code*/ }

private:
  int* buf;
};


As you can see --- the copy constructor here is responsible for allocating a new buffer. If you simply memcpy'd or memmove'd the data instead of calling the copy ctor, two objects would point to the same buffer, and they'd both try to delete the same buffer!

The assignment operator is similar. Simply copying the data would not copy the buffers themselves, it would just copy the pointer (so you'd have the same problem -- two objects would have the same pointer, and they'd both try to delete the same buffer on their destruction).


EDIT:


I mean, when you push_back and the vector must be enlarged, they just realloc right ? they don't malloc a new one and copy construct all the elements.


If the memory buffer needs to be enlarged, then vector will create all new objects by using the copy ctor, then destroy the originals. It can have quite significant overhead if you are irresponsible with it.

Same thing when you remove elements in the middle of the vector. It has to shift all of the other elements forward to close the gap that's created -- and to do that it has to copy the objects and destroy all the originals. This is why it's so undesirable to do that with a vector.
Last edited on Nov 11, 2009 at 6:37am
Nov 11, 2009 at 3:03pm
As you can see --- the copy constructor here is responsible for allocating a new buffer. If you simply memcpy'd or memmove'd the data instead of calling the copy ctor, two objects would point to the same buffer, and they'd both try to delete the same buffer!


no because the other object doesn't exist anymore.

if i memove everything to the left because one object was destructed in the middle, all the objects will be moved to the left and there won't be any duplicates.

Nov 11, 2009 at 3:40pm
I suppose you're right if you don't actually destruct the original object. Touche.

Okay then, here's another one (albeit less common):

1
2
3
4
5
6
7
8
9
10
11
//assume 'Page' is some struct
class DontMoveMe
{
public:
  Page& GetPage() { return *curpage; }
  void GoToPage(int page) { curpage = &pages[ page ]; }

private:
  Page* curpage;
  Page  pages[10];
};


The general idea is that 'curpage' points to a memory location within 'this'. If 'this' changes (which would happen if the object is moved), then 'curpage' becomes a bad pointer.

True this model isn't very common, but I've done it in the past.
Nov 11, 2009 at 4:00pm
Alright, finally a good example :P
I've posted this on multiple boards and it's the first real good answer.

I was asking myself, how can you know the <typename T> is a POD type ?
Nov 11, 2009 at 4:45pm
after some thought, i think i will do a vector_fast class or something that would always use the memmove and realloc, since it's always what i want to do.
Nov 11, 2009 at 5:59pm
If your struct contains nothing more than POD type elements, then why would you think
that vector<>'s copy construction paradigm is any slower than memmove/memcpy?
Nov 11, 2009 at 6:04pm
memmove(m_pData + nIndex + p_nNumber, m_pData + nIndex, sizeof(node)*(m_nSize - nIndex));

VS

for(int i = nIndex; i != nIndex + nNumberOfElemToMove; ++i)
{
new(m_pData[i]) node(m_pData[i+1]);
m_pData[i+1].~node();
}

yeah right, NOT SLOWER IN ANY WAY.
Nov 11, 2009 at 7:22pm
COULD YOU SHOW ME DISASSEMBLY TO PROVE THAT IT IS SLOWER.
Nov 11, 2009 at 7:26pm
So for simple POD structs, there shouldn't need to be any destructor, so your above code reduces to:

1
2
3
4
for(int i = nIndex; i != nIndex + nNumberOfElemToMove; ++i)
{
new(m_pData[i]) node(m_pData[i+1]);
}


Now, placement new simply returns that address and does not invoke the memory manager at all, so
effectively the above is equivalent to:

1
2
3
4
for(int i = nIndex; i != nIndex + nNumberOfElemToMove; ++i)
{
   T node(m_pData[i+1]);  // simply copy construction
}


The default copy constructor is a member-wise copy, which for POD types is a bit-wise copy.

Therefore, the runtime efficiency, in terms of memory accesses, of the above algorithm vs. memmove is
identical.

Nov 12, 2009 at 3:11am
To expand on what jsmith is saying...

keep in mind that vector is a class template which means that an entirely different class is created for each T. Each of those classes are compiled individually. And many (most/all) of their members are even inlined.

It's true that vector will call a copy ctor and dtor to move an object... for POD types, after compiler optimizations, these equate to the exact same thing as just moving the data over, just as jsmith illustrated.


In any event, this thread doesn't point out a shortcoming of vector, per se. It more points out a way to misuse vector. Typically for performance reasons, you don't want to have vectors of full objects. Most of the time you're better off having vectors of pointers to dyanmically created objects. That way your vector is virtually always dealing with basic types.
Nov 12, 2009 at 1:00pm
std::vector<> is almost never a better choice than std::deque<>. std::vector<> should be used ONLY when
the elements must be in contiguous memory. This is often the case when interfacing with legacy
C functions like read() and write(). But apart from that, it is fairly rare.

std::deque<> is implemented essentially as a linked list of vectors (but not using the std::vector<> type).
This means that inserts and removals from the front and middle are far less expensive than std::vector<>
because relatively few elements need to be "shifted" compared to std::vector<> (in which at a minimum
all elements to the "right" of the insert/removal point need to be moved, if not the entire container if the
buffer needs to be reallocated).

So perhaps OP could consider using std::deque<> instead. I would strongly advocate that over vectors
of pointers. (Or if you are going that route, at least use boost::ptr_vector<> to manage the memory
for you).


Nov 12, 2009 at 3:39pm
Not that I disagree, but vector has better performance if you need lots of random access and won't be inserting/removing elements much/at all.
Topic archived. No new replies allowed.