a basic vector<> class and memove

Forum

Forum
General C++ Programming
a basic vector<> class and memove

a basic vector<> class and memove

When you erase data from a custom made vector in the middle of the buffer, when is it safe to assume you can memmove instead of looping and doing a operator=() ?
i'm pretty sure it's only POD-structs but i'm not 100% sure.

The question i'm trying to ask is why can't it work on every classes even with a virtual table ?

Why is it NECESSARY to call the copy constructor instead of just moving the data.

What could possibly be in the class that absolutely needs a copy constructor and cannot be moved.

I mean, when you push_back and the vector must be enlarged, they just realloc right ? they don't malloc a new one and copy construct all the elements.

Disch (13742)

What could possibly be in the class that absolutely needs a copy constructor and cannot be moved.

Lots of things. Reference counts and allocated buffers come to mind first. But there can be other things, like pointers to member buffers, iterators, etc.

Consider the following:

class A
{
public:
  // default ctor
  A() : buf( new int[20] ) { }
  
  // default dtor
  ~A() { delete[] buf; }

  // copy ctor
  A(const A& a) : buf(new int[20])
  { *this = a; }

  // assignment operator
  A& operator = (const A& a)
  { /*... copy a.buf contents to this->buf here -- too lazy to write code*/ }

private:
  int* buf;
};

As you can see --- the copy constructor here is responsible for allocating a new buffer. If you simply memcpy'd or memmove'd the data instead of calling the copy ctor, two objects would point to the same buffer, and they'd both try to delete the same buffer!

The assignment operator is similar. Simply copying the data would not copy the buffers themselves, it would just copy the pointer (so you'd have the same problem -- two objects would have the same pointer, and they'd both try to delete the same buffer on their destruction).

EDIT:

I mean, when you push_back and the vector must be enlarged, they just realloc right ? they don't malloc a new one and copy construct all the elements.

If the memory buffer needs to be enlarged, then vector will create all new objects by using the copy ctor, then destroy the originals. It can have quite significant overhead if you are irresponsible with it.

Same thing when you remove elements in the middle of the vector. It has to shift all of the other elements forward to close the gap that's created -- and to do that it has to copy the objects and destroy all the originals. This is why it's so undesirable to do that with a vector.

Last edited on

neken (5)

no because the other object doesn't exist anymore.

if i memove everything to the left because one object was destructed in the middle, all the objects will be moved to the left and there won't be any duplicates.

Disch (13742)

I suppose you're right if you don't actually destruct the original object. Touche.

Okay then, here's another one (albeit less common):

//assume 'Page' is some struct
class DontMoveMe
{
public:
  Page& GetPage() { return *curpage; }
  void GoToPage(int page) { curpage = &pages[ page ]; }

private:
  Page* curpage;
  Page  pages[10];
};

The general idea is that 'curpage' points to a memory location within 'this'. If 'this' changes (which would happen if the object is moved), then 'curpage' becomes a bad pointer.

True this model isn't very common, but I've done it in the past.

neken (5)

Alright, finally a good example :P
I've posted this on multiple boards and it's the first real good answer.

I was asking myself, how can you know the <typename T> is a POD type ?

neken (5)

after some thought, i think i will do a vector_fast class or something that would always use the memmove and realloc, since it's always what i want to do.

jsmith (5804)

If your struct contains nothing more than POD type elements, then why would you think
that vector<>'s copy construction paradigm is any slower than memmove/memcpy?

neken (5)

memmove(m_pData + nIndex + p_nNumber, m_pData + nIndex, sizeof(node)*(m_nSize - nIndex));

VS

for(int i = nIndex; i != nIndex + nNumberOfElemToMove; ++i)
{
new(m_pData[i]) node(m_pData[i+1]);
m_pData[i+1].~node();
}

yeah right, NOT SLOWER IN ANY WAY.

jsmith (5804)

COULD YOU SHOW ME DISASSEMBLY TO PROVE THAT IT IS SLOWER.

jsmith (5804)

So for simple POD structs, there shouldn't need to be any destructor, so your above code reduces to:

for(int i = nIndex; i != nIndex + nNumberOfElemToMove; ++i)
{
new(m_pData[i]) node(m_pData[i+1]);
}

Now, placement new simply returns that address and does not invoke the memory manager at all, so
effectively the above is equivalent to:

for(int i = nIndex; i != nIndex + nNumberOfElemToMove; ++i)
{
   T node(m_pData[i+1]);  // simply copy construction
}

The default copy constructor is a member-wise copy, which for POD types is a bit-wise copy.

Therefore, the runtime efficiency, in terms of memory accesses, of the above algorithm vs. memmove is
identical.

Disch (13742)

To expand on what jsmith is saying...

keep in mind that vector is a class template which means that an entirely different class is created for each T. Each of those classes are compiled individually. And many (most/all) of their members are even inlined.

It's true that vector will call a copy ctor and dtor to move an object... for POD types, after compiler optimizations, these equate to the exact same thing as just moving the data over, just as jsmith illustrated.

In any event, this thread doesn't point out a shortcoming of vector, per se. It more points out a way to misuse vector. Typically for performance reasons, you don't want to have vectors of full objects. Most of the time you're better off having vectors of pointers to dyanmically created objects. That way your vector is virtually always dealing with basic types.

jsmith (5804)

std::vector<> is almost never a better choice than std::deque<>. std::vector<> should be used ONLY when
the elements must be in contiguous memory. This is often the case when interfacing with legacy
C functions like read() and write(). But apart from that, it is fairly rare.

std::deque<> is implemented essentially as a linked list of vectors (but not using the std::vector<> type).
This means that inserts and removals from the front and middle are far less expensive than std::vector<>
because relatively few elements need to be "shifted" compared to std::vector<> (in which at a minimum
all elements to the "right" of the insert/removal point need to be moved, if not the entire container if the
buffer needs to be reallocated).

So perhaps OP could consider using std::deque<> instead. I would strongly advocate that over vectors
of pointers. (Or if you are going that route, at least use boost::ptr_vector<> to manage the memory
for you).

Disch (13742)

Not that I disagree, but vector has better performance if you need lots of random access and won't be inserting/removing elements much/at all.

Topic archived. No new replies allowed.

C++

Forum

a basic vector<> class and memove