Resetting vector/array.

Hello,

In a piece of code I'm working on, I end up resetting a "large-ish" vector. I'm wondering what the best way to do this would be: looping over the indices? Over iterators? Or assigning a freshly constructed vector over it?

One of the vectors is a vector of bools, which uses the specialized vector<bool> template. Does that change the answer?

I tried testing it, but I'm having trouble finding a proper test that isn't skewed by some behind-the-scenes optimizations.
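
Roughly the kind of test I mean (just a sketch; whether the checksum at the end is enough to keep the optimizer honest is exactly what I'm unsure about):

#include <iostream>
#include <vector>
#include <ctime>
#include <cstddef>

int main()
{
    std::vector<int> v(20000000, 1);

    std::clock_t start = std::clock();
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] = 0;                              // the reset being timed
    std::clock_t end = std::clock();

    // Read the data back and print something derived from it, so the
    // compiler can't just discard the whole reset as dead code.
    long checksum = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        checksum += v[i];

    std::cout << "ticks: " << (end - start)
              << " (checksum " << checksum << ")\n";
}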

[edit]

Probably taking it a bit too far, but in the case of the bools: if I reset to "true", would it be faster to ' = true', or to OR with 1?
Since C++03, the elements of vector are guaranteed to be contiguous, so some kind of memset would do the job.
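
E.g. (a minimal sketch, assuming a trivially copyable element type; note memset fills bytes, so it's really only a direct fit for resetting to zero):

#include <cstring>
#include <vector>

int main()
{
    std::vector<int> v(1000, 42);

    // The elements are contiguous, so one memset over the whole buffer is
    // valid for a trivially copyable type like int. Guard against an empty
    // vector, since &v[0] is not allowed when v is empty.
    if (!v.empty())
        std::memset(&v[0], 0, v.size() * sizeof(int));
}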

That, of course, only opens the next can of worms; what's the fastest memset, and can I beat it if I optimise for my hardware?
Ah, just found out there's a built-in vector function 'assign'. I'm guessing this'll be optimized enough to make sure I don't have to worry about it?
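
I.e. something like this (just a sketch):

#include <vector>

int main()
{
    std::vector<int> v(1000, 42);

    // Overwrite every element with 0; the size stays the same and the
    // element type doesn't have to be anything special.
    v.assign(v.size(), 0);
}

(std::fill(v.begin(), v.end(), 0) would be the other obvious standard way to spell it.)
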
Probably taking it a bit too far, but in the case of the bools: if I reset to "true", would it be faster to ' = true', or to OR with 1?

Assigning true is basically just a plain write, like a memcpy.
Whereas OR would first have to do the OR operation, and then write the result.
Or am I wrong?
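
Either way, for resetting a whole vector<bool> to true I'd just let the container do it; a minimal sketch (whether it beats element-wise assignment depends on the implementation):

#include <vector>
#include <algorithm>

int main()
{
    std::vector<bool> flags(1000, false);

    // assign() keeps the size and overwrites every element.
    flags.assign(flags.size(), true);

    // std::fill does the same through iterators; implementations commonly
    // specialize it for vector<bool> so whole words are set at once.
    std::fill(flags.begin(), flags.end(), true);
}
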
What types are the vectors? Are they plain data that can be happily zeroed, or are they objects of some class?
All of them will be of primitive types, generally bools or (unsigned) integers.
I always like playing with this sort of thing, so here's some code. Memset comes out looking pretty good :)

Obviously the usual caveats apply with this sort of shonky timing code - should run thousands of trials, use proper profiling tools, etc. etc.

#include <iostream>
#include <vector>
#include <ctime>

using namespace std;

int main()
{

  const int SIZE_OF_VECTOR = 20000000;
  
  vector<int> a;
  a.resize(SIZE_OF_VECTOR);
  

  clock_t start, end;
  double cpu_time_used;
  

  start = clock();
  memset(&a[0], 0, SIZE_OF_VECTOR * sizeof(int));
  end = clock();
  cpu_time_used = ((double) (end-start));
  cout << "CPU time used by memset = " <<cpu_time_used << endl;
  

  start = clock();
  a.assign(SIZE_OF_VECTOR, 0);
  end = clock();
  cpu_time_used = ((double) (end-start));
  cout << "CPU time used by assign = " <<cpu_time_used << endl;

  start = clock();
  for (int eger=0; eger < SIZE_OF_VECTOR; ++eger)
  {
    a[eger] = 0;
  }
  end = clock();
  cpu_time_used = ((double) (end-start));
  cout << "CPU time used by for loop = " <<cpu_time_used << endl;

  start = clock();
  vector<int> b(SIZE_OF_VECTOR, 0);
  end = clock();
  cpu_time_used = ((double) (end-start));
  cout << "CPU time used by just making another vector = " <<cpu_time_used << endl;     
}



Error: memset not defined in this scope. Need to add <cstring> to the includes.
Also, memset looks the fastest on C::B / XP; it takes about half the time of the other approaches.
Actually half, or a lot less than half?
Ran this on an XP Pro (x86) workstation with C::B with these results:
CPU time used by memset = 15
CPU time used by assign = 63
CPU time used by for loop = 93
CPU time used by just making another vector = 63

Process returned 0 (0x0)   execution time : 0.390 s
Press any key to continue.
@Moschops: here's some variety for you

I had to #include <cstring>, divide by CLOCKS_PER_SEC and run each test 100 times to get readable results.
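
Roughly this kind of change, shown here just for the assign case (a sketch; the 100-repeat count is arbitrary):

#include <iostream>
#include <vector>
#include <ctime>

int main()
{
    const int SIZE_OF_VECTOR = 20000000;
    const int REPEATS = 100;                 // arbitrary repeat count

    std::vector<int> a(SIZE_OF_VECTOR);

    std::clock_t start = std::clock();
    for (int r = 0; r < REPEATS; ++r)
        a.assign(SIZE_OF_VECTOR, 0);
    std::clock_t end = std::clock();

    // clock() returns implementation-defined ticks; dividing by CLOCKS_PER_SEC
    // converts to seconds, which is comparable across platforms.
    std::cout << "assign: " << double(end - start) / CLOCKS_PER_SEC << " s\n";
}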

Output in seconds, cumulative over 100 repeats:

xlc on IBM: 0.69 0.70 0.68 0.85
aCC on HP: 1.01 2.68 2.78 1.07
gcc on Linux: 3.69 2.43 2.46 4.93
clang++ on Linux: 2.03 1.99 2.07 4.16

I don't think memset is worth the hassle, in general.
@Moschops: Actually half, but I didn't do an in-depth, close-all-programs, overclock-the-PC kind of test.
@Cubbi: Maybe you divided by CLOCKS_PER_SEC and stored the result in an integer? Dunno, I get readable results.
@Cubbi

do you deal with IBM mainframes?
@vlad I deal with many different platforms. Do you consider P7-795s "mainframes"? Anyway, my dev boxes are just P5-595s.

@EssGeEich by readable I mean comparable. clock_t is different on different platforms.
Cubbi, I meant z/OS and z/Series
@vlad Nope.
+1
CC on Sun: 4.36 7.79 30.9 8.29, but I already knew Sun C++ wasn't all that good
For completeness, gcc 4.0.2 on a Linux VM (and who knows what the hell's going on in there :p )

10
30
30
160
Once again, I'm getting completely different results than the rest of the world. :/

Single run tests revealed nothing (16ms precision wasn't sufficient to show a difference), so I just put a loop around it. 100 runs each:

memset: 1712
assign: 1702
for loop: 816
new vector: 3707

Somewhat consistent with a similar topic I made months ago, about the "cheapest way to copy a large array".

Could be some optimization taking place for the loop, but not for the rest?

(VC++2010 Express, Win7 64bit.)
@Gaminic I'm going to guess that your C++ library implements vector::assign as a call to memset() for suitable value_types (which is a very common implementation technique), and that your library's memset() is too generic to make use of platform-specific optimizations, while your C++ compiler is set to optimize. (Linux has the same problem: memset() and other general-purpose precompiled C library functions can be slower than loops optimized for the target architecture.)
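
(To illustrate the idea only, not any actual library's code: a hypothetical fill helper that forwards to memset when the element type allows it, and otherwise falls back to a plain loop the compiler can tune for the target CPU.)

#include <cstring>
#include <type_traits>

// Hypothetical helper, for illustration only. A real library would select
// the branch at compile time (e.g. via tag dispatch or if constexpr).
template <typename T>
void zero_range(T* first, T* last)
{
    if (std::is_trivially_copyable<T>::value)
    {
        // Hand the whole range to the C library's memset; how well that is
        // tuned for the target CPU is up to the C library.
        std::memset(first, 0, (last - first) * sizeof(T));
    }
    else
    {
        // Fall back to a plain loop, which the compiler is free to
        // vectorize for the exact target architecture.
        for (; first != last; ++first)
            *first = T();
    }
}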