Which one is faster?

Pages: 123
closed account (EzwRko23)
1.
1
2
3
4
void zeroArray(int* array, int n) {
  for (int i = 0; i < n; ++i)
    array[i] = 0;
}


2.
1
2
3
4
5
void zeroArray(int* array, int n) {
  int* end = array + n; 
  while (array < end)
    *(array++) = 0;
}


?

Why?


// Please do not post benchmark results - I want to know your expectations on these two styles of code - which one do you prefer; I could to such benchmark myself, if I wanted to know the scientific truth.
Last edited on
More flame bait? From your previous posts, I don't think you have any interest in an answer.
closed account (EzwRko23)
I swear I won't use word "Scala" in this thread (except this one particular post). :P
Last edited on
@kbw Please don't reply based on prejudices

I wouldn't expect any significant difference between the two pieces of code
Last edited on
@kbw Please don't reply based on prejudices
Let the flame begin.
More flame bait? From your previous posts, I don't think you have any interest in an answer.
I have interest as to what the ans is. but why is it a flamebait?
closed account (EzwRko23)
kbw didn't like my previous posts. It is like in politics. Just because a law was proposed by the opposition party, the party in control votes against it, regardless of the merits.

Anway: @unregistered, what is your intuition about this code?
Last edited on
I guess both versions may be equally fast in real life, but...

...the second one has (theoretically) one less instruction. Instead of accessing i-th element of an array each time (i.e. array[i] = 0) which requires addition, it just uses the pointer to the current element (which doesn't need offsetting to assign i-th element).
closed account (z05DSL3A)
My initial reaction would be that the first one (for loop) would be optimized better by the compiler than the second (but it probably depends heavily on the compiler and settings).
closed account (EzwRko23)
@GreyWolf: do you have any particular optimization on your mind?
What about:
1
2
3
4
void zeroArray(int* array, int n) {
    while (n)
      array[--n] = 0;
}

But maybe the for version gets optimized into something like this?
Hmm, isn't a
memset(array, 0, sizeof(int)*arraySize)
the best solution?
closed account (1yR4jE8b)
Like Wolf said, the second one doesn't offset the array every time so it is technically faster...but the difference here is soooooooooooo minimal anyway it'd be stupid to say it's 'omg micro-optimized code'. I think a good optimizing compiler would be able to optimize both into the same assembly either way.

And R0mai, your solution won't work...you run past the lower band of the array and zero out all memory in the program below every address at (array + n).

whoops, early in the morning here and I'm totally not a morning person...totally didn't notice you weren't using an int* for the loop condition.
Last edited on
And R0mai, your solution won't work...you run past the lower band of the array and zero out all memory in the program below every address at (array + n).

Are you sure about that? Because I'm not.

Assume n is 3.
1
2
3
4
5
6
7
8
9
10
11
12
13
while (3) : 3!=0 {
     --3; (n=2)
     array[2] = 0;
}
while (2) : 2!=0 {
     --2; (n=1)
     array[1] = 0;
}
while (1) : 1!=0 {
     --1; (n=0)
     array[0] = 0;
}
while (0) : 0!=0 : exit loop


EDIT: of course if n is negative it won't work, but I guess we can assume that the size of an array is always >=0. (using unsigned int instead of int would be even better)
Last edited on
closed account (EzwRko23)
memset is the best solution only if you want to initialize this array with zeroes; but forget about memset now. Sometimes you wish to fill the array with something different than zero, but still simple enough so that the looping might be the main cost.

R0mai's code is correct; loop will terminate when n reaches 0.
Last edited on
closed account (z05DSL3A)
do you have any particular optimization on your mind?

No, not really; just that the in past (when I have bothered looking at such things) for loops have tended to be optimized better than while loops.

When compiler optimization was not very good I used to do the loop down to zero style of code.
I would prefer a more generic, reusable solution such as: std::fill_n( array, n, 0 );

"Correctness, simplicity, and clarity come first"
--Item 6 in C++ Coding Standards: 101 Rules, Guidelines, and Best Practices by Sutter and Alexandrescu.

Let the standard library implementers decide how to implement the operation.
Last edited on
closed account (z05DSL3A)
"Correctness, simplicity, and clarity come first" +1
"Correctness, simplicity, and clarity come first" +1
-1

Best not to generalize (except to say that, in general, best not to generalize :P); in the right situations, speed is paramount.

Re the original question, I think #2 would be faster - array[i] has to be calculated as (array + i*sizeof(int)), which is an additional instruction per iteration. You can optimize #1 (depending on your system) with:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
void zeroArray(int* array, int n) {
  int iterations = (n % 4) * 4;
  for (int i = 0; i < iterations ; i += 4)
  {
     array[i] = 0;
     array[i+1] = 0;
     array[i+2] = 0;
     array[i+3] = 0;
  }

  for (int i = iterations; i < n; ++i)
  {
     array[i] = 0;
  }  
}


I believe #2 can be optimized in a similar way.

--Rollie
If the compiler doesn't optimize both of them to the exact same thing I'd be a little surprised.

#1 has modification, but no indexing
#2 has indexing, but no modification

In my experience with extremely outdated assembly languages that aren't used anymore, indexing is faster, so I'd put my money on #2.


But in reality, it doesn't matter. Let the compiler optimize the way it will.

"Correctness, simplicity, and clarity come first" +1

+1 here as well

-1

Best not to generalize (except to say that, in general, best not to generalize :P); in the right situations, speed is paramount.


If you're concerned with speed so much that you're asking quesitons like the OP, then you shouldn't be writing this in C++. You'd be better off inlining asm to ensure it gets optimized the way you want. Otherwise, no matter how hard you try, you're still at the mercy of the compiler.
Pages: 123