Which one is faster?

Pages: 123
Aug 31, 2010 at 7:56am
closed account (EzwRko23)
1.
1
2
3
4
void zeroArray(int* array, int n) {
  for (int i = 0; i < n; ++i)
    array[i] = 0;
}


2.
1
2
3
4
5
void zeroArray(int* array, int n) {
  int* end = array + n; 
  while (array < end)
    *(array++) = 0;
}


?

Why?


// Please do not post benchmark results - I want to know your expectations on these two styles of code - which one do you prefer; I could to such benchmark myself, if I wanted to know the scientific truth.
Last edited on Aug 31, 2010 at 8:05am
Aug 31, 2010 at 8:25am
More flame bait? From your previous posts, I don't think you have any interest in an answer.
Aug 31, 2010 at 8:29am
closed account (EzwRko23)
I swear I won't use word "Scala" in this thread (except this one particular post). :P
Last edited on Aug 31, 2010 at 8:30am
Aug 31, 2010 at 8:30am
@kbw Please don't reply based on prejudices

I wouldn't expect any significant difference between the two pieces of code
Last edited on Aug 31, 2010 at 8:30am
Aug 31, 2010 at 8:39am
@kbw Please don't reply based on prejudices
Let the flame begin.
Aug 31, 2010 at 8:43am
More flame bait? From your previous posts, I don't think you have any interest in an answer.
I have interest as to what the ans is. but why is it a flamebait?
Aug 31, 2010 at 8:53am
closed account (EzwRko23)
kbw didn't like my previous posts. It is like in politics. Just because a law was proposed by the opposition party, the party in control votes against it, regardless of the merits.

Anway: @unregistered, what is your intuition about this code?
Last edited on Aug 31, 2010 at 8:54am
Aug 31, 2010 at 9:03am
I guess both versions may be equally fast in real life, but...

...the second one has (theoretically) one less instruction. Instead of accessing i-th element of an array each time (i.e. array[i] = 0) which requires addition, it just uses the pointer to the current element (which doesn't need offsetting to assign i-th element).
Aug 31, 2010 at 9:27am
closed account (z05DSL3A)
My initial reaction would be that the first one (for loop) would be optimized better by the compiler than the second (but it probably depends heavily on the compiler and settings).
Aug 31, 2010 at 10:11am
closed account (EzwRko23)
@GreyWolf: do you have any particular optimization on your mind?
Aug 31, 2010 at 10:58am
What about:
1
2
3
4
void zeroArray(int* array, int n) {
    while (n)
      array[--n] = 0;
}

But maybe the for version gets optimized into something like this?
Aug 31, 2010 at 11:20am
Hmm, isn't a
memset(array, 0, sizeof(int)*arraySize)
the best solution?
Aug 31, 2010 at 11:50am
closed account (1yR4jE8b)
Like Wolf said, the second one doesn't offset the array every time so it is technically faster...but the difference here is soooooooooooo minimal anyway it'd be stupid to say it's 'omg micro-optimized code'. I think a good optimizing compiler would be able to optimize both into the same assembly either way.

And R0mai, your solution won't work...you run past the lower band of the array and zero out all memory in the program below every address at (array + n).

whoops, early in the morning here and I'm totally not a morning person...totally didn't notice you weren't using an int* for the loop condition.
Last edited on Aug 31, 2010 at 12:15pm
Aug 31, 2010 at 12:06pm
And R0mai, your solution won't work...you run past the lower band of the array and zero out all memory in the program below every address at (array + n).

Are you sure about that? Because I'm not.

Assume n is 3.
1
2
3
4
5
6
7
8
9
10
11
12
13
while (3) : 3!=0 {
     --3; (n=2)
     array[2] = 0;
}
while (2) : 2!=0 {
     --2; (n=1)
     array[1] = 0;
}
while (1) : 1!=0 {
     --1; (n=0)
     array[0] = 0;
}
while (0) : 0!=0 : exit loop


EDIT: of course if n is negative it won't work, but I guess we can assume that the size of an array is always >=0. (using unsigned int instead of int would be even better)
Last edited on Aug 31, 2010 at 12:09pm
Aug 31, 2010 at 12:12pm
closed account (EzwRko23)
memset is the best solution only if you want to initialize this array with zeroes; but forget about memset now. Sometimes you wish to fill the array with something different than zero, but still simple enough so that the looping might be the main cost.

R0mai's code is correct; loop will terminate when n reaches 0.
Last edited on Aug 31, 2010 at 12:13pm
Aug 31, 2010 at 12:36pm
closed account (z05DSL3A)
do you have any particular optimization on your mind?

No, not really; just that the in past (when I have bothered looking at such things) for loops have tended to be optimized better than while loops.

When compiler optimization was not very good I used to do the loop down to zero style of code.
Aug 31, 2010 at 2:03pm
I would prefer a more generic, reusable solution such as: std::fill_n( array, n, 0 );

"Correctness, simplicity, and clarity come first"
--Item 6 in C++ Coding Standards: 101 Rules, Guidelines, and Best Practices by Sutter and Alexandrescu.

Let the standard library implementers decide how to implement the operation.
Last edited on Aug 31, 2010 at 2:05pm
Aug 31, 2010 at 3:10pm
closed account (z05DSL3A)
"Correctness, simplicity, and clarity come first" +1
Aug 31, 2010 at 6:13pm
"Correctness, simplicity, and clarity come first" +1
-1

Best not to generalize (except to say that, in general, best not to generalize :P); in the right situations, speed is paramount.

Re the original question, I think #2 would be faster - array[i] has to be calculated as (array + i*sizeof(int)), which is an additional instruction per iteration. You can optimize #1 (depending on your system) with:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
void zeroArray(int* array, int n) {
  int iterations = (n % 4) * 4;
  for (int i = 0; i < iterations ; i += 4)
  {
     array[i] = 0;
     array[i+1] = 0;
     array[i+2] = 0;
     array[i+3] = 0;
  }

  for (int i = iterations; i < n; ++i)
  {
     array[i] = 0;
  }  
}


I believe #2 can be optimized in a similar way.

--Rollie
Aug 31, 2010 at 6:34pm
If the compiler doesn't optimize both of them to the exact same thing I'd be a little surprised.

#1 has modification, but no indexing
#2 has indexing, but no modification

In my experience with extremely outdated assembly languages that aren't used anymore, indexing is faster, so I'd put my money on #2.


But in reality, it doesn't matter. Let the compiler optimize the way it will.

"Correctness, simplicity, and clarity come first" +1

+1 here as well

-1

Best not to generalize (except to say that, in general, best not to generalize :P); in the right situations, speed is paramount.


If you're concerned with speed so much that you're asking quesitons like the OP, then you shouldn't be writing this in C++. You'd be better off inlining asm to ensure it gets optimized the way you want. Otherwise, no matter how hard you try, you're still at the mercy of the compiler.
Pages: 123