fastest way to XOR 2 arrays of boolean?

Jan 29, 2009 at 7:36pm
Hi,

I have two arrays of booleans, both have the same known size.
I would like to perform a XOR on the arrays and store the result in a third boolean array c.
I could iterate on the arrays and do a XOR on each value, but I was wondering if there was a faster way to do it, since it is a very "low level" operation?

Bottom line : whats the fastest way to XOR two boolean arrays?

Thanks,
Cyp

1
2
3
4
5
6
7
#what i am doing for now:
bool a[100];
bool b[100];
bool c[100];
for(int i=0; i<100; i++){
c[i]=a[i]^b[i];
}
Jan 29, 2009 at 7:39pm
I imagine the compiler would optimise it for you. Just compile with optimisations turn on :) That should also enable look-unrolling which would make it even faster.
Jan 29, 2009 at 7:48pm
I think the fastest way would be to make the size of the array a multiple of sizeof(long):
1
2
3
4
5
6
7
8
9
//Assume sizeof(long)==4
bool a[100];
bool b[100];
bool c[100];
long *a2=(long *)a,
	*b2=(long *)b,
	*c2=(long *)c;
for (int i=0;i<100/sizeof(long);i++)
	c2[i]=a2[i]^b2[i];
Jan 29, 2009 at 7:54pm
@Helios: Not ideal to use sizeof(type), as this can limit you to various compilers and architectures. What if sizeof(long) == 8? (100%8) != 0
http://developers.sun.com/solaris/articles/ILP32toLP64Issues.html

He could be trying to compile this on some strange hardware (SPARC, MIPS, RISC).

As I said. The compiler will likely optimise. If you wanted to make it easier for the compiler, unroll you loops to 5->10 instructions so it can execute concurrently.
Last edited on Jan 29, 2009 at 7:57pm
Jan 29, 2009 at 8:20pm
Well, yeah. That's why my comment states that the code assumes sizeof(long)==4. It was just an example. I took advantage of the fact that !(100%4) and that sizeof(long)==4 is very common.
An interesting fact: the loop will never produce a buffer overflow, so if (n%sizeof(long)), it's always possible to complete the last steps with the bool pointer.

Another solution would be to use int32_t, although that's not quite standard, sadly.
Jan 29, 2009 at 8:25pm
Assumption is the mother of all f*ck ups
Jan 29, 2009 at 8:47pm
Out of curiosity, I did a little benchmark with MinGW:

(unoptimized)
n=1024*1024*100
Initialization (bool elements): 3390 ms
XOR (uint32_t elements): 234 ms (best), 250 ms (worst)
XOR (bool elements): 656 ms (best)

(-O3 -fexpensive-optimizations)
n=1024*1024*100
Initialization (bool elements): 3296 ms
XOR (uint32_t elements): 188 ms (best), 235 ms (worst)
XOR (bool elements): 234 ms (best), 250 ms (worst)

I wonder why XOR with uint32_t elements varies so much.
Jan 29, 2009 at 8:50pm
Un-Roll the loops and try it unoptimized. Un-roll to 10 each
Jan 29, 2009 at 9:05pm
I believe that C defines "int" to be the word size of the architecture, so perhaps helios' solution with "int" instead of "long" accommodates Zaita's point.

For that last ounce of speed, I'd also try to write the loop such that the terminating condition was i == 0, since comparison to zero can be slightly faster than comparing to nonzero, and I'd also use pre-increment/pre-decrement instead of post-.
Jan 29, 2009 at 9:17pm
Unrolled to 10, 100 MB:
Unoptimized:
uint32_t: 200-220 ms
bool: 500 ms

Optimized:
uint32_t: 200-220 ms
bool: 180-200 ms (at one point it measured 281 ms. Probably the OS having a brain fart)
Last edited on Jan 29, 2009 at 9:17pm
Jan 29, 2009 at 9:18pm
Unrolling wins :D
Jan 30, 2009 at 2:35am
Assumption is the mother of all f*ck ups


Haha that made me laugh and brought back some old memories. During my first year of programming my Tutor told this to the classroom everyday haha.
Last edited on Jan 30, 2009 at 2:36am
Topic archived. No new replies allowed.