Can someone explain to me how this bizar

Forum

Forum
General C++ Programming
Can someone explain to me how this bizar

Can someone explain to me how this bizarre piece of code works?

int parity(int x, int size)
{
	int i;
	int p = 0;
	x = (x & ((1<<size)-1));
	for (i=0; i<size; i++)
	{
		if (x & 0x1) p++;
		x = x >> 1;
	}
	return (0 == (p & 0x1));
}

This function is supposed to emulate the parity check of the 8080 processor. It looks very complex and I am confused with it.

Ganado (6811)

When faced with a function, and you're not sure what it's doing, try to break it into separate parts, and look at the data. This can be kinda fun if you have the right mindset about it.

First, you see a bunch of shifting (<<) and bitwise-ANDing (&).
So, something is being done at the bit level, 1s and 0s. Let's represent our data in binary with a helper function:

#include <iostream>
#include <string>
std::string binary_str(int n, int num_bits) 
{ 
	std::string output(num_bits, '0');
	for (int i = 0; i < num_bits; i++)
	{
		char bit = (n % 2) + '0';
		output[i] = bit;
		n = n / 2;
	}
	return std::string(output.rbegin(), output.rend()); // reversed
}

int main()
{
    int x = 0b1010;
    std::cout << binary_str(x, 4);
}

This will print 1010.

Let's look at just the first actual logical line: 5.

It deals with two variables, x and size.
Let's get rid of the unnecessary parentheses...
x = x & (1 << size) - 1;

(1 << size) is left-shifting the number 1 size places.
So (1 << 1) means 1 shifted left 1 place, which becomes binary 10.
(1 << 2) means 1 shifted left 2 places, which becomes binary 100.

#include <iostream>
#include <string>
int part_A(int size)
{
	return 1 << size;
}

std::string binary_str(int n, int num_bits) 
{ 
	std::string output(num_bits, '0');
	for (int i = 0; i < num_bits; i++)
	{
		char bit = (n % 2) + '0';
		output[i] = bit;
		n = n / 2;
	}
	return std::string(output.rbegin(), output.rend()); // reversed
}

int main()
{
	using std::string;
	
	int num_bits = 4;
	for (int x = 0; x < 16; x++)
	{
		// print [input] [output]
		string input = binary_str(x, 20);
		string output = binary_str(part_A(num_bits), 20);
		std::cout << input << " " << output << '\n';
	}
}

Effectively, this means that it's making the first size bits 0, and then 1 after that.
e.g. (1 << 5) ---> 0b100000.
std::cout << binary_str(1 << 5, 6) << '\n';

But then, you have the result - 1. If you're familiar with how binary numbers work, if you have a power of 2 (e.g. 1000) and subtract 1, you get one less than a power of 2, which will be all 1s.
So,
0b1000 - 1 == 0b111.
0b10000 - 1 == 0b1111.

In short, (1 << size) -1 is producing a number that size 1s in it.
Then, ANDing the original x with it makes it so that all binary digits beyond the first size digits become 0, so we don't have to worry about them.

In other words line 5 makes it so we can assume we're dealing with a number that is only size-bits wide. Any larger bits are discarded (become 0).

_________________________________________________________________

Now, we have the for loop. This part is easier to reason about.
(x & 0x1):
When you AND a number with 1, you're basically asking "is the least-significant digit 1"?
(binary) 10 would produce false, 11 would be true, 10101 would be true, 10010 would be false.

Then, it shifted right 1 place. This is the same thing as integer dividing by 2,
so 10010 becomes 1001, for example. This lets us look at the next bit to check its evenness.

So, the for loop is going through each digit, and checking if the digit is a 1, and if so, adding to a counter.
Or, simply put, it's counting the number of binary 1s in the number.

We can test this hypothesis:

int part_B(int x, int size)
{
	int p = 0;
	x = (x & ((1<<size)-1));
	for (int i=0; i<size; i++)
	{
		if (x & 0x1) p++;
		x = x >> 1;
	}
	return p;
}

#include <string>
std::string binary_str(int n, int num_bits) 
{ 
	std::string output(num_bits, '0');
	for (int i = 0; i < num_bits; i++)
	{
		char bit = (n % 2) + '0';
		output[i] = bit;
		n = n / 2;
	}
	return std::string(output.rbegin(), output.rend()); // reversed
}

#include <iostream>
int main()
{
	using std::string;
	
	int num_bits = 4;
	for (int x = 0; x < 16; x++)
	{
		// print [input] [output]
		string input = binary_str(x, num_bits);
		int output = part_B(x, num_bits);
		std::cout << input << " " << output << '\n';
	}
}

See? The output is the number of 1s in the input.

_________________________________________________________________

Finally, we take this result (p), and do (p & 1). This is the same thing as before, but in the context, it means (p & 1) means "is p odd".
This is compared to 0,
i.e. if there are an even amount of 1s in the original number, the function returns true.
If there is an odd amount of 1s in the original number, the function returns false.

Putting it all together...

bool parity(int x, int size)
{
	int i;
	int p = 0;
	x = (x & ((1<<size)-1));
	for (i=0; i<size; i++)
	{
		if (x & 0x1) p++;
		x = x >> 1;
	}
	return (0 == (p & 0x1));
}

#include <string>
std::string binary_str(int n, int num_bits) 
{ 
	std::string output(num_bits, '0');
	for (int i = 0; i < num_bits; i++)
	{
		char bit = (n % 2) + '0';
		output[i] = bit;
		n = n / 2;
	}
	return std::string(output.rbegin(), output.rend()); // reversed
}

#include <iostream>
int main()
{
	using std::string;
	
	int num_bits = 4;
	for (int x = 0; x < 16; x++)
	{
		// print [input] [output]
		string input = binary_str(x, num_bits);
		int output = parity(x, num_bits);
		std::cout << input << " " << output << '\n';
	}
}

It returns whether or not the input number has an even number of 1s.

______________________________________

Afterthoughts:

I don't know assembly very well, but you can play around with websites like godbolt.org to see what the C++ compiler is producing.
For example, your function creates the following assembly code on -O3 optimization

parity(int, int):
        mov     ecx, esi
        mov     eax, 1
        mov     r8d, edi
        sal     eax, cl
        lea     edi, [rax-1]
        and     edi, r8d
        test    esi, esi
        jle     .L5
        xor     eax, eax
        xor     edx, edx
.L4:
        mov     r8d, edi
        and     r8d, 1
        cmp     r8d, 1
        sbb     eax, -1
        add     edx, 1
        sar     edi
        cmp     ecx, edx
        jne     .L4
        not     eax
        and     eax, 1
        ret
.L5:
        mov     eax, 1
        ret

If we look at the pattern I posted in the output above the assembly, you might notice that the output is, sequentially, is 1001 0110 0110 1001. You can kinda see a recursive, fractally thing going on.
But let's search for 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1 on OEIS.org
You'll find that this is the Thue-Morse sequence, described as,
"a(n) = "parity sequence" = parity of number of 1's in binary representation of n.", or
"a(n) = S2(n) mod 2, where S2(n) = sum of digits of n, n in base-2 notation. "
If you search for this, you can find some interesting fractals, like the Koch snowflake.

I guess for whatever reasons, the compiler isn't smart enough to pick on some optimizations. We can help the compiler out a bit by removing the if-branch.

	for (i=0; i<size; i++)
	{
		p += (x & 0x1);
		x = x >> 1;
	}

In this case, this only eliminates, I think, one or two instructions? So not much difference, but it is interesting to note that even with today's advanced compilers, there are places where if you try hard to not use if-else statements, the compiler can optimize more.

But if we go back to the OEIS description, note that the second description is "sum of digits, mod 2". In mod 2, addition is equivalent to xor. So... we should be able to change our for loop.

int parity_v3(int x, int size)
{
	x = x & (1 << size) - 1;
	int ret = 1;
	for (int i = 0; i < size; i++)
	{
		ret = ret ^ (x & 0x1);
		x = x >> 1;
	}
	return ret;
}

I believe this produces the same result. I don't think its any faster than the above method (at least, not in any significant way), but I just thought it was interesting.

Last edited on

jguo5258 (7)

@Ganado Thank you! Your in-depth explanation really helped me understand the function well. I implemented my own parity function after reading your post.

jonnin (11437)

this is very similar to how CRC functions work too.

In this case, this only eliminates, I think, one or two instructions? So not much difference, but it is interesting to note that even with today's advanced compilers, there are places where if you try hard to not use if-else statements, the compiler can optimize more.

The thing about if/else is trying to toy with branch prediction. The CPU can try to predict which branch code takes and arrange the pipelines around that to improve performance. Trouble is that some branches are random as far as the cpu can tell, and it can't predict them, and a second problem is that a complicated block of code may have many branches and it can't predict all of them well, it doesn't have that many sets of statistics and alt branch pipes and so on to do it. Elimination of if statements is helpful in both these cases, and in other cases as well. Its not wholly true, but in my experience, if you want speed, branch elimination is a big deal: the compiler/cpu MAY get it right if you leave a branch in but it CAN'T mess it up if you took it out.

I often take them out, and mostly use one of 2 techniques:
1) just do the extra work. a branch to avoid doing work is often more harmful than doing the extra work, if the extra work is smallish. eg if(a != 0) then b = a*c else b = 0 is slower than just saying a=b*c .. the cpu is very good at multiplication and dicey at branching.
2) a lookup table. tbl[] = {value if false, value if true} … instead of if (something) value if true ele value if false, just say value = tbl[something] and let the Boolean expression collapse to 0/1 as an index into the table.

technique #2 is often a little hard to read/follow and should only be used, with a good comment, when you can show that its notably faster using the approach AND that the speed increase is necessary for this program.

Last edited on

Ganado (6811)

Good points. With (2), it can also be hard, because sometimes retrieving the lookup table values can cause a cache miss (200 cycles), when doing the extra processing wouldn't have. I remember seeing a talk where simply computing sqrt again only cost ~10-20 cycles, but retrieving the in-memory value took 200 cycles (10x slower), so +1 to the "when you can show that it's notably faster" (depends on what exactly you're putting in the lookup table).

Last edited on

jonnin (11437)

right! Ive wondered off and on what a cpu might be like if we took half the cores off and turned that real estate into cache, and how many bytes would fit. Would be a special purpose cpu; most systems need the cores with modern thread happy code.

Topic archived. No new replies allowed.