Forget about long double. The program that we're talking about uses only double.
https://cplusplus.com/forum/general/182508/#msg894208
My assumption is that double uses the "double-precision floating-point" format (binary64) as specified by IEEE 754.
https://en.wikipedia.org/wiki/Double-precision_floating-point_format
I'm pretty sure that is also true on your computer.
A total of 64 bits is used.
1 bit is used for the sign.
11 bits are used for the exponent.
52 bits are used for the fraction.
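To make the layout concrete, here is a minimal sketch that pulls the three fields out of a double. It assumes that double really is IEEE 754 binary64 (the same assumption as above) and only handles normal values. The names DoubleFields and decompose are made up for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// The three fields of a binary64 value.
struct DoubleFields {
    unsigned sign;           // 1 bit
    int exponent;            // 11 bits, stored with a bias of 1023; unbiased here
    std::uint64_t fraction;  // 52 bits; the implicit leading 1 is not included
};

// Hypothetical helper for illustration; assumes double is IEEE 754 binary64.
DoubleFields decompose(double value) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");
    assert(std::isnormal(value));  // zero, subnormals, infinity and NaN are not handled

    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // copy the raw bit pattern

    DoubleFields fields;
    fields.sign = static_cast<unsigned>(bits >> 63);
    fields.exponent = static_cast<int>((bits >> 52) & 0x7FF) - 1023;
    fields.fraction = bits & ((std::uint64_t{1} << 52) - 1);
    return fields;
}
```

Calling decompose(590320000000000000000.0) should give sign 0 and exponent 69, and decompose(10000000000000000.0) should give exponent 53, which are the exponents used in the calculations below.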
In my previous answer I used this tool, http://weitz.de/ieee/, to get the binary representation of the mantissa. The binary representation of the sign and exponent is irrelevant for this particular discussion because we are nowhere near their limits. That's why I only wrote the mantissa in binary and left the base and exponent in decimal (base 10).
Note that the 1 bit before the dot is implicit and is not part of the 52 bits. That's why I wrote that 52 bits are used for the fraction (i.e. the fractional part; what's to the right of the dot) and not for the whole mantissa.
                    implicit
                        ↓
590320000000000000000 = 1.0000000000000010101011111000001011001110000001001100 × 2^69
    10000000000000000 = 1.0001110000110111100100110111111000001000000000000000 × 2^53
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                52 bits
Note that the exponents of these two numbers are not the same, so before we can add them by hand we have to rewrite them with the same exponent.
590320000000000000000 = 1.00000000000000101010111110000010110011100000010011000 × 2^69
    10000000000000000 = 0.00000000000000010001110000110111100100110111111000001 × 2^69
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                 53 bits
Note that to not discard any information I had to use one extra binary digit. I also chose to write them with the same number of digits to simplify the calculation but that isn't strictly necessary.
Now you can just do normal addition of the mantissas by hand if you want.
https://en.wikipedia.org/wiki/Carry_(arithmetic) <--- pay attention to the fact that we're using binary here!
The result is:
590330000000000000000 = 1.00000000000000111100101110111010011000011000001011001 × 2^69
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                 53 bits
And after removing the last bit to get back to 52 bits for the fraction we get:
590329999999999934464 = 1.0000000000000011110010111011101001100001100000101100 × 2^69
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                52 bits
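The hand calculation above can be mirrored with plain integer arithmetic. This is only a sketch of the steps in this post, not a description of what the hardware does. It assumes IEEE 754 binary64, two positive normal values, the larger one first, and that a single extra binary digit is enough to keep all information (as it is here). All function names are made up for the illustration. One detail the sketch makes explicit: the bit that gets removed is exactly half a unit in the last place, and with the default round-to-nearest, ties-to-even rule such a tie is resolved towards the even neighbour, which in this case gives the same result as simply dropping the bit.

```cpp
#include <bitset>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <string>

// Raw bit pattern of a double (assumes IEEE 754 binary64).
std::uint64_t raw_bits(double value) {
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Unbiased exponent of a normal double.
int exponent_of(double value) {
    return static_cast<int>((raw_bits(value) >> 52) & 0x7FF) - 1023;
}

// 53-bit significand of a normal double, with the implicit leading 1 made explicit.
std::uint64_t significand_of(double value) {
    const std::uint64_t one = std::uint64_t{1} << 52;
    return (raw_bits(value) & (one - 1)) | one;
}

// Hypothetical helper: returns the 53-bit significand of larger + smaller,
// expressed with the exponent of larger.
std::uint64_t add_by_hand(double larger, double smaller) {
    const int shift = exponent_of(larger) - exponent_of(smaller);  // 69 - 53 = 16 here
    assert(shift >= 1 && shift <= 53);

    const std::uint64_t a = significand_of(larger) << 1;  // one extra binary digit
    const std::uint64_t b = significand_of(smaller);
    assert((b & ((std::uint64_t{1} << (shift - 1)) - 1)) == 0);  // nothing is discarded

    const std::uint64_t sum = a + (b >> (shift - 1));  // 53 fraction bits

    std::uint64_t rounded = sum >> 1;                     // remove the last bit
    if ((sum & 1) != 0 && (rounded & 1) != 0) ++rounded;  // tie: round to even
    assert(rounded < (std::uint64_t{1} << 53));           // no carry into the next exponent
    return rounded;
}

// The 52 fraction bits as text, most significant bit first.
std::string fraction_string(std::uint64_t significand) {
    return std::bitset<52>(significand).to_string();
}

// Turns a 53-bit significand and an unbiased exponent back into a double.
double rebuild(std::uint64_t significand, int exponent) {
    return std::ldexp(static_cast<double>(significand), exponent - 52);
}
```

With a = 590320000000000000000.0 and b = 10000000000000000.0, fraction_string(add_by_hand(a, b)) should give the 52 fraction bits shown above, and rebuild(add_by_hand(a, b), exponent_of(a)) should compare equal to a + b.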
I have to admit that I only did the addition of the mantissas by hand to verify, but I didn't actually calculate the whole expression on the right to get the number on the left. Instead I just entered the value that I got from the program into the tool that I linked earlier to see that it matched.
I don't know how it's actually being done in the hardware, but the values I get in the program are consistent with the calculations here.
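For anyone who wants to check the value from the program themselves, a small sketch like the following should do. It assumes binary64 doubles and a C library whose "%.0f" prints the exact decimal value of the double (common implementations such as glibc do this, but it is an assumption). The function names are made up for this example.

```cpp
#include <cstdio>
#include <string>

// Decimal text of a double without any fractional digits.
// Assumes the C library prints the exact stored value for "%.0f".
std::string exact_decimal(double value) {
    char buffer[512];  // large enough for any finite double
    std::snprintf(buffer, sizeof buffer, "%.0f", value);
    return buffer;
}

// The addition discussed in this post, done with ordinary double arithmetic.
double example_sum() {
    const double a = 590320000000000000000.0;
    const double b = 10000000000000000.0;
    return a + b;
}
```

Printing exact_decimal(example_sum()) should show 590329999999999934464 rather than 590330000000000000000, because the exact sum lies halfway between two neighbouring doubles and cannot be stored as it is.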