I am practicing IEEE 754 floating-point operations, including addition. My main goal is to write code that performs them, but first I need to understand how they work. I have been researching online, but the information I am finding is not helping me at all, because I don't know what I should do or even where to start. Can someone give me some pointers on how to approach this, and help me understand how to get the sum of the two numbers below:

Add N1 + N2

       Sign | Exponent | Significand (Mantissa?)
N1 =    0   | 11001111 | 110100000 ..... 0
N2 =    1   | 11011101 | 101100000 ..... 0

Now I know about prepending the implicit leading "1." to each significand, which makes them:

1.110100000 ..... 0

1.101100000 ..... 0

and that the exponent fields, converted from base 2 to base 10, are:

11001111 = 207

11011101 = 221

Subtracting them gives 221 - 207 = 14.
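To make these steps concrete, here is a minimal C++ sketch, assuming single precision (8-bit exponent, 23-bit fraction). The helper names `field_value`, `with_hidden_bit` and `align_significand` are hypothetical. It converts the bit-string fields to integers, restores the hidden leading 1, and then shifts the significand of the operand with the smaller exponent right by the exponent difference, which is what has to happen before the two significands can be added.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <string>

// Sketch assuming IEEE 754 single precision (1 sign bit, 8 exponent bits,
// 23 fraction bits). The helper names are hypothetical, chosen for illustration.

// Converts a field written as a string of '0'/'1' characters to an integer.
std::uint32_t field_value(const std::string& bits) {
    assert(bits.size() <= 32);
    return static_cast<std::uint32_t>(std::bitset<32>(bits).to_ulong());
}

// Puts the implicit leading 1 in front of a 23-bit fraction. The result is the
// 24-bit significand 1.fff...f read as an integer (i.e. scaled by 2^23).
std::uint32_t with_hidden_bit(std::uint32_t fraction) {
    return (1u << 23) | fraction;
}

// Shifts the significand of the operand with the smaller exponent right so both
// operands use the larger exponent. Bits shifted out are simply dropped here;
// a full implementation keeps them as guard/round/sticky bits for rounding.
std::uint32_t align_significand(std::uint32_t significand, int shift) {
    assert(shift >= 0);
    return shift >= 32 ? 0u : significand >> shift;
}
```

With the operands above, the exponent fields evaluate to 207 and 221, so N1 has the smaller exponent. Its significand (1.1101 in binary, 0xE80000 as a 24-bit integer) is shifted right by 14 places, giving 928, before it is combined with N2's significand 0xD80000 (subtracted, since the signs differ).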


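The exponents are stored with a bias (a constant offset) that depends on the format you have (float, double or other): 127 for 32-bit single precision and 1023 for 64-bit double precision. To get the actual exponent, subtract the bias from the stored exponent field.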
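As a sketch of how the bias is used, assuming the 32-bit single-precision layout (the function name `decode_normal_float` is hypothetical): a normal number's value is (-1)^sign × 1.fraction × 2^(exponent field - 127). Only normal numbers are handled; the reserved exponent fields 0 (zero, subnormals) and 255 (infinity, NaN) need separate treatment.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Sketch assuming the 32-bit single-precision layout, where the exponent bias is 127.
// (For 64-bit doubles the same idea uses 11 exponent bits, 52 fraction bits and bias 1023.)
// Handles normal numbers only: exponent fields 0 and 255 are rejected.
double decode_normal_float(std::uint32_t bits) {
    const std::uint32_t sign = bits >> 31;
    const int biased_exp = static_cast<int>((bits >> 23) & 0xFFu);
    const std::uint32_t fraction = bits & 0x7FFFFFu;
    assert(biased_exp != 0 && biased_exp != 255);

    const int unbiased_exp = biased_exp - 127;                // remove the bias
    const std::uint32_t significand = (1u << 23) | fraction;  // 1.fff...f scaled by 2^23
    // value = 1.fff...f * 2^unbiased_exp = significand * 2^(unbiased_exp - 23)
    const double magnitude =
        std::ldexp(static_cast<double>(significand), unbiased_exp - 23);
    return sign ? -magnitude : magnitude;
}
```

Written as 32-bit words, the question's N1 and N2 are 0x67E80000 and 0xEED80000; this decodes them to 1.8125 × 2^80 and -1.6875 × 2^94 respectively.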

The key to doing this yourself with bits is to get your basic tools down first.

Pick a representation (float or double) and represent it with a class that stores the bits in a std::bitset.

Then do the basics: extract the sign field, the exponent field (with the bias removed) and the mantissa field (with the implicit leading 1 restored). Use those values to rebuild the number and print it, to check that you get the original back.
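One way to follow this advice, sketched in C++ under the assumption that `float` is IEEE 754 binary32 (the class `Float32Bits` and its member names are hypothetical): the bits live in a `std::bitset<32>`, the accessors pull out the three fields, and `value()` rebuilds the number so it can be compared with the original. `std::memcpy` is used to read the raw bits because it is the portable way to do so, unlike casting a `float*` to an integer pointer.

```cpp
#include <bitset>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical class illustrating the suggestion: the 32 bits of a
// single-precision float (assumed to be IEEE 754 binary32) kept in a std::bitset.
class Float32Bits {
public:
    static Float32Bits from_bits(std::uint32_t u) { return Float32Bits(u); }
    static Float32Bits from_float(float f) {
        std::uint32_t u;
        std::memcpy(&u, &f, sizeof u);  // copy the raw bit pattern
        return Float32Bits(u);
    }

    bool sign() const { return bits_[31]; }

    // Stored exponent field (bits 30..23), still including the bias.
    int biased_exponent() const {
        return static_cast<int>((bits_.to_ulong() >> 23) & 0xFFu);
    }
    // Exponent with the bias of 127 removed.
    int exponent() const { return biased_exponent() - 127; }

    // Fraction field (bits 22..0) with the implicit leading 1 restored.
    std::uint32_t significand() const {
        return static_cast<std::uint32_t>(bits_.to_ulong() & 0x7FFFFFu) | (1u << 23);
    }

    // Rebuilds the number from the three fields; normal numbers only.
    float value() const {
        assert(biased_exponent() != 0 && biased_exponent() != 255);
        const float magnitude =
            std::ldexp(static_cast<float>(significand()), exponent() - 23);
        return sign() ? -magnitude : magnitude;
    }

    // The fields as "sign | exponent | fraction", like the layout in the question.
    std::string fields() const {
        const std::string s = bits_.to_string();  // bit 31 comes first
        return s.substr(0, 1) + " | " + s.substr(1, 8) + " | " + s.substr(9);
    }

private:
    explicit Float32Bits(std::uint32_t u) : bits_(u) {}
    std::bitset<32> bits_;
};
```

For example, `Float32Bits::from_float(x).value() == x` should hold for any normal `x`, and `Float32Bits::from_bits(0x67E80000u).fields()` returns N1 in the same sign | exponent | fraction layout as the question, ready to print with `std::cout`.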

Once you have that background, you can work on addition. Consider: https://isaaccomputerscience.org/concepts/data_numbases_floating_point?examBoard=all&stage=all
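A compact sketch of the whole addition (align, add or subtract, normalize, round), under loudly stated assumptions: normal single-precision inputs only, round-to-nearest-even only, and no overflow or underflow handling. `soft_add`, `bits_of` and `float_of` are hypothetical helpers, not a library API. Three extra low bits (guard, round, sticky) keep enough information about the shifted-out bits to round correctly.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <utility>

// Hypothetical helpers (not a library API). Assumes float is IEEE 754 binary32.
std::uint32_t bits_of(float f) { std::uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }
float float_of(std::uint32_t u) { float f; std::memcpy(&f, &u, sizeof f); return f; }

// Sketch of single-precision addition on raw bit patterns.
// Assumptions: both inputs are normal numbers (no zeros, subnormals, inf or NaN),
// rounding is round-to-nearest-even only, and overflow/underflow is not handled.
std::uint32_t soft_add(std::uint32_t a, std::uint32_t b) {
    auto exp_field = [](std::uint32_t x) { return static_cast<int>((x >> 23) & 0xFFu); };
    assert(exp_field(a) != 0 && exp_field(a) != 255);
    assert(exp_field(b) != 0 && exp_field(b) != 255);

    // Order the operands so that |a| >= |b|; for normal numbers the magnitude
    // bits compare the same way as the values.
    if ((a & 0x7FFFFFFFu) < (b & 0x7FFFFFFFu)) std::swap(a, b);
    const std::uint32_t sign = a >> 31;  // the larger magnitude decides the sign
    const bool subtract = (a >> 31) != (b >> 31);
    int exponent = exp_field(a);

    // Restore the hidden bit and append 3 extra low bits (guard, round, sticky).
    std::uint64_t ma = static_cast<std::uint64_t>((a & 0x7FFFFFu) | 0x800000u) << 3;
    std::uint64_t mb = static_cast<std::uint64_t>((b & 0x7FFFFFu) | 0x800000u) << 3;

    // Align: shift the smaller significand right by the exponent difference and
    // OR every bit that falls off into the sticky bit (bit 0).
    const int shift = exponent - exp_field(b);
    if (shift >= 27) {
        mb = 1;  // everything was shifted out; only the sticky bit remains
    } else if (shift > 0) {
        const bool lost = (mb & ((std::uint64_t{1} << shift) - 1)) != 0;
        mb = (mb >> shift) | (lost ? 1u : 0u);
    }

    // Add or subtract the aligned significands (ma >= mb here).
    std::uint64_t m = subtract ? ma - mb : ma + mb;
    if (m == 0) return 0;  // exact cancellation gives +0 when rounding to nearest

    // Normalize so the leading 1 is at bit 26 (23 fraction bits + 3 extra bits).
    if (m >= (std::uint64_t{1} << 27)) {  // carry out of the addition
        m = (m >> 1) | (m & 1);           // shift right, keeping the sticky bit
        ++exponent;
    }
    while (m < (std::uint64_t{1} << 26)) {  // leading zeros after a subtraction
        m <<= 1;
        --exponent;
    }

    // Round to nearest, ties to even, using the 3 extra bits.
    const std::uint64_t rest = m & 7u;
    m >>= 3;
    if (rest > 4 || (rest == 4 && (m & 1))) ++m;
    if (m == (std::uint64_t{1} << 24)) {  // rounding carried into a new leading bit
        m >>= 1;
        ++exponent;
    }

    assert(exponent >= 1 && exponent <= 254 && "overflow/underflow not handled");
    return (sign << 31) | (static_cast<std::uint32_t>(exponent) << 23) |
           (static_cast<std::uint32_t>(m) & 0x7FFFFFu);
}
```

On the operands from the question, `soft_add(0x67E80000u, 0xEED80000u)` gives 0xEED7FC60: sign 1 (N2 has the larger magnitude), exponent field 221 (11011101) unchanged, and a significand equal to N2's minus N1's aligned one. No 1 bits were shifted out here, so that sum is exact. Comparing `soft_add(bits_of(x), bits_of(y))` with `bits_of(x + y)` over many normal inputs is a quick way to check such an implementation against the hardware.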
