Question about floating point variables

Jan 17, 2010 at 7:23pm
I'm new to C++ and had a problem with floating point variables.

The Code:
#include <iostream>

int main()
{
    double var1 = 0, var2 = 0;

    var1 = 45766.1112f;
    var2 = 1207890.11156f;

    std::cout << var1 << '\n' << var2 << '\n';
}

The numbers I picked were completely random; it's just a small program built to test a template. My problem comes in at the variable assignment. After assignment, the variable values are:

var1 = 45766.109375
var2 = 1207890.125

I'm curious why this happened so I know if there's something I need to look out for in the future. I've never had this problem before, but I'm also using a new IDE (Code::Blocks). I wasn't sure if it was a compiler issue or something that happens when dealing with floating points. If it's the latter, why does this happen?

Thanks in advance
Jan 17, 2010 at 7:26pm
I'm going to disclaim by saying that I'm no compiler mechanics expert and that I'm not sure how C++ handles precision. But I'd bet that it's because the precision of double is insufficient to handle those two numbers. They both have about ten digits so that may be the cause. However I honestly cannot say that I know what the answer is.
Jan 17, 2010 at 7:40pm
Thank you for the reply tummychow.

I didn't expect that to be the problem, because everywhere I've read about a signed double it says it can handle up to 15 decimal digits. That's the reason I thought it might be a compiler issue. But I might not be understanding its definition correctly.
Jan 17, 2010 at 7:43pm
That's what I meant when I said that I wasn't sure about the precision. I would bet that I'm wrong. I don't recognize that f at the end; I haven't used that syntax before, but if it's a signifier that the constant is a float, data may be lost in the float-to-double conversion.
Jan 17, 2010 at 7:45pm
doubles ought to be large enough.
floats, however, are not.

You need at least 29 bits of mantissa to represent the smaller number (45766.1112) and float only has 23. Hence the rounding error.

And yes... you're using double, but you're casting to float first.

The trailing 'f' means the number is a float. If you leave off the 'f' you have a double. Try removing the 'f':

    var1 = 45766.1112;   // get rid of the f's
    var2 = 1207890.11156;



EDIT: doh I'm too slow. I just HAD to do all the math to determine the necessary mantissa XD
Last edited on Jan 17, 2010 at 7:47pm
Jan 17, 2010 at 7:47pm
There you go. See, I was onto something!
In general, the compiler's default type choice for constants is good enough (int for most integer constants, long for oversized ones, double for floating-point constants, const char[] for string literals). As Disch noted, that's the cause of your rounding error.
Jan 17, 2010 at 8:21pm
Type left unspecified, the compiler defaults to int for integer constants and double for floating-point constants.


Jan 17, 2010 at 8:21pm
You guys were right. After I took the f's out it came out to be:

var1 = 45766.111199999999
var2 = 1207890.1115600001

So adding the f at the end of the number is like static_cast<float>. I thought it was just specifying that it was a floating point value.

One last question. The console window is printing:

First Variable: 45766.1
Second Variable: 1.20789e+06

Is there a way to change its output so that it prints the entire content of the variable?

Thanks for your help, guys
Last edited on Jan 17, 2010 at 8:29pm
Jan 17, 2010 at 8:31pm
I believe that, in <iomanip>, you can call setprecision() to control how many decimals get printed. I don't personally remember how to use it, however.
Jan 17, 2010 at 8:41pm
The entire contents of what variable? "First variable" or "Second variable?"
setprecision() should work.

I'd also like to add that floating point arithmetic sucks, and you should avoid it. Why do you need it, anyway?
Jan 17, 2010 at 9:01pm
<iomanip> does work. I used the fixed float format and that gave me the result I was looking for.


I'd also like to add that floating point arithmetic sucks, and you should avoid it. Why do you need it, anyway?


Like I said, I am new to C++, and it was a test application. The results (in this case) don't really matter; I just didn't get what I expected to. This is one of those problems that could cost someone a lot of time later on down the road. I was just looking to fully understand what was happening now so I'll know what to do later.

Once again, thanks for all your help guys
Jan 17, 2010 at 9:08pm
I'm not saying anything against you, just making a suggestion. I don't like it.

I just didn't get what I expected to.

This is why. You rarely get what you expect with floating point arithmetic because it's inaccurate. I suggest you avoid it and stick to integers.

Floating point is also slow, and for the same storage it trades precision for range.
The floating-point format needs slightly more storage (to encode the position of the radix point), so when stored in the same space, floating-point numbers achieve their greater range at the expense of precision.

http://en.wikipedia.org/wiki/Floating_point

It's slow because... well... think how many numbers there are between 1 and 2. 1.1, 1.11, 1.111, ad infinitum.

In short: there is an infinite amount of floating point numbers between each integer number.

Having said that, floating point does have its uses; mostly in graphical applications (which is why GPUs are faster at floating point than CPUs) and scientific ones (which is why programs like folding@home are CUDA-enabled ( http://en.wikipedia.org/wiki/CUDA )).
Last edited on Jan 17, 2010 at 9:10pm
Jan 17, 2010 at 9:15pm
chrisname wrote:
I'd also like to add that floating point arithmetic sucks, and you should avoid it. Why do you need it, anyway?
What?

That certainly explains why, after 50 years, computer manufacturers and programming language designers are still using it profitably.


There are caveats that you should be aware of before using it.

What Every Computer Scientist Should Know About Floating-Point Arithmetic
http://docs.sun.com/source/806-3568/ncg_goldberg.html

Remember, FP is not designed for exactness.
Jan 17, 2010 at 9:15pm
I edited my post to say what it's good for, too. I just really don't like it.
Jan 17, 2010 at 9:21pm
Inaccuracy is inevitable with any FP variable. It's simply a matter of whether that inaccuracy can be mitigated enough to be worth the extra range and expressiveness you get over integer variables.
Jan 17, 2010 at 9:24pm
Inaccuracy is inevitable with any FP variable.

That's why I don't like it.
Jan 17, 2010 at 10:32pm
It's slow because... well... think how many numbers there are between 1 and 2. 1.1, 1.11, 1.111, ad infinitum.

In short: there is an infinite amount of floating point numbers between each integer number.
Waaa...? Are you suggesting floating point numbers encompass the entire Real set?

And no, that's not why floating point numbers are slower.
Jan 18, 2010 at 8:30am
Are you suggesting floating point numbers encompass the entire Real set?

Sure :l

And no, that's not why floating point numbers are slower.

Why are they slower then? That was the best reason I could think of on the fly...
Jan 18, 2010 at 2:02pm
It is slower because it takes a lot more hardware and bit-flipping to process FP values than it does to process integer values.

Read the post I linked for you all.
Jan 18, 2010 at 4:37pm
Inaccuracy is inevitable with any FP variable.

That's why I don't like it.



This is going to sound like the glass half-full vs half-empty expression, but it really is more than that.

You should not look at floating point numbers as being inaccurate but rather look at them as simply being 'accurate enough'.

For the purposes FP types are meant to serve, that is the requirement they are meant to meet.

This is why. You rarely get what you expect with floating point arithmetic because it's inaccurate.


If you understand both how FP types work, and how they should be used, there is absolutely no reason you should not always get what you expect. Floats and doubles are not some mystical magical data types that defy reality. They only appear that way to the people that expect them to be nothing more than decimal versions of ints.

In short: there is an infinite amount of floating point numbers between each integer number.


And that there represents a fundamental misunderstanding of floating point numbers.

There is actually a finite number of values any floating point type can represent: a C++ float has at most 2^32 (4,294,967,296) bit patterns, and somewhat fewer distinct finite values once the infinities and NaN patterns are set aside.

What you probably meant to say was there is an infinite amount of real numbers between each integer number.

I'm not nitpicking here. There is more than just a semantic difference between those two statements, and that difference is in how floating point data types represent real numbers, chiefly the inherent issues with storing real numbers in what is ultimately a finite binary format.