I am reading a fairly old book, Object-Oriented Programming in C++. What I have noticed is that it uses the float type frequently, whereas from newer books (such as Programming: Principles and Practice Using C++, 2nd Edition) and here on forums and in articles, I get the impression (maybe wrongly) that double is currently the preferred type for floating-point numbers.
double is the default type for floating-point literals. The problem with float is that its precision is easily exceeded: float can do 6 or 7 significant figures (sf), while double can do 15 or 16 sf, and long double 18 or 19 sf. All of those depend on the implementation - the system you are on.
One might use float if a library requires it - typically a graphics library - or if you have billions of them and the saving is worthwhile.
floats are smaller in terms of bytes, so if you had to send billions of them over a network or store them in a file, and IF you only need a small number of significant digits, you would use float because it would reduce the bandwidth or file size.
I don't know if it's still true, but once upon a time, the runtime required to do math with a double on x86 systems was no different than with a float, and maybe even faster. The reason was that the hardware always operated on doubles, so using a float might actually slow your program down because it had to convert to/from double.
Less precision (not important if you aren't working with high-precision numbers)
This is how easy it is to lose precision: take a number like 9.999 and square it. float has already lost precision - it can only do 7 sf at best (sometimes only 6), and that calculation requires 8 sf (99.980001). The problem gets much worse with higher powers, e.g. 9.999^3.
There is a whole range of problems with float - trigonometry, for example.
So be careful what you mean by "high precision numbers" :+)
Float might also be slightly slower on a 64-bit system because the compiler might pack 2 of them into a 64-bit machine word, and it takes time to unpack them.
The performance should be nearly the same for all types on Intel and similar systems. It may vary a bit on exotic hardware, and I don't know what RISC machines do.
I think for all standard C++ types, the FPU promotes the value to a 10-byte floating point, which it uses to do the computation, and then converts back. The only real difference is the time it costs to fetch the data from the cache to the CPU, and the bus on modern machines is wide enough that I think it will be identical for all types (the smaller types just leave unused wires on the bus; as I understand it you can't optimize it to send 2 floats at once over the wires, it's serial?? I could be wrong...), so in all cases it's doing more or less the same amount of work.
Some compilers support using the 10-byte type directly, which is nice, but it can give more numerical issues if used carelessly. I don't know if the language has a standard 10-byte type now or not (??).