double vs long double machine accuracy

Greetings guys,

I have a question about the machine accuracy of long double.
First of all, using

 
  cout << sizeof(long double);


I figured out that long double is implemented with 16 bytes on my system (compared to double, which has 8 bytes).
That should mean I get twice the accuracy, i.e. roughly twice as many significant decimal digits as with double.

The machine accuracy eps is the smallest positive number that, when added to 1, yields a result different from 1, i.e. for which "1 < 1+eps" is true.

Using double, I get "true" for eps=10^-15, but "false" for eps=10^-16.
Using long double, I get "true" for eps=10^-19, but "false" for eps=10^-20.

How can this be? I would have assumed that I could go down to eps=10^-30 and still receive "true", since I have twice as many significant digits.
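Here is a minimal sketch of the kind of test I mean (illustrative only, not my exact code):

#include <iostream>

int main()
{
    // Is 1 + eps still distinguishable from 1?  (Results can vary if the
    // compiler keeps intermediate results in a wider register format.)
    double      eps_d  = 1e-16 ;
    long double eps_ld = 1e-20L ;

    std::cout << std::boolalpha
              << "double,      eps = 1e-16: " << ( 1.0  < 1.0  + eps_d  ) << '\n'
              << "long double, eps = 1e-20: " << ( 1.0L < 1.0L + eps_ld ) << '\n' ;
}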

edit: Wiki seems to support my results:
https://en.wikipedia.org/wiki/Machine_epsilon

Best,

PiF
> That should mean I should get twice as much accuracy, i.e. significant decimal numbers than with double.

Not necessarily. The number of bits used for the value representation of an object may be less than the number of bits used in its object representation.

The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object of type T is the set of bits that participate in representing a value of type T. Bits in the object representation that are not part of the value representation are padding bits. For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values.
http://eel.is/c++draft/basic.types#4



Run this (on your implementation) to see the number of significant decimal digits for floating point types:

#include <iostream>
#include <limits>
#include <type_traits>

// Print size and precision information for a floating point type T.
// enable_if restricts the template to floating point types only.
template < typename T >
typename std::enable_if< std::is_floating_point<T>::value, std::ostream& >::type dump()
{
    using limits = std::numeric_limits<T> ;
    return std::cout << "size: " << sizeof(T) << '\n'
                     << "digits in the mantissa: " << limits::digits << '\n'
                     << "precision (decimal digits): " << limits::digits10 << '\n' ;
}

// Print a heading with the type's name, then its properties.
#define DUMP(type) ( std::cout << '\n' << #type << "\n-------\n" && dump<type>() )

int main()
{
    DUMP(float) ;
    DUMP(double) ;
    DUMP(long double) ;
}


For example, on coliru:
Linux stacked-crooked 4.4.0-57-generic #78-Ubuntu SMP Fri Dec 9 23:50:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
g++


float
-------
size: 4
digits in the mantissa: 24
precision (decimal digits): 6

double
-------
size: 8
digits in the mantissa: 53
precision (decimal digits): 15

long double
-------
size: 16
digits in the mantissa: 64
precision (decimal digits): 18

http://coliru.stacked-crooked.com/a/d259c98c0323727b
@PhysicsisFun,

To slightly expand on @JLBorges's point, a lot depends on the CPU in use.

You are probably thinking of the IEEE-754 quadruple precision floating point format, which stores 112 fraction bits (113 significand bits with the implied leading bit) and does offer the precision you seek.

Unfortunately, few processors support this in hardware, and the Intel/AMD family is not among them. If you use floating point formats beyond double, you're probably calling upon software emulation, which significantly impacts performance.

Where software emulation is involved, a lot then depends on the compiler, since it is the one making the choice.

I have few details here, but it seems Microsoft's compilers don't support the version you're interested in.

Some compilers (perhaps GCC) may have types _Quad or __float128 (or something similar) which potentially support the precision you seek, but I haven't researched this to confirm.
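For example, something along these lines works with GCC on x86-64 when libquadmath is available (a sketch only; compile and link with -lquadmath):

#include <quadmath.h>
#include <cstdio>

int main()
{
    // __float128 is GCC's software-emulated IEEE-754 binary128 type.
    __float128 x = (__float128)1 / 3 ;

    char buf[128] ;
    quadmath_snprintf( buf, sizeof buf, "%.33Qg", x ) ;

    std::printf( "decimal digits (FLT128_DIG): %d\n1/3 = %s\n", FLT128_DIG, buf ) ;
}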

Alternative support might be obtained through C++ classes which use a "double/double" approach, basically combining doubles to provide the extended precision (may not be pretty).
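As a rough illustration of that double/double idea, here is a sketch of the TwoSum building block such libraries are built on (illustrative, not any particular library's code):

#include <cstdio>

// A value is carried as an unevaluated sum hi + lo of two doubles.
struct dd { double hi, lo ; } ;

// Error-free addition of two doubles (Knuth's TwoSum): hi holds the rounded
// sum, lo holds the rounding error that a plain double addition would lose.
dd two_sum( double a, double b )
{
    double s   = a + b ;
    double bb  = s - a ;
    double err = ( a - ( s - bb ) ) + ( b - bb ) ;
    return { s, err } ;
}

int main()
{
    dd r = two_sum( 1.0, 1e-30 ) ;   // 1e-30 would vanish in a plain double sum
    std::printf( "hi = %.17g  lo = %.17g\n", r.hi, r.lo ) ;
}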

There can be some confusion on Intel/AMD machines because they do advertise support for a 128-bit floating point format, but that is typically for SIMD instructions, which treat those 128 bits as a structure of four single-precision values; a C++ library might still use such registers to build the effect of a higher-precision type.

Prompted only by your name, I'm curious if your purpose is physics simulation.

If your compiler lets you get at whatever the largest format the FPU uses internally, you lose the FPU's defence against rounding problems: that extra precision is there to control the accumulation of errors in intermediate results. How big the FPU's internal type is varies a fair amount from chip to chip; on most older chips it is 80 bits.

If you are willing to go there, you can get more bits out of the FPU, but you have to be very careful with it, and it is still finite; it may not reach the 112 bits (or whatever) you are targeting.
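If you want to check whether your compiler already evaluates intermediates in a wider format, FLT_EVAL_METHOD from <cfloat> tells you (a sketch):

#include <cfloat>
#include <iostream>

int main()
{
    // 0: evaluate in the declared type, 1: float/double promoted to double,
    // 2: everything evaluated as long double (typical of the old x87 path).
    std::cout << "FLT_EVAL_METHOD = " << FLT_EVAL_METHOD << '\n' ;
}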