machine epsilon

Apr 3, 2011 at 12:36pm

Hi all,

I need high precision for my numerical calculations. As a test, I compute the machine precision ("machine epsilon"?) for "float", "double" and "long double". (For this purpose, I use the C program at the bottom of this post.) My results are:

1) "float" : Calculated Machine epsilon: 1.19209E-07
2) "double" : Calculated Machine epsilon: 1.0842E-19
3) "long double": Calculated Machine epsilon: -0

As you can see, I run into problems for "long double". How can I test the "long double" accuracy? (My numbers are typically E-25. So I need more precision than "double" can offer).

I am grateful for any help!

Regards, perr


This is the program I use:


#include <stdio.h>

int main( int argc, char **argv )
{
    //float machEps = 1.0f;
    double machEps = 1.0f;

    printf( "current Epsilon, 1 + current Epsilon\n" );
    do {
        printf( "%G\t%.20f\n", machEps, (1.0f + machEps) );
        machEps /= 2.0f;
        // If next epsilon yields 1, then break, because current
        // epsilon is the machine epsilon.
    }
    //while ((float)(1.0 + (machEps/2.0)) != 1.0);
    while ((double)(1.0 + (machEps/2.0)) != 1.0);

    printf( "\nCalculated Machine epsilon: %G\n", machEps );
    return 0;
}




Apr 3, 2011 at 5:04pm
If you need that much precision, you would be better off using a fixed point library.

Otherwise check sizeof( long double) against sizeof( double ). If they are the same, then the epsilon for long double is the same as for double.

But I'd still consider using a fixed point library instead. The problem with floating point is that as the value to the left of the decimal point grows, you start losing precision to the right of it.
Apr 3, 2011 at 7:53pm

Hi jsmith,

Thank you for your reply. I really appreciate it!

Here is my actual problem:
I am solving a differential equation numerically (the "Volterra integral equation"). I know for sure that this equation may be solved by the following iteration loop:

VecComplex c(N);   // "VecComplex" is a datatype in "Numerical Recipes".
VecComplex K_0(N); // For the present discussion, it could have been any "double" vector datatype.

for (n=0;n<N;n++)
{
    c[n] = c[0];

    for (i=0;i<n;i++)
        c[n] += c[i]*K[n-i];
}


Here "c" is the solution I am looking for and "K" is the kernel. The kernel oscillates strongly (typically cos and sin functions). I have computed "K" with double precision.

For small times, i.e. small "n", the numerical solution is just fine. For large times, the solution is very small, typically c=E-20. The number of iterations is typically N=1.0E6.

I have a small C-program computing my machine precision. The output is: "Calculated Machine epsilon: 1.0842E-19", i.e. my solution "c" is smaller than the precision of my computer/compiler. Therefore, I don't trust my numerical solution for large times.

Now, my simple idea is to replace "double" (or "VecComplex") with "long double", hoping that "long double" would give me a better machine precision, perhaps as good as E-30. In that case, I would trust my numerical solution.

How can I solve my precision issue? Is a "fixed point library" still an option? (I don't know what it is, I have to google it.)

Again, I appreciate any help!

Best regards, perr
Last edited on Apr 3, 2011 at 7:57pm
Apr 3, 2011 at 11:45pm
Yes, a fixed point library is always a good solution when precision is required and speed is not quite as important.
Apr 7, 2011 at 3:18pm

Googling "fixed point library", I found, for example:

http://www.mpfr.org/

The problem with this library, and with all the other libraries I have found, is that I can't make it compile. I try

g++ sample.c -I..mpfr-3.0.1 -I../gmp/include

but I only get this error message:


$ g++ sample.c -I..mpfr-3.0.1 -I../gmp/include
sample.c:33:18: error: mpfr.h: No such file or directory
sample.c: In function ‘int main()’:
sample.c:40: error: ‘mpfr_t’ was not declared in this scope
sample.c:40: error: expected `;' before ‘s’
sample.c:42: error: ‘t’ was not declared in this scope
sample.c:42: error: ‘mpfr_init2’ was not declared in this scope
sample.c:43: error: ‘GMP_RNDD’ was not declared in this scope
sample.c:43: error: ‘mpfr_set_d’ was not declared in this scope
sample.c:44: error: ‘s’ was not declared in this scope
sample.c:46: error: ‘u’ was not declared in this scope
sample.c:49: error: ‘GMP_RNDU’ was not declared in this scope
sample.c:49: error: ‘mpfr_mul_ui’ was not declared in this scope
sample.c:51: error: ‘mpfr_div’ was not declared in this scope
sample.c:52: error: ‘mpfr_add’ was not declared in this scope
sample.c:55: error: ‘mpfr_out_str’ was not declared in this scope
sample.c:57: error: ‘mpfr_clear’ was not declared in this scope


What have I done wrong?

I need _any_ library that can give me a datatype with E-30 precision.
I have a 64-bit Windows 7 PC, and I run Cygwin.

I appreciate any help!

Best, Per Kristian

Apr 7, 2011 at 4:05pm
There should be build instructions either available on the website or distributed along with the source files (probably the latter). Did you follow those?
Apr 7, 2011 at 6:27pm
The overall precision will depend very much on the absolute values of K. If they are large compared to the output, you are right, you may lose precision. It also depends very much on whether the numerical errors add up or not (and this depends on the problem).

long double would not give you much more precision, because most computers use 64 or 80 bits for their floating point processing. So a relative precision of about 1e-19 is probably the best you can get without resorting to software methods.

Java has a built-in, portable BigDecimal class which is very fast and can give you arbitrary precision (you want 1e-100, no problem). It runs out of the box.