All are defined as double yet the results for mo[0], mo[3] and p[0] are not accurate. I know that double is twice as precise as float but should I be using float to maintain accuracy? Or am I missing something in my code?
Compute your constants once. Lines 3 and 4 may introduce round-off that isn't necessary; hard-coding the true result one time would avoid it. Use a high-precision external tool to get the correct value here.
Change it to simply:
dr = x * .005... (with the rest of the digits filled in); it may help a little, or it may give the same result as before.
@jib: yes, I know about approximations. :)
I teach computer science to degree level and focus on binary structures and machine code, along with programming (just not C++). Depending on how the data is stored, you can increase range at the cost of precision, or vice versa. I was trying to identify which one is best to use. I keep returning to double, but others with more C++ experience might have more detail to add.
My main point is that, clearly, 1 - 10^-8 is not 1, though it is very nearly 1. I was intrigued that sometimes I get a very long decimal result but here a very definite 1, and I was hoping for a way to preserve the integrity of the values.
@lastchance, thank you. The problem (and the reason for separating k and x out of the dr equation) is that these are precisely the two values I need to explore by changing them. So fixing the values is not an option.
@jonnin, I had this originally and feel it is the best way forward. I separated out k and x for the reason given above, but since I am the only person changing the code (anyone else will only read my summary of it), I think reducing the variable count will help in this case and reduce memory usage. Thank you.
In general, double, which is the default floating-point type in C and C++ (an unsuffixed literal such as 0.1 is a double), is somewhat less error-prone, because programmers are less likely to make mistakes caused by accidental implicit conversions and type mismatches. That is to say, double makes consistency easy. This makes it the best choice unless there are overriding concerns about performance. For instance, a programmer might choose float when memory or (more frequently) memory bandwidth is constrained.
When it comes to precision, double is obviously more precise. This is typically a minor advantage, because most users are far less concerned about precision than about accuracy. Accurate results come from picking a better algorithm, not from using a more precise representation.
Floating-point quantization error is measured in units in the last place, or ulps. The ulp is a relative unit: its size is proportional to the magnitude of the floating-point number in question. Because quantization produces relative error, absolute error can be minimized by avoiding operations on values of significantly different magnitude. Consider, for example, William Kahan's algorithm.
Indeed, when it comes to getting accurate results in a lengthy floating point calculation, precision is rarely a major factor. Any electronics hobbyist can tell you that expensive, precise components won't get you far: circuit design is far more important. In software speak, this means no floating point computation will produce good results if the algorithm is bad. Focus on the algorithm, after which additional precision will do nothing except help reduce the (small) relative error in the results.
To put the 'error' of 1e-8 into perspective, it represents a miss of about 45 mm when travelling from a point A on the east coast of the US to a point B on the west coast.
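The arithmetic behind that figure, assuming a coast-to-coast distance of roughly 4,500 km (my assumption, since the post does not state one):

```latex
\[
  \Delta = 10^{-8} \times 4500\ \text{km}
         = 10^{-8} \times 4.5 \times 10^{6}\ \text{m}
         = 4.5 \times 10^{-2}\ \text{m}
         = 45\ \text{mm}
\]
```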
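#include <iostream>
#include <iomanip>
using namespace std;

int main()
{
    // float has a 24-bit significand: 1 - 1e-8 rounds to exactly 1
    float u = 1.0e-8f;
    float m = 1 - u;
    cout << "As float:\n";
    // print the same value with 0 to 19 digits after the decimal point
    for ( int i = 0; i < 20; i++ ) cout << fixed << setprecision( i ) << m << '\n';

    // double has a 53-bit significand: 1 - 1e-8 stays distinct from 1
    double U = 1.0e-8;
    double M = 1 - U;
    cout << "\nAs double:\n";
    for ( int i = 0; i < 20; i++ ) cout << fixed << setprecision( i ) << M << '\n';
}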
It depends on:
(a) the precision of your type: float (6 to 7 significant figures); double (15 to 16 significant figures)
(b) how you choose to print it out.
You should:
(1) use double;
(2) try to rearrange your code, if you can, so that you avoid subtracting (or even adding) two things of very different magnitude.
I would also suggest:
(3) writing u=1.0e-8;
rather than