Floating point equality and tolerances
Comparing two floating point number by something like a_float == b_float
is looking for trouble since a_float / 3.0 * 3.0
might not be equal to a_float
due to round off error.
What one normally does is something like fabs(a_float - b_float) < tol
.
How does one calculate tol
?
Ideally tolerance should be just larger than the value of one or two of the least significant figures. So if the single precision floating point number is use tol = 10E-6
should be about right. However this does not work well for the general case where a_float
might be very small or might be very large.
How does one calculate tol
correctly for all general cases? I am interested in C or C++ cases specifically.
This blogpost contains an example, fairly foolproof implementation, and detailed theory behind it http://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/ it is also one of a series, so you can always read more. In short: use ULP for most numbers, use epsilon for numbers near zero, but there are still caveats. If you want to be sure about your floating point math i recommend reading whole series.
As far as I know, one doesn't.
There is no general "right answer", since it can depend on the application's requirement for precision.
For instance, a 2D physics simulation working in screen-pixels might decide that 1/4 of a pixel is good enough, while a 3D CAD system used to design nuclear plant internals might not.
I can't see a way to programmatically decide this from the outside.
The C header file <float.h>
gives you the constants FLT_EPSILON
and DBL_EPSILON
, which is the difference between 1.0 and the smallest number larger than 1.0 that a float/double can represent. You can scale that by the size of your numbers and the rounding error you wish to tolerate:
#include <float.h>
#ifndef DBL_TRUE_MIN
/* DBL_TRUE_MIN is a common non-standard extension for the minimum denorm value
* DBL_MIN is the minimum non-denorm value -- use that if TRUE_MIN is not defined */
#define DBL_TRUE_MIN DBL_MIN
#endif
/* return the difference between |x| and the next larger representable double */
double dbl_epsilon(double x) {
int exp;
if (frexp(x, &exp) == 0.0)
return DBL_TRUE_MIN;
return ldexp(DBL_EPSILON, exp-1);
}