Floating point equality and tolerances

Comparing two floating point number by something like a_float == b_float is looking for trouble since a_float / 3.0 * 3.0 might not be equal to a_float due to round off error.

What one normally does is something like fabs(a_float - b_float) < tol.

How does one calculate tol?

Ideally tolerance should be just larger than the value of one or two of the least significant figures. So if the single precision floating point number is use tol = 10E-6 should be about right. However this does not work well for the general case where a_float might be very small or might be very large.

How does one calculate tol correctly for all general cases? I am interested in C or C++ cases specifically.


This blogpost contains an example, fairly foolproof implementation, and detailed theory behind it http://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/ it is also one of a series, so you can always read more. In short: use ULP for most numbers, use epsilon for numbers near zero, but there are still caveats. If you want to be sure about your floating point math i recommend reading whole series.


As far as I know, one doesn't.

There is no general "right answer", since it can depend on the application's requirement for precision.

For instance, a 2D physics simulation working in screen-pixels might decide that 1/4 of a pixel is good enough, while a 3D CAD system used to design nuclear plant internals might not.

I can't see a way to programmatically decide this from the outside.


The C header file <float.h> gives you the constants FLT_EPSILON and DBL_EPSILON, which is the difference between 1.0 and the smallest number larger than 1.0 that a float/double can represent. You can scale that by the size of your numbers and the rounding error you wish to tolerate:

#include <float.h>
#ifndef DBL_TRUE_MIN
/* DBL_TRUE_MIN is a common non-standard extension for the minimum denorm value
 * DBL_MIN is the minimum non-denorm value -- use that if TRUE_MIN is not defined */
#define DBL_TRUE_MIN DBL_MIN
#endif

/* return the difference between |x| and the next larger representable double */
double dbl_epsilon(double x) {
    int exp;
    if (frexp(x, &exp) == 0.0)
        return DBL_TRUE_MIN;
    return ldexp(DBL_EPSILON, exp-1);
}