Retain precision with double in Java

public class doublePrecision {
    public static void main(String[] args) {

        double total = 0;
        total += 5.6;
        total += 5.8;
        System.out.println(total);
    }
}

The above code prints:

11.399999999999

How would I get this to just print (or be able to use it as) 11.4?


Solution 1:

As others have mentioned, you'll probably want to use the BigDecimal class, if you want to have an exact representation of 11.4.
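As a minimal sketch of that suggestion, the two values from the question can be added as BigDecimal instances. The class name ExactSum is only illustrative, and the sketch assumes the values are available as decimal strings, since constructing a BigDecimal from a double would copy the double's binary rounding error.

```java
import java.math.BigDecimal;

// Hypothetical demo class; the name is illustrative only.
final class ExactSum {

    // Adds the two decimal literals from the question without binary rounding.
    static BigDecimal sum() {
        // Build from strings: new BigDecimal(5.6) would carry over the
        // rounding error already present in the double literal.
        BigDecimal total = BigDecimal.ZERO;
        total = total.add(new BigDecimal("5.6"));
        total = total.add(new BigDecimal("5.8"));
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum()); // prints 11.4
    }
}
```

BigDecimal is immutable, so each add returns a new object that has to be assigned back to total.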

Now, a little explanation of why this is happening:

The float and double primitive types in Java are floating point numbers, where the number is stored as a binary representation of a fraction and an exponent.

More specifically, a double-precision floating point value such as the double type is a 64-bit value, where:

  • 1 bit denotes the sign (positive or negative).
  • 11 bits hold the exponent.
  • 52 bits hold the significand (the fractional part, in binary).

These parts are combined to produce a double representation of a value.

(Source: Wikipedia: Double precision)
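The three fields listed above can be pulled out of a double with Double.doubleToLongBits and some bit masking. This is only a sketch for inspecting the layout; the class DoubleBits and its helper methods are hypothetical names, not part of any library.

```java
// Hypothetical demo class; the name and helpers are illustrative only.
final class DoubleBits {

    // Top bit: 0 for positive, 1 for negative.
    static long sign(double d) {
        return Double.doubleToLongBits(d) >>> 63;
    }

    // Next 11 bits: the exponent, stored with a bias of 1023.
    static long exponent(double d) {
        return (Double.doubleToLongBits(d) >>> 52) & 0x7FFL;
    }

    // Low 52 bits: the fractional part of the significand.
    static long fraction(double d) {
        return Double.doubleToLongBits(d) & ((1L << 52) - 1);
    }

    public static void main(String[] args) {
        double d = 11.4;
        System.out.println("sign     = " + sign(d));     // prints sign     = 0
        System.out.println("exponent = " + exponent(d)); // prints exponent = 1026
        System.out.println("fraction = 0x" + Long.toHexString(fraction(d)));
    }
}
```

The exponent field is 1026 because 11.4 lies between 2^3 and 2^4, and 3 + 1023 = 1026.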

For a detailed description of how floating point values are handled in Java, see Section 4.2.3: Floating-Point Types, Formats, and Values of the Java Language Specification.

The byte, char, int, and long types are integer types (fixed-point numbers with no fractional part), which represent every value in their range exactly. Unlike fixed-point numbers, floating point numbers will sometimes (safe to assume "most of the time") not be able to represent a decimal number exactly. This is the reason why you end up with 11.399999999999 as the result of 5.6 + 5.8.

When you need a value that is exact, such as 1.5 or 150.1005, you'll want to use a fixed-point or decimal representation, such as an integer count of the smallest unit or a BigDecimal, which will be able to represent the number exactly.
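The rounding described above can be observed directly: new BigDecimal(double) converts the binary value a double actually holds into its exact decimal expansion. The sketch below is only for inspection, and the class name ExactBinaryValues is made up for the example.

```java
import java.math.BigDecimal;

// Hypothetical demo class; the name is illustrative only.
final class ExactBinaryValues {

    // Expands a double to the exact decimal value of its stored binary fraction.
    static String exact(double d) {
        return new BigDecimal(d).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println("5.6       is stored as " + exact(5.6));
        System.out.println("5.8       is stored as " + exact(5.8));
        System.out.println("5.6 + 5.8 is stored as " + exact(5.6 + 5.8));
        System.out.println("11.4      is stored as " + exact(11.4));

        // The rounded sum and the literal 11.4 are two different doubles.
        System.out.println(5.6 + 5.8 == 11.4); // prints false
    }
}
```

Neither 5.6 nor 5.8 has a finite binary expansion, so each literal is already slightly off before the addition rounds the result once more.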

As has been mentioned several times already, Java has a BigDecimal class which will handle very large numbers and very small numbers.

From the Java API Reference for the BigDecimal class:

Immutable, arbitrary-precision signed decimal numbers. A BigDecimal consists of an arbitrary precision integer unscaled value and a 32-bit integer scale. If zero or positive, the scale is the number of digits to the right of the decimal point. If negative, the unscaled value of the number is multiplied by ten to the power of the negation of the scale. The value of the number represented by the BigDecimal is therefore (unscaledValue × 10^-scale).
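To make the unscaled value and scale from that description concrete, here is a small sketch that prints both parts of a BigDecimal. The class ScaleDemo and its describe helper are hypothetical names used only for this illustration.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical demo class; the name and helper are illustrative only.
final class ScaleDemo {

    // Renders a BigDecimal as "unscaledValue x 10^(-scale)".
    static String describe(BigDecimal value) {
        return value.unscaledValue() + " x 10^" + (-value.scale());
    }

    public static void main(String[] args) {
        // Positive scale: digits to the right of the decimal point.
        BigDecimal a = new BigDecimal("11.4");
        System.out.println(describe(a)); // prints 114 x 10^-1

        // Negative scale: the unscaled value is multiplied by a power of ten.
        BigDecimal b = new BigDecimal(BigInteger.valueOf(114), -2);
        System.out.println(describe(b));       // prints 114 x 10^2
        System.out.println(b.toPlainString()); // prints 11400
    }
}
```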

There have been many questions on Stack Overflow about floating point numbers and their precision. Here is a list of related questions that may be of interest:

  • Why do I see a double variable initialized to some value like 21.4 as 21.399999618530273?
  • How to print really big numbers in C++
  • How is floating point stored? When does it matter?
  • Use Float or Decimal for Accounting Application Dollar Amount?

If you really want to get down to the nitty gritty details of floating point numbers, take a look at What Every Computer Scientist Should Know About Floating-Point Arithmetic.

Solution 2:

When you input a double number, for example, 33.33333333333333, the value you get is actually the closest representable double-precision value, which is exactly:

33.3333333333333285963817615993320941925048828125

Dividing that by 100 gives:

0.333333333333333285963817615993320941925048828125

which also isn't representable as a double-precision number, so again it is rounded to the nearest representable value, which is exactly:

0.3333333333333332593184650249895639717578887939453125

When you print this value out, it gets rounded yet again to 17 decimal digits, giving:

0.33333333333333326
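The steps above can be retraced in Java with a rough sketch like the following. It assumes new BigDecimal(double) to expose the exact stored value and movePointLeft(2) to form the unrounded quotient; the class RoundingSteps and its helpers are hypothetical names. If the figures quoted above are right, the four printed lines should match them in order.

```java
import java.math.BigDecimal;

// Hypothetical demo class; the name and helpers are illustrative only.
final class RoundingSteps {

    // Exact decimal expansion of the binary value held by a double.
    static String exact(double d) {
        return new BigDecimal(d).toPlainString();
    }

    // Exact (unrounded) result of dividing the stored value by 100.
    static String exactQuotient(double d) {
        return new BigDecimal(d).movePointLeft(2).toPlainString();
    }

    public static void main(String[] args) {
        double x = 33.33333333333333;
        double y = x / 100; // rounded to the nearest representable double

        System.out.println(exact(x));         // value stored for the literal
        System.out.println(exactQuotient(x)); // quotient before rounding
        System.out.println(exact(y));         // quotient after rounding
        System.out.println(y);                // what println shows for y
    }
}
```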

Solution 3:

If you just want to process values as fractions, you can create a Fraction class which holds a numerator and denominator field.

Write methods for add, subtract, multiply and divide as well as a toDouble method. This way you can avoid floats during calculations.

EDIT: Quick implementation,

// Note: results are not reduced to lowest terms, and int overflow is not checked.
public class Fraction {

    private int numerator;
    private int denominator;

    public Fraction(int n, int d) {
        numerator = n;
        denominator = d;
    }

    // Converts to a double only at the end, when an approximate value is acceptable.
    public double toDouble() {
        return ((double) numerator) / ((double) denominator);
    }

    public static Fraction add(Fraction a, Fraction b) {
        if (a.denominator != b.denominator) {
            // Cross-multiply to bring both fractions to a common denominator.
            // These are ints because the constructor takes int arguments.
            int aTop = b.denominator * a.numerator;
            int bTop = a.denominator * b.numerator;
            return new Fraction(aTop + bTop, a.denominator * b.denominator);
        } else {
            return new Fraction(a.numerator + b.numerator, a.denominator);
        }
    }

    // Dividing by b is the same as multiplying by its reciprocal.
    public static Fraction divide(Fraction a, Fraction b) {
        return new Fraction(a.numerator * b.denominator, a.denominator * b.numerator);
    }

    public static Fraction multiply(Fraction a, Fraction b) {
        return new Fraction(a.numerator * b.numerator, a.denominator * b.denominator);
    }

    public static Fraction subtract(Fraction a, Fraction b) {
        if (a.denominator != b.denominator) {
            // Same cross-multiplication as in add, with the numerators subtracted.
            int aTop = b.denominator * a.numerator;
            int bTop = a.denominator * b.numerator;
            return new Fraction(aTop - bTop, a.denominator * b.denominator);
        } else {
            return new Fraction(a.numerator - b.numerator, a.denominator);
        }
    }

}
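For the numbers in the question, the same idea can be tried out with a compact, self-contained variant. This is only a sketch and not the class above: the name Rational, the long fields, and the reduction to lowest terms via gcd are all my own assumptions, added so the example runs on its own and prints a tidy result.

```java
// Hypothetical variant of the fraction idea; names and design are illustrative only.
final class Rational {
    final long num;
    final long den;

    Rational(long n, long d) {
        if (d == 0) {
            throw new ArithmeticException("zero denominator");
        }
        if (d < 0) { // keep the sign on the numerator
            n = -n;
            d = -d;
        }
        long g = gcd(Math.abs(n), d); // reduce to lowest terms
        num = n / g;
        den = d / g;
    }

    // Euclid's algorithm; returns b's initial value when a is 0.
    private static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    Rational add(Rational o) {
        return new Rational(num * o.den + o.num * den, den * o.den);
    }

    // Converts to double once, at the very end.
    double toDouble() {
        return (double) num / den;
    }

    @Override
    public String toString() {
        return num + "/" + den;
    }

    public static void main(String[] args) {
        // 5.6 and 5.8 written as exact fractions.
        Rational total = new Rational(56, 10).add(new Rational(58, 10));
        System.out.println(total);            // prints 57/5
        System.out.println(total.toDouble()); // prints 11.4
    }
}
```

The single division in toDouble rounds only once, so the result is the double closest to 11.4 rather than the sum of two already rounded values.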