Strange behaviour with floats and string conversion
I've typed this into the Python shell:
>>> 0.1*0.1
0.010000000000000002
I expected that 0.1*0.1 would not be exactly 0.01, because I know that the base-10 number 0.1 has a periodic (infinitely repeating) representation in base 2.
>>> len(str(0.1*0.1))
4
I expected to get 20 as I've seen 20 characters above. Why do I get 4?
>>> str(0.1*0.1)
'0.01'
OK, this explains why len gives me 4, but why does str return '0.01'?
>>> repr(0.1*0.1)
'0.010000000000000002'
Why does str round but repr not? (I have read this answer, but I would like to know how it was decided when str rounds a float and when it doesn't.)
>>> str(0.01) == str(0.0100000000001)
False
>>> str(0.01) == str(0.01000000000001)
True
So it seems to be a problem with the accuracy of floats. I thought Python would use IEEE 754 single-precision floats, so I checked it like this:
#include <stdint.h>
#include <stdio.h>  /* printf */

union myUnion {
    uint32_t i;  /* unsigned 32-bit integer type (on every machine) */
    float f;     /* the type we want to inspect */
};

int main() {
    union myUnion testVar;
    testVar.f = 0.01000000000001f;
    printf("%f\n", testVar.f);
    testVar.f = 0.01000000000000002f;
    printf("%f\n", testVar.f);
    testVar.f = 0.01f * 0.01f;
    printf("%f\n", testVar.f);
    return 0;
}
I got:
0.010000
0.010000
0.000100
Python gives me:
>>> 0.01000000000001
0.010000000000009999
>>> 0.01000000000000002
0.010000000000000019
>>> 0.01*0.01
0.0001
Why does Python give me these results?
(I use Python 2.6.5. If you know of differences in the Python versions, I would also be interested in them.)
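(Editorial note: one quick way to see that Python floats are 64-bit doubles rather than the 32-bit singles hypothesized above is the standard struct module; this is a sketch in modern Python, not part of the original experiment.)

```python
import struct

# Pack 0.01 as a 32-bit float and as a 64-bit double, then unpack to a Python float.
as_single = struct.unpack('f', struct.pack('f', 0.01))[0]
as_double = struct.unpack('d', struct.pack('d', 0.01))[0]

# The 64-bit round trip is exact (Python floats ARE doubles);
# the 32-bit round trip changes the value.
print(as_double == 0.01)
print(as_single == 0.01)
```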
Solution 1:
The crucial requirement on repr is that it should round-trip; that is, eval(repr(f)) == f should give True in all cases.
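The round-trip guarantee is easy to check directly (a minimal sketch on a handful of values):

```python
# repr must round-trip: eval(repr(f)) == f for every finite float f.
for f in (0.1, 0.1 * 0.1, 1e308, 1e-308, 2.0 ** -52):
    assert eval(repr(f)) == f
print("round-trip holds")
```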
In Python 2.x (before 2.7), repr works by doing a printf with format %.17g and discarding trailing zeroes. This is guaranteed correct (for 64-bit floats) by IEEE 754. Since 2.7 and 3.1, Python uses a more intelligent algorithm that can find shorter representations in some cases where %.17g gives unnecessary non-zero terminal digits or terminal nines. See What's New in 3.1 and issue 1580.
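Under a modern Python you can still reproduce the old %.17g output by hand and compare it with the shortest-repr result (sketch, assuming Python 2.7+/3.1+):

```python
# The pre-2.7 repr can be imitated with the %.17g format string:
print('%.17g' % 0.1)   # 17 significant digits: 0.10000000000000001
print(repr(0.1))       # modern shortest round-tripping form: 0.1
```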
Even under Python 2.7, repr(0.1 * 0.1) gives "0.010000000000000002". This is because 0.1 * 0.1 == 0.01 is False under IEEE-754 parsing and arithmetic; that is, the nearest 64-bit floating-point value to 0.1, when multiplied by itself, yields a 64-bit floating-point value that is not the nearest 64-bit floating-point value to 0.01:
>>> 0.1.hex()
'0x1.999999999999ap-4'
>>> (0.1 * 0.1).hex()
'0x1.47ae147ae147cp-7'
>>> 0.01.hex()
'0x1.47ae147ae147bp-7'
^ 1 ulp difference
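You can verify the one-ulp gap from these hex literals directly (a sketch assuming Python 3.9+ for math.ulp):

```python
import math

a = float.fromhex('0x1.47ae147ae147cp-7')   # the double produced by 0.1 * 0.1
b = float.fromhex('0x1.47ae147ae147bp-7')   # the nearest double to 0.01
assert a == 0.1 * 0.1 and b == 0.01

# The two values differ by exactly one unit in the last place:
assert a - b == math.ulp(b)
print("exactly 1 ulp apart")
```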
The difference between repr and str (pre-2.7/3.1) is that str formats with 12 significant digits as opposed to 17, which is not round-trippable but produces more readable results in many cases.
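The old str behaviour can likewise be imitated with a 12-significant-digit format, which shows exactly where the information is lost (sketch):

```python
# Old str() behaviour: 12 significant digits, not round-trippable.
s = '%.12g' % (0.1 * 0.1)
print(s)                      # 0.01 - the trailing digits are rounded away
assert float(s) != 0.1 * 0.1  # so the original value cannot be recovered
```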
Solution 2:
I can confirm your behaviour:
ActivePython 2.6.4.10 (ActiveState Software Inc.) based on
Python 2.6.4 (r264:75706, Jan 22 2010, 17:24:21) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> repr(0.1)
'0.10000000000000001'
>>> repr(0.01)
'0.01'
Now, the docs claim that in Python < 2.7 the value of repr(1.1) was computed as format(1.1, '.17g'). This is a slight simplification.
Note that this is all to do with the string formatting code -- in memory, all Python floats are simply stored as C doubles, so there is never going to be a difference between them.
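You can inspect those stored doubles bit for bit (a sketch using the struct module; the hex constant is the well-known IEEE-754 pattern for 0.1):

```python
import struct

# Every Python float is one 64-bit IEEE-754 double; its bits are directly visible:
bits = struct.unpack('<Q', struct.pack('<d', 0.1))[0]
print(hex(bits))   # 0x3fb999999999999a - sign 0, exponent -4, repeating mantissa
```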
Also, it's unpleasant to work with the full-length string for a float when a shorter string identifies the same value. Indeed, modern Pythons use a new float-formatting algorithm that picks the shortest round-tripping representation.
I spent a while looking this up in the source code, so I'll include the details here in case you're interested. You can skip this section.
In floatobject.c, we see
static PyObject *
float_repr(PyFloatObject *v)
{
    char buf[100];
    format_float(buf, sizeof(buf), v, PREC_REPR);
    return PyString_FromString(buf);
}
which leads us to look at format_float. Omitting the NaN/inf special cases, it is:
static void
format_float(char *buf, size_t buflen, PyFloatObject *v, int precision)
{
    register char *cp;
    char format[32];
    int i;

    /* Subroutine for float_repr and float_print.
       We want float numbers to be recognizable as such,
       i.e., they should contain a decimal point or an exponent.
       However, %g may print the number as an integer;
       in such cases, we append ".0" to the string. */

    assert(PyFloat_Check(v));
    PyOS_snprintf(format, 32, "%%.%ig", precision);
    PyOS_ascii_formatd(buf, buflen, format, v->ob_fval);
    cp = buf;
    if (*cp == '-')
        cp++;
    for (; *cp != '\0'; cp++) {
        /* Any non-digit means it's not an integer;
           this takes care of NAN and INF as well. */
        if (!isdigit(Py_CHARMASK(*cp)))
            break;
    }
    if (*cp == '\0') {
        *cp++ = '.';
        *cp++ = '0';
        *cp++ = '\0';
        return;
    }
    <some NaN/inf stuff>
}
This first initialises some variables and checks that v is a well-formed float. It then prepares a format string:
PyOS_snprintf(format, 32, "%%.%ig", precision);
Now PREC_REPR is defined elsewhere in floatobject.c as 17, so this computes to "%.17g". Then we call
PyOS_ascii_formatd(buf, buflen, format, v->ob_fval);
With the end of the tunnel in sight, we look up PyOS_ascii_formatd and discover that it uses snprintf internally.
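So the whole pre-2.7 repr pipeline boils down to a few lines, which can be mimicked in Python itself (a sketch of the logic above, not the actual C code; old_repr is a made-up name):

```python
def old_repr(f, precision=17):
    # Mimic format_float: %g with 17 significant digits...
    s = '%.*g' % (precision, f)
    # ...then, if %g printed a plain integer, append ".0" so the
    # result is still recognizable as a float.
    if s.lstrip('-').isdigit():
        s += '.0'
    return s

print(old_repr(0.1))   # 0.10000000000000001
print(old_repr(2.0))   # 2.0
```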
Solution 3:
From the Python tutorial:
In versions prior to Python 2.7 and Python 3.1, Python rounded this value to 17 significant digits, giving '0.10000000000000001'. In current versions, Python displays a value based on the shortest decimal fraction that rounds correctly back to the true binary value, resulting simply in '0.1'.
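On any current Python this is easy to confirm: the shortest form is shown when it round-trips, and the long form survives only when it is genuinely needed (sketch):

```python
# Shortest decimal string that rounds back to the same double:
print(repr(0.1))        # 0.1
print(repr(0.1 * 0.1))  # 0.010000000000000002 - the long form is required here,
                        # because '0.01' would round back to a different double
assert float(repr(0.1 * 0.1)) == 0.1 * 0.1
```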