float64 with pandas to_csv
I'm reading a CSV with float numbers like this:
Bob,0.085
Alice,0.005
I import it into a dataframe and write this dataframe to a new location:
df = pd.read_csv(orig)
df.to_csv(pandasfile)
Now pandasfile contains:
Bob,0.085000000000000006
Alice,0.0050000000000000001
What happened? Do I have to cast to a different type, like float32 or something?
I'm using pandas 0.9.0 and numpy 1.6.2.
As mentioned in the comments, this is a general floating point problem. However, you can use the float_format keyword of to_csv to hide it:
df.to_csv('pandasfile.csv', float_format='%.3f')
or, if you don't want 0.0001 to be rounded to zero:
df.to_csv('pandasfile.csv', float_format='%g')
will give you:
Bob,0.085
Alice,0.005
in your output file.
For an explanation of %g, see Format Specification Mini-Language.
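To make the difference between the two format strings concrete, here is a minimal sketch. The third row (Carol, 0.0001) is made up for illustration and is not in the original data, and writing to an in-memory buffer instead of a file is just a convenience for the example.

```python
import io

import pandas as pd

# Bob and Alice come from the question; Carol is a hypothetical extra row
# added to show how a very small value is treated by each format.
df = pd.DataFrame(
    {"name": ["Bob", "Alice", "Carol"], "value": [0.085, 0.005, 0.0001]}
)


def dump(float_format):
    """Write df as CSV text (no header, no index) using the given float format."""
    buf = io.StringIO()
    df.to_csv(buf, index=False, header=False, float_format=float_format)
    return buf.getvalue()


fixed = dump("%.3f")   # always three decimals, so 0.0001 becomes 0.000
general = dump("%g")   # up to six significant digits, so 0.0001 survives

print(fixed)
print(general)
```

Note that %g keeps at most six significant digits by default, so it is only a safe choice when the data does not need more precision than that.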
UPDATE: Answer was accurate at time of writing, and floating point precision is still not something you get by default with to_csv/read_csv (precision-performance tradeoff; defaults favor performance).
Nowadays there is the float_format argument available for pandas.DataFrame.to_csv and the float_precision argument available for pandas.read_csv.
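As a rough sketch of how the reading side can be used, assuming a reasonably recent pandas version: the column names below are invented for the example, and the in-memory buffers stand in for the real files.

```python
import io

import pandas as pd

# The two rows from the question, held in memory instead of a file on disk.
raw = "Bob,0.085\nAlice,0.005\n"

# float_precision="round_trip" asks the parser to convert each number so that
# it matches what Python's own float() would give for the same text.
df = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=["name", "value"],  # hypothetical column names
    float_precision="round_trip",
)

out = io.StringIO()
df.to_csv(out, index=False, header=False)
print(out.getvalue())
```

The round-trip converter is slower than the default one, which is the precision-performance tradeoff mentioned above.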
The original is still worth reading to get a better grasp on the problem.
It was a bug in pandas, not only in the "to_csv" function but in "read_csv" too. It's not a general floating point issue, although it is true that floating point arithmetic demands some care from the programmer. The article below clarifies the subject a bit:
http://docs.python.org/2/tutorial/floatingpoint.html
A classic one-liner which shows the "problem" is ...
>>> 0.1 + 0.1 + 0.1
0.30000000000000004
... which does not display 0.3 as one would expect. On the other hand, if you do the calculation in integer (fixed point) arithmetic and only switch to floating point in the last step, it works as you expect. See this:
>>> (1 + 1 + 1) * 1.0 / 10
0.3
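The same effect can be tied back to the values in the question with a short sketch. The 17-significant-digit format used here is an assumption about how the old pandas version wrote floats, inferred only from the output shown in the question.

```python
# The shortest text that reads back as the same double (Python 3 repr).
short_bob = repr(0.085)
short_alice = repr(0.005)

# Assumed old behaviour: always print 17 significant digits.
long_bob = "%.17g" % 0.085
long_alice = "%.17g" % 0.005

print(short_bob, long_bob)      # 0.085 0.085000000000000006
print(short_alice, long_alice)  # 0.005 0.0050000000000000001

# Both spellings denote exactly the same stored value.
print(float(long_bob) == float(short_bob))
```

So the long and the short forms are two ways of printing the same binary number; nothing is lost or gained in the stored value.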
If you desperately need to circumvent this problem, I recommend you create another CSV file which contains all figures as integers, for example by multiplying by 100, 1000 or another convenient factor. Inside your application, read the CSV file as usual and you will get those integer figures back. Then convert those values to floating point by dividing by the same factor you multiplied by before.
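A minimal sketch of that workaround, assuming a scale factor of 1000; the column names and the in-memory file are made up for the example.

```python
import io

import pandas as pd

SCALE = 1000  # assumed factor; pick whatever makes every value an integer

# The question's data stored as integers: 0.085 -> 85, 0.005 -> 5.
scaled_csv = "Bob,85\nAlice,5\n"

df = pd.read_csv(
    io.StringIO(scaled_csv),
    header=None,
    names=["name", "milli"],  # hypothetical column names
)

# Integers are read and written exactly; convert to float only when needed.
df["value"] = df["milli"] / SCALE

out = io.StringIO()
df[["name", "milli"]].to_csv(out, index=False, header=False)
print(df)
print(out.getvalue())
```

Because the file only ever holds integers, writing it back out reproduces the input text exactly, whatever float formatting the pandas version applies.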