Pandas read_csv with float values results in weird rounding and extra decimal digits
I have a csv file containing numerical values such as 1524.449677. There are always exactly 6 decimal places.
When I import the csv file (along with other columns) via pandas read_csv, the column automatically gets the datatype object. My issue is that the values are shown as 2470.6911370000003, which actually should be 2470.691137. Likewise, the value 2484.30691 is shown as 2484.3069100000002.
This seems to be a datatype issue in some way. I tried to explicitly provide the data type when importing via read_csv by giving the dtype argument as {'columnname': np.float64}. Still, the issue did not go away.
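A minimal sketch of what I tried (the file and column names are placeholders):

import pandas as pd
import numpy as np

df = pd.read_csv('data.csv', dtype={'columnname': np.float64})
print(df['columnname'].iloc[0])  # still prints e.g. 2470.6911370000003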
How can I get the values imported and shown exactly as they are in the source csv file?
Pandas uses a dedicated decimal-to-binary converter that sacrifices accuracy for the sake of speed.
Passing float_precision='round_trip' to read_csv fixes this.
See the float_precision notes in the pandas read_csv documentation for more detail.
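For illustration, a minimal reproduction (the first print's exact output depends on your pandas version and its default converter):

import io
import pandas as pd

csv = "x\n2470.691137\n"

# Default converter: fast, but may not round-trip the decimal text exactly.
print(pd.read_csv(io.StringIO(csv))['x'].iloc[0])

# Round-trip converter: parses to the float whose repr matches the source.
print(pd.read_csv(io.StringIO(csv), float_precision='round_trip')['x'].iloc[0])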
After processing your data, if you want to save it back to a csv file, you can pass float_format="%.nf" (where n is the desired number of decimal places) to to_csv.
A full example:
import pandas as pd
df_in = pd.read_csv(source_file, float_precision='round_trip')
df_out = ... # some processing of df_in
df_out.to_csv(target_file, float_format="%.3f") # for 3 decimal places
I realise this is an old question, but maybe this will help someone else:
I had a similar problem, but couldn't quite use the same solution. Unfortunately, the float_precision option only exists when using the C engine, not with the python engine. So if you have to use the python engine for some other reason (for example, because the C engine can't deal with regular expressions as delimiters), this little "trick" worked for me:
In the pd.read_csv arguments, define dtype='str' and then convert your dataframe to whatever dtype you want, e.g. df = df.astype('float64').
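A minimal sketch of that approach (the file name, regex separator, and column name are placeholders):

import pandas as pd

# A regex separator forces the python engine, which has no float_precision option.
df = pd.read_csv('data.csv', sep=r';\s*', engine='python', dtype='str')

# Python's string-to-float conversion is correctly rounded, so the values round-trip.
df['value'] = df['value'].astype('float64')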
Bit of a hack, but it seems to work. If anyone has any suggestions on how to solve this in a better way, let me know.