How to efficiently handle European decimal separators using the pandas read_csv function?

Solution 1:

For European-style numbers, pass the thousands and decimal parameters to pandas.read_csv.

For example:

pandas.read_csv('data.csv', thousands='.', decimal=',')

From the docs:

thousands : str, optional
    Thousands separator.

decimal : str, default '.'
    Character to recognize as decimal point (e.g. use ',' for European data).
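
To try this without a file on disk, here is a minimal self-contained sketch. The sample data is made up for illustration, and it assumes a semicolon-delimited file (common for European exports, since the comma is taken by the decimal mark), hence the extra sep=';' argument.

```python
import io

import pandas as pd

# Hypothetical sample data: semicolon-delimited, '.' groups thousands,
# ',' marks the decimal point.
csv_text = "x;y\none;1.234,56\ntwo;2.000,00\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",        # field delimiter (an assumption about the file layout)
    thousands=".",  # strip '.' used as the thousands separator
    decimal=",",    # treat ',' as the decimal point
)

# Column y should come back as a numeric column, not as strings.
print(df.dtypes)
print(df)
```

Note that thousands and decimal apply to every numeric-looking column in the file, so a column that really uses '.' as a decimal point would be misread under these settings.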

Solution 2:

You can use the converters keyword argument of read_csv. Given a file /tmp/data.csv like this:

"x","y"
"one","1.234,56"
"two","2.000,00"

you can do:

In [20]: pandas.read_csv('/tmp/data.csv', converters={'y': lambda x: float(x.replace('.','').replace(',','.'))})
Out[20]: 
     x        y
0  one  1234.56
1  two  2000.00
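
The same idea can be written with a named function instead of a lambda, which is easier to reuse across columns and to test on its own. This is only a sketch: parse_european_number is a hypothetical helper name, and the data is inlined through io.StringIO so the snippet runs without /tmp/data.csv.

```python
import io

import pandas as pd


def parse_european_number(text: str) -> float:
    """Convert a string such as '1.234,56' to a float (hypothetical helper)."""
    # Drop the '.' thousands separators, then turn the ',' into a decimal point.
    return float(text.replace(".", "").replace(",", "."))


# Same content as the /tmp/data.csv shown above, inlined for the example.
csv_text = '"x","y"\n"one","1.234,56"\n"two","2.000,00"\n'

# The converter is applied to each raw string value in column 'y'.
df = pd.read_csv(io.StringIO(csv_text), converters={"y": parse_european_number})
print(df)
```

A converter runs as Python code on each cell, so on large files it is likely to be slower than the built-in thousands and decimal parameters. It is mainly useful when only some columns use the European format or when the values need extra cleanup.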