float64 with pandas to_csv


Question

I'm reading a CSV with float numbers like this:

Bob,0.085
Alice,0.005

I import it into a DataFrame and write that DataFrame to a new file:

import pandas as pd

df = pd.read_csv(orig)
df.to_csv(pandasfile)

Now pandasfile contains:

Bob,0.085000000000000006
Alice,0.0050000000000000001

What is happening? Do I have to cast to a different type, like float32?

I'm using pandas 0.9.0 and numpy 1.6.2.

10/15/2012 10:14:28 AM

Accepted Answer

As mentioned in the comments, this is a general floating-point problem.
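A quick way to see what that means, using only the Python standard library: the digits in the question are exactly what `'%.17g'` formatting produces for these two doubles. That suggests (an assumption, not checked against the pandas 0.9.0 source) that the old writer emitted about 17 significant digits. The sketch compares this with `repr()` and with the exact stored value from `decimal.Decimal`.

```python
from decimal import Decimal

values = [0.085, 0.005]

# repr() gives the shortest string that reads back as the same double.
shortest = [repr(x) for x in values]  # ['0.085', '0.005']

# 17 significant digits always round-trip a double, but they also expose
# the tiny binary approximation error (assumed to be what old pandas wrote).
seventeen = ["%.17g" % x for x in values]  # ['0.085000000000000006', '0.0050000000000000001']

# Decimal(x) shows the exact binary value that is actually stored.
exact = [Decimal(x) for x in values]

for s, l, e in zip(shortest, seventeen, exact):
    print(s, l, e)
```

Both strings parse back to the same double, so neither file is "wrong"; the longer one just shows more of the stored approximation.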

However, you can use the float_format keyword argument of to_csv to hide it:

df.to_csv('pandasfile.csv', float_format='%.3f')

or, if you don't want 0.0001 to be rounded to zero:

df.to_csv('pandasfile.csv', float_format='%g')

will give you:

Bob,0.085
Alice,0.005

in your output file.
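A self-contained sketch of the `float_format` call, assuming a reasonably recent pandas. It writes to an in-memory buffer instead of `pandasfile.csv`, the column names `name` and `value` are made up for the example, and `index=False, header=False` are added only so the output matches the two lines shown above.

```python
import io

import pandas as pd

# Same data as in the question; column names are hypothetical.
df = pd.DataFrame({"name": ["Bob", "Alice"], "value": [0.085, 0.005]})

buf = io.StringIO()
# float_format is applied to every float cell in the output.
df.to_csv(buf, index=False, header=False, float_format="%g")
text = buf.getvalue()
print(text)
```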

For an explanation of %g, see the Format Specification Mini-Language section of the Python documentation.
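The difference between the two format strings can be checked with plain Python `%` formatting, independent of pandas. The extra sample values (0.0001, 0.00001, 0.123456789) are chosen only to show where `'%.3f'` and `'%g'` diverge.

```python
values = [0.085, 0.005, 0.0001, 0.00001, 0.123456789]

# '%.3f': always exactly three digits after the decimal point,
# so anything smaller than 0.0005 becomes 0.000.
fixed = ["%.3f" % x for x in values]

# '%g': up to 6 significant digits, trailing zeros dropped, and
# scientific notation once the exponent drops below -4.
general = ["%g" % x for x in values]

for x, f, g in zip(values, fixed, general):
    print(f"{x!r:>12} {f:>8} {g:>10}")
```

So `%g` fits values like the ones in the question, but it silently rounds anything with more than six significant digits (0.123456789 becomes 0.123457); a wider precision such as `%.10g` trades file size for accuracy.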

7/14/2017 7:21:43 PM

UPDATE: This answer was accurate at the time of writing, and full floating-point precision is still not something you get by default with to_csv/read_csv (it is a precision/performance trade-off, and the defaults favor performance).

Nowadays there is the float_format argument for pandas.DataFrame.to_csv and the float_precision argument for pandas.read_csv.
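A sketch of the reading side, assuming a pandas version whose `read_csv` accepts `float_precision="round_trip"`, which makes the C parser use Python's own float conversion. It reads the question's data, writes it back, and checks that the values survive unchanged; the column names and the `read` helper are hypothetical.

```python
import io

import pandas as pd

csv_text = "Bob,0.085\nAlice,0.005\n"


def read(text):
    # "round_trip" asks the C parser to use Python's float conversion,
    # so every value parses to the nearest double.
    return pd.read_csv(
        io.StringIO(text),
        header=None,
        names=["name", "value"],  # hypothetical column names
        float_precision="round_trip",
    )


df = read(csv_text)

# Write back (to_csv returns a string when no path is given) and re-read.
written = df.to_csv(index=False, header=False)
df_again = read(written)
print(written)
```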

The original answer below is still worth reading for a better grasp of the problem.


It was a bug in pandas, not only in the to_csv function but in read_csv too. It's not a general floating-point issue, although it's true that floating-point arithmetic is a subject that demands some care from the programmer. The article below clarifies this subject a bit:

http://docs.python.org/2/tutorial/floatingpoint.html

A classic one-liner which shows the "problem" is ...

>>> 0.1 + 0.1 + 0.1
0.30000000000000004

... which does not display 0.3 as one would expect. On the other hand, if you do the calculation in fixed-point (integer) arithmetic and switch to floating point only in the last step, it works as you expect:

>>> (1 + 1 + 1) * 1.0 / 10
0.3

If you really need to work around this problem, I recommend creating another CSV file that stores all figures as integers, for example by multiplying them by 100, 1000 or whatever factor is convenient. Inside your application, read the CSV file as usual and you will get those integer figures back. Then convert those values to floating point by dividing by the same factor you multiplied by.
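A minimal sketch of that workaround with pandas, assuming a scale factor of 1000 (enough for three decimal places; pick whatever factor suits your data). `SCALE` and the column names are made up for the example, and an in-memory buffer stands in for the second CSV file.

```python
import io

import pandas as pd

SCALE = 1000  # hypothetical factor: keeps three decimal places

original = pd.DataFrame({"name": ["Bob", "Alice"], "value": [0.085, 0.005]})

# Store the figures as integers; round() guards against products that
# land a hair above or below a whole number in binary floating point.
scaled = original.assign(value=(original["value"] * SCALE).round().astype("int64"))

buf = io.StringIO()
scaled.to_csv(buf, index=False)  # the file holds 85 and 5, not floats

# Read the CSV as usual: the column comes back as exact integers...
loaded = pd.read_csv(io.StringIO(buf.getvalue()))

# ...and is converted to floating point only in the last step.
loaded["value"] = loaded["value"] / SCALE
print(loaded)
```

Dividing once at the end mirrors the (1 + 1 + 1) * 1.0 / 10 example: the single division rounds to the nearest double, which prints as 0.085.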


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow