I'm reading a CSV with float numbers like this:
I import it into a DataFrame and write that DataFrame to a new location:
df = pd.read_csv(orig)
df.to_csv(pandasfile)
What is happening? Do I have to cast to a different type such as float32?
I'm using pandas 0.9.0 and numpy 1.6.2.
As mentioned in the comments, this is a general floating-point problem.
However, you can use the float_format keyword of to_csv to hide it:
or, if you don't want 0.0001 to be rounded to zero, float_format='%g'
will give you:
in your output file.
For an explanation of %g, see the Format Specification Mini-Language.
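To make the difference between a fixed-decimal format and %g concrete, here is a minimal sketch in plain Python. It assumes that a string float_format is applied to each float cell as a printf-style format, which is my understanding of how to_csv behaves; the sample values, the '%.3f' format, and the helper name format_all are made up for illustration.

```python
# Sample values are invented; the last one is the classic 0.30000000000000004.
values = [0.085, 0.0001, 0.1 + 0.1 + 0.1]

def format_all(fmt, numbers):
    """Apply a printf-style format to every number (hypothetical helper)."""
    return [fmt % x for x in numbers]

# Fixed three decimals: hides the noise, but 0.0001 collapses to 0.000.
fixed = format_all('%.3f', values)

# General format: keeps up to six significant digits, so 0.0001 survives.
general = format_all('%g', values)

print(fixed)    # ['0.085', '0.000', '0.300']
print(general)  # ['0.085', '0.0001', '0.3']

# With pandas, the same format string would be passed as, for example:
#   df.to_csv(pandasfile, float_format='%g')
```

Note that %g keeps only six significant digits by default, so values that need more precision would call for something like '%.10g' instead.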
UPDATE: Answer was accurate at time of writing, and floating point precision is still not something you get by default with to_csv/read_csv (precision-performance tradeoff; defaults favor performance).
The original is still worth reading to get a better grasp on the problem.
It was a bug in pandas, not only in the "to_csv" function but in "read_csv" too. It is not a general floating-point issue, although floating-point arithmetic does demand some care from the programmer. The article below sheds some light on the subject:
A classic one-liner which shows the "problem" is ...
>>> 0.1 + 0.1 + 0.1
0.30000000000000004
... which does not display 0.3 as one would expect. On the other hand, if you carry out the calculation in fixed-point (integer) arithmetic and use floating-point arithmetic only in the last step, it works as expected. See this:
>>> (1 + 1 + 1) * 1.0 / 10
0.3
If you absolutely need to work around this problem, I recommend creating another CSV file that stores all figures as integers, for example by multiplying them by 100, 1000, or whatever factor is convenient. Inside your application, read the CSV file as usual to get those integers back, then convert them to floating point by dividing by the same factor.
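The integer-scaling workaround could look roughly like the sketch below. It uses only the standard csv module and an in-memory buffer so it runs on its own; the factor SCALE, the helper names write_scaled and read_scaled, and the sample rows are all hypothetical, not taken from the original data.

```python
import csv
import io

SCALE = 1000  # assumed factor; choose one that covers the decimals you need

def write_scaled(rows, handle):
    """Write (label, value) rows with each value stored as a scaled integer."""
    writer = csv.writer(handle)
    for label, value in rows:
        # round() before int() so that 84.99999... becomes 85, not 84
        writer.writerow([label, int(round(value * SCALE))])

def read_scaled(handle):
    """Read the rows back and divide by the same factor to recover floats."""
    return [(label, int(raw) / SCALE) for label, raw in csv.reader(handle)]

rows = [("x", 0.085), ("y", 0.005)]  # invented sample data

buffer = io.StringIO()
write_scaled(rows, buffer)
text = buffer.getvalue()  # the file holds only integers: 85 and 5

buffer.seek(0)
restored = read_scaled(buffer)
print(restored)
```

The trade-off is that any digits beyond the chosen factor are discarded when writing, so the factor has to be picked with the data's real precision in mind.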