NumPy or Pandas: Keeping array type as integer while having a NaN value


Question

Is there a preferred way to keep the data type of a numpy array fixed as int (or int64 or whatever), while still having an element inside listed as numpy.NaN?

In particular, I am converting an in-house data structure to a Pandas DataFrame. In our structure, we have integer-type columns that still have NaN's (but the dtype of the column is int). It seems to recast everything as a float if we make this a DataFrame, but we'd really like to be int.

Thoughts?

Things tried:

I tried using the from_records() function under pandas.DataFrame, with coerce_float=False and this did not help. I also tried using NumPy masked arrays, with NaN fill_value, which also did not work. All of these caused the column data type to become a float.

1
127
1/5/2017 12:00:02 AM

Accepted Answer

This capability has been added to pandas (beginning with version 0.24): https://pandas.pydata.org/pandas-docs/version/0.24/whatsnew/v0.24.0.html#optional-integer-na-support

At this point, it requires the use of extension dtype Int64 (capitalized), rather than the default dtype int64 (lowercase).

46
1/29/2019 2:26:02 PM

NaN can't be stored in an integer array. This is a known limitation of pandas at the moment; I have been waiting for progress to be made with NA values in NumPy (similar to NAs in R), but it will be at least 6 months to a year before NumPy gets these features, it seems:

http://pandas.pydata.org/pandas-docs/stable/gotchas.html#support-for-integer-na

(This feature has been added beginning with version 0.24 of pandas, but note it requires the use of extension dtype Int64 (capitalized), rather than the default dtype int64 (lower case): https://pandas.pydata.org/pandas-docs/version/0.24/whatsnew/v0.24.0.html#optional-integer-na-support )


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon