In Pandas, How to use fillna to fill the whole columns with string if the column is empty originally?


Question

My table:

In [15]: csv=u"""a,a,,a
   ....: b,b,,b
   ....: c,c,,c
   ....: """

In [18]: df = pd.read_csv(io.StringIO(csv), header=None)

Fill the empty columns as 'UNKNOWN'

In [19]: df
Out[19]: 
   0  1   2  3
0  a  a NaN  a
1  b  b NaN  b
2  c  c NaN  c

In [20]: df.fillna({2:'UNKNOWN'})

Got the error

ValueError: could not convert string to float: UNKNOWN
1
10
4/17/2013 6:10:46 PM

Accepted Answer

Your 2 column probably has a float dtype:

>>> df
   0  1   2  3
0  a  a NaN  a
1  b  b NaN  b
2  c  c NaN  c
>>> df.dtypes
0     object
1     object
2    float64
3     object
dtype: object

Hence the problem. If you don't mind converting the whole frame to object, you could:

>>> df.astype(object).fillna("UNKNOWN")
   0  1        2  3
0  a  a  UNKNOWN  a
1  b  b  UNKNOWN  b
2  c  c  UNKNOWN  c

Depending on whether there's non-string data you might want to be more selective about converting column dtypes, and/or specify the dtypes on read, but the above should work, anyhow.


Update: if you have dtype information you want to preserve, rather than switching it back, I'd go the other way and only fill on the columns that you wanted to, either using a loop with fillna:

>>> df
   0  1  2   3  4   5
0  0  a  a NaN  a NaN
1  1  b  b NaN  b NaN
2  2  c  c NaN  c NaN
>>> df.dtypes
0      int64
1     object
2     object
3    float64
4     object
5    float64
dtype: object
>>> for col in df.columns[pd.isnull(df).all()]:
...         df[col] = df[col].astype(object).fillna("UNKNOWN")
...     
>>> df
   0  1  2        3  4        5
0  0  a  a  UNKNOWN  a  UNKNOWN
1  1  b  b  UNKNOWN  b  UNKNOWN
2  2  c  c  UNKNOWN  c  UNKNOWN
>>> df.dtypes
0     int64
1    object
2    object
3    object
4    object
5    object
dtype: object

Or (if you're using all), then maybe not even use fillna at all:

>>> df
   0  1  2   3  4   5
0  0  a  a NaN  a NaN
1  1  b  b NaN  b NaN
2  2  c  c NaN  c NaN
>>> df.ix[:,pd.isnull(df).all()] = "UNKNOWN"
>>> df
   0  1  2        3  4        5
0  0  a  a  UNKNOWN  a  UNKNOWN
1  1  b  b  UNKNOWN  b  UNKNOWN
2  2  c  c  UNKNOWN  c  UNKNOWN
9
4/17/2013 6:41:31 PM

As a workaround, just set the column directly, the fillna upconversion should work and is a bug

In [8]: df = pd.read_csv(io.StringIO(csv), header=None)

In [9]: df
Out[9]: 
   0  1   2  3
0  a  a NaN  a
1  b  b NaN  b
2  c  c NaN  c

In [10]: df.loc[:,2] = 'foo'

In [11]: df
Out[11]: 
   0  1    2  3
0  a  a  foo  a
1  b  b  foo  b
2  c  c  foo  c

In [12]: df.dtypes
Out[12]: 
0    object
1    object
2    object
3    object
dtype: object

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon