reading file with missing values in python pandas


Question

I am trying to read a .txt file with missing values using pandas.read_csv. My data has the following format:

10/08/2012,12:10:10,name1,0.81,4.02,50;18.5701400N,4;07.7693770E,7.92,10.50,0.0106,4.30,0.0301
10/08/2012,12:10:11,name2,,,,,10.87,1.40,0.0099,9.70,0.0686

There are thousands of such samples, each with the name of the point, GPS position, and other readings. I use this code:

myData = read_csv('~/data.txt', sep=',', na_values='')

The code is wrong, as na_values does not give NaN or any other indicator. The columns should all have the same length, but I end up with different lengths.

I don't know exactly what should be passed to na_values (I have tried all sorts of things). Thanks.

9/20/2012 3:32:51 PM

Accepted Answer

The parameter na_values must be "list like" (see this answer).

A string is itself "list like" (a sequence of characters), so:

na_values='abc' # would transform the letters 'a', 'b' and 'c' each into `nan`
# is equivalent to
na_values=['a','b','c']

Similarly:

na_values=''
# is equivalent to
na_values=[] # and this is not what you want!

This means that you need to use na_values=[''].
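As a minimal sketch of the explicit-list form, the snippet below feeds the two sample rows from the question through read_csv via io.StringIO (an assumption made here so the example is self-contained; the question reads from a file). It also sets keep_default_na=False, which is not part of the original answer, to make it visible that the list [''] is what produces the NaN values. How a bare string is handled may differ between pandas versions, so passing a list avoids the ambiguity.

```python
from io import StringIO

import pandas as pd

# The two sample rows from the question; the second row has four empty fields.
data = (
    "10/08/2012,12:10:10,name1,0.81,4.02,50;18.5701400N,4;07.7693770E,"
    "7.92,10.50,0.0106,4.30,0.0301\n"
    "10/08/2012,12:10:11,name2,,,,,10.87,1.40,0.0099,9.70,0.0686\n"
)

# keep_default_na=False switches off pandas' built-in NA markers, so only the
# values listed in na_values are converted to NaN.
explicit = pd.read_csv(
    StringIO(data), header=None, keep_default_na=False, na_values=['']
)

# With the defaults switched off and no na_values given, the empty fields
# are kept as empty strings instead of becoming NaN.
nothing = pd.read_csv(StringIO(data), header=None, keep_default_na=False)

# Count the missing values found in each case.
print(explicit.isna().sum().sum())
print(nothing.isna().sum().sum())
```

Both frames have the same shape; only the treatment of the empty fields differs.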

5/23/2017 12:00:10 PM

What version of pandas are you on? Interpreting an empty string as NaN is the default behavior in pandas, and the empty strings in your data snippet seem to parse fine both in v0.7.3 and on current master, without using the na_values parameter at all.

In [10]: data = """\
10/08/2012,12:10:10,name1,0.81,4.02,50;18.5701400N,4;07.7693770E,7.92,10.50,0.0106,4.30,0.0301
10/08/2012,12:10:11,name2,,,,,10.87,1.40,0.0099,9.70,0.0686
"""

In [11]: read_csv(StringIO(data), header=None).T
Out[11]: 
                   0           1
X.1       10/08/2012  10/08/2012
X.2         12:10:10    12:10:11
X.3            name1       name2
X.4             0.81         NaN
X.5             4.02         NaN
X.6   50;18.5701400N         NaN
X.7    4;07.7693770E         NaN
X.8             7.92       10.87
X.9             10.5         1.4
X.10          0.0106      0.0099
X.11             4.3         9.7
X.12          0.0301      0.0686
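For readers who want to reproduce the session above outside IPython, here is a rough standalone equivalent. It assumes a recent pandas on Python 3, where StringIO comes from the io module and header=None labels the columns with integers (0 to 11) rather than X.1 to X.12 as in the output shown.

```python
from io import StringIO

import pandas as pd

# Same two rows as in the session above.
data = (
    "10/08/2012,12:10:10,name1,0.81,4.02,50;18.5701400N,4;07.7693770E,"
    "7.92,10.50,0.0106,4.30,0.0301\n"
    "10/08/2012,12:10:11,name2,,,,,10.87,1.40,0.0099,9.70,0.0686\n"
)

# No na_values argument: empty fields are treated as missing by default.
df = pd.read_csv(StringIO(data), header=None)

# Transposed view, one line per column, as in the original output.
print(df.T)

# The four empty fields of the second row are reported as missing.
print(df.iloc[1, 3:7].isna().all())
```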

Licensed under: CC-BY-SA with attribution