Given this CSV file:
"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
I simply want to load it as a matrix/ndarray with 3 rows and 7 columns. However, for some reason, all I can get out of numpy is an ndarray with 3 rows (one per line) and no columns.
r = np.genfromtxt(fname,delimiter=',',dtype=None, names=True)
print r
print r.shape
[ (611.88243, 9089.5601000000006, 5133.0, 864.07514000000003, 1715.3747599999999, 765.22776999999996, 1291111964948.0)
(611.88243, 9089.5601000000006, 5133.0, 864.07514000000003, 1715.3747599999999, 765.22776999999996, 1291113113366.0)
(611.88243, 9089.5601000000006, 5133.0, 864.07514000000003, 1715.3747599999999, 765.22776999999996, 1291120650486.0)]
(3,)
I can manually iterate and hack it into the shape I want, but this seems silly. I just want to load it as a proper matrix so I can slice it across different dimensions and plot it, just like in MATLAB.
Pure numpy
numpy.loadtxt(open("test.csv", "rb"), delimiter=",", skiprows=1)
Check out the loadtxt documentation.
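As a rough end-to-end sketch of the loadtxt approach, the snippet below writes the sample rows from the question to a temporary file (the temporary path and the variable names are illustrative assumptions, not part of the original post), loads them, and slices the result along both axes.

```python
import os
import tempfile

import numpy as np

# Sample data copied from the question; the temporary file is only for illustration.
CSV_TEXT = '''"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
'''

with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(CSV_TEXT)
    csv_path = tmp.name

try:
    # skiprows=1 drops the header line, so every remaining field parses as a float.
    data = np.loadtxt(csv_path, delimiter=",", skiprows=1)
finally:
    os.remove(csv_path)

print(data.shape)           # (3, 7)
first_column = data[:, 0]   # column "A" for all rows
timestamps = data[:, -1]    # last column for all rows
first_row = data[0, :]      # all seven values of the first row
```

Because the header is skipped rather than parsed, the result is a plain two-dimensional float array, so ordinary row and column slicing works.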
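You can also use Python's csv module:
import csv
import numpy
# On Python 3 the csv module expects a text-mode file opened with newline="".
with open("test.csv", newline="") as f:
    reader = csv.reader(f, delimiter=",")
    next(reader)  # skip the header row, which cannot be converted to float
    x = list(reader)
result = numpy.array(x).astype("float")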
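The csv module yields strings, so you have to convert the result to your preferred numeric type, which is what astype does. I guess you can write the whole thing in one line (the [1:] drops the header row):
result = numpy.array(list(csv.reader(open("test.csv", newline=""), delimiter=","))[1:]).astype("float")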
Added hint: you could also use pandas.io.parsers.read_csv and get the associated numpy array, which can be faster.
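A minimal sketch of the pandas route, assuming pandas is installed and using the top-level pandas.read_csv entry point. The in-memory buffer stands in for the real file, and the variable names are illustrative.

```python
import io

import pandas as pd

# Sample data copied from the question, held in memory instead of a file.
CSV_TEXT = '''"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
'''

# read_csv consumes the header row as column labels by default.
frame = pd.read_csv(io.StringIO(CSV_TEXT))

# to_numpy() returns the values as a plain 2-D ndarray without the labels.
matrix = frame.to_numpy()

print(matrix.shape)  # (3, 7)
```

Whether this is actually faster than loadtxt depends on the file size and the library versions, so it is worth timing on your own data.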
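I think the name row is what is confusing the routine: with names=True, genfromtxt returns a structured array with one record per row, which is why the shape is (3,). Try skipping the header row instead:
>>> r = np.genfromtxt(fname, delimiter=',', skip_header=1)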
>>> r
array([[  6.11882430e+02,   9.08956010e+03,   5.13300000e+03,
          8.64075140e+02,   1.71537476e+03,   7.65227770e+02,
          1.29111196e+12],
       [  6.11882430e+02,   9.08956010e+03,   5.13300000e+03,
          8.64075140e+02,   1.71537476e+03,   7.65227770e+02,
          1.29111311e+12],
       [  6.11882430e+02,   9.08956010e+03,   5.13300000e+03,
          8.64075140e+02,   1.71537476e+03,   7.65227770e+02,
          1.29112065e+12]])
>>> r[:,0] # Slice 0'th column
array([ 611.88243, 611.88243, 611.88243])
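To make the difference between the two genfromtxt results concrete, here is a small sketch that loads the sample data both ways from an in-memory buffer (the buffer and the variable names are illustrative assumptions). It also shows one way to flatten a structured array into a plain matrix with numpy.lib.recfunctions.structured_to_unstructured, which is available in reasonably recent NumPy releases.

```python
import io

import numpy as np
from numpy.lib import recfunctions as rfn

# Sample data copied from the question, held in memory instead of a file.
CSV_TEXT = '''"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
'''

# names=True: one record per row with named fields, so the array is 1-D.
structured = np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=",", names=True)
print(structured.shape)  # (3,)

# skip_header=1: the header is ignored and the result is a plain 2-D float array.
plain = np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=",", skip_header=1)
print(plain.shape)  # (3, 7)

# If the structured array is already loaded, it can be converted afterwards.
converted = rfn.structured_to_unstructured(structured)
print(converted.shape)  # (3, 7)
```

The structured form is handy when you want to address columns by name; the plain form is the one that supports slicing such as r[:, 0].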