How do I read CSV data into a record array in NumPy?


Question

I wonder if there is a direct way to import the contents of a CSV file into a record array, much in the way that R's read.table(), read.delim(), and read.csv() family imports data to R's data frame?

Or is the best way to use csv.reader() and then apply something like numpy.core.records.fromrecords()?

1
363
7/15/2018 8:25:30 AM

Accepted Answer

You can use Numpy's genfromtxt() method to do so, by setting the delimiter kwarg to a comma.

from numpy import genfromtxt
my_data = genfromtxt('my_file.csv', delimiter=',')

More information on the function can be found at its respective documentation.

547
3/2/2012 3:05:49 PM

I would recommend the read_csv function from the pandas library:

import pandas as pd
df=pd.read_csv('myfile.csv', sep=',',header=None)
df.values
array([[ 1. ,  2. ,  3. ],
       [ 4. ,  5.5,  6. ]])

This gives a pandas DataFrame - allowing many useful data manipulation functions which are not directly available with numpy record arrays.

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table...


I would also recommend genfromtxt. However, since the question asks for a record array, as opposed to a normal array, the dtype=None parameter needs to be added to the genfromtxt call:

Given an input file, myfile.csv:

1.0, 2, 3
4, 5.5, 6

import numpy as np
np.genfromtxt('myfile.csv',delimiter=',')

gives an array:

array([[ 1. ,  2. ,  3. ],
       [ 4. ,  5.5,  6. ]])

and

np.genfromtxt('myfile.csv',delimiter=',',dtype=None)

gives a record array:

array([(1.0, 2.0, 3), (4.0, 5.5, 6)], 
      dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<i4')])

This has the advantage that file with multiple data types (including strings) can be easily imported.


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon