When processing CSV data, how do I ignore the first line of data?


Question

I am asking Python to print the minimum number from a column of CSV data, but the top row is the column number, and I don't want Python to take the top row into account. How can I make sure Python ignores the first line?

This is the code so far:

import csv

with open('all16.csv', 'rb') as inf:
    incsv = csv.reader(inf)
    column = 1                
    datatype = float          
    data = (datatype(column) for row in incsv)   
    least_value = min(data)

print least_value

Could you also explain what you are doing, not just give the code? I am very very new to Python and would like to make sure I understand everything.

1
98
8/28/2014 3:21:52 PM

Accepted Answer

You could use an instance of the csv module's Sniffer class to deduce the format of a CSV file and detect whether a header row is present along with the built-in next() function to skip over the first row only when necessary:

import csv

with open('all16.csv', 'r', newline='') as file:
    has_header = csv.Sniffer().has_header(file.read(1024))
    file.seek(0)  # Rewind.
    reader = csv.reader(file)
    if has_header:
        next(reader)  # Skip header row.
    column = 1
    datatype = float
    data = (datatype(row[column]) for row in reader)
    least_value = min(data)

    print(least_value)

Since datatype and column are hardcoded in your example, it would be slightly faster to process the row like this:

    data = (float(row[1]) for row in reader)

Note: the code above is for Python 3.x. For Python 2.x use the following line to open the file instead of what is shown:

with open('all16.csv', 'rb') as file:
99
12/9/2017 5:23:18 AM

To skip the first line just call:

next(inf)

Files in Python are iterators over lines.


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon