How to iterate over rows in a DataFrame in Pandas?


Question

I have a DataFrame from pandas:

import pandas as pd
inp = [{'c1':10, 'c2':100}, {'c1':11,'c2':110}, {'c1':12,'c2':120}]
df = pd.DataFrame(inp)
print df

Output:

   c1   c2
0  10  100
1  11  110
2  12  120

Now I want to iterate over the rows of this frame. For every row I want to be able to access its elements (values in cells) by the name of the columns. For example:

for row in df.rows:
   print row['c1'], row['c2']

Is it possible to do that in pandas?

I found this similar question. But it does not give me the answer I need. For example, it is suggested there to use:

for date, row in df.T.iteritems():

or

for row in df.iterrows():

But I do not understand what the row object is and how I can work with it.

1
1370
8/24/2018 7:20:36 PM

Accepted Answer

DataFrame.iterrows is a generator which yield both index and row

import pandas as pd
import numpy as np

df = pd.DataFrame([{'c1':10, 'c2':100}, {'c1':11,'c2':110}, {'c1':12,'c2':120}])

for index, row in df.iterrows():
    print(row['c1'], row['c2'])

Output: 
   10 100
   11 110
   12 120
1919
6/21/2019 3:26:43 AM

First consider if you really need to iterate over rows in a DataFrame. See this answer for alternatives.

If you still need to iterate over rows, you can use methods below. Note some important caveats which are not mentioned in any of the other answers.

itertuples() is supposed to be faster than iterrows()

But be aware, according to the docs (pandas 0.24.2 at the moment):

  • iterrows: dtype might not match from row to row

    Because iterrows returns a Series for each row, it does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames). To preserve dtypes while iterating over the rows, it is better to use itertuples() which returns namedtuples of the values and which is generally much faster than iterrows()

  • iterrows: Do not modify rows

    You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.

    Use DataFrame.apply() instead:

    new_df = df.apply(lambda x: x * 2)
    
  • itertuples:

    The column names will be renamed to positional names if they are invalid Python identifiers, repeated, or start with an underscore. With a large number of columns (>255), regular tuples are returned.

See pandas docs on iteration for more details.


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon