Pandas: Appending a row to a dataframe and specifying its index label


Question

Is there any way to specify the index label I want for a new row when appending that row to a dataframe?

The original documentation provides the following example:

In [1301]: df = DataFrame(np.random.randn(8, 4), columns=['A','B','C','D'])

In [1302]: df
Out[1302]: 
          A         B         C         D
0 -1.137707 -0.891060 -0.693921  1.613616
1  0.464000  0.227371 -0.496922  0.306389
2 -2.290613 -1.134623 -1.561819 -0.260838
3  0.281957  1.523962 -0.902937  0.068159
4 -0.057873 -0.368204 -1.144073  0.861209
5  0.800193  0.782098 -1.069094 -1.099248
6  0.255269  0.009750  0.661084  0.379319
7 -0.008434  1.952541 -1.056652  0.533946

In [1303]: s = df.xs(3)

In [1304]: df.append(s, ignore_index=True)
Out[1304]: 
          A         B         C         D
0 -1.137707 -0.891060 -0.693921  1.613616
1  0.464000  0.227371 -0.496922  0.306389
2 -2.290613 -1.134623 -1.561819 -0.260838
3  0.281957  1.523962 -0.902937  0.068159
4 -0.057873 -0.368204 -1.144073  0.861209
5  0.800193  0.782098 -1.069094 -1.099248
6  0.255269  0.009750  0.661084  0.379319
7 -0.008434  1.952541 -1.056652  0.533946
8  0.281957  1.523962 -0.902937  0.068159

where the new row gets the index label automatically. Is there any way to control the new label?

5/29/2013 10:46:49 PM

Accepted Answer

The name of the Series becomes the index label of the new row in the DataFrame:

In [99]: df = pd.DataFrame(np.random.randn(8, 4), columns=['A','B','C','D'])

In [100]: s = df.xs(3)

In [101]: s.name = 10

In [102]: df.append(s)
Out[102]: 
           A         B         C         D
0  -2.083321 -0.153749  0.174436  1.081056
1  -1.026692  1.495850 -0.025245 -0.171046
2   0.072272  1.218376  1.433281  0.747815
3  -0.940552  0.853073 -0.134842 -0.277135
4   0.478302 -0.599752 -0.080577  0.468618
5   2.609004 -1.679299 -1.593016  1.172298
6  -0.201605  0.406925  1.983177  0.012030
7   1.158530 -2.240124  0.851323 -0.240378
10 -0.940552  0.853073 -0.134842 -0.277135
5/29/2013 10:05:55 PM
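DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the session above only runs on older versions. A minimal sketch of the same idea with pd.concat, assuming pandas 2.x: the Series name still becomes the row label once the Series is turned into a one-row frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])

# Copy row 3 as a Series and give it the label we want for the new row.
s = df.xs(3).copy()
s.name = 10

# s.to_frame() is a one-column frame (column label 10, index A..D);
# .T turns it into a one-row frame with index [10] and columns A..D.
result = pd.concat([df, s.to_frame().T])
print(result)
```

Passing ignore_index=True to pd.concat instead discards the labels and renumbers the rows 0..n-1, which matches the documentation example in the question.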

Assigning to a new label with df.loc will do the job:

>>> df = pd.DataFrame(np.random.randn(3, 2), columns=['A','B'])
>>> df
          A         B
0 -0.269036  0.534991
1  0.069915 -1.173594
2 -1.177792  0.018381
>>> df.loc[13] = df.loc[1]
>>> df
           A         B
0  -0.269036  0.534991
1   0.069915 -1.173594
2  -1.177792  0.018381
13  0.069915 -1.173594
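The df.loc approach relies on what the pandas documentation calls "setting with enlargement": assigning to a label that is not yet in the index adds a row with that label. A small sketch, assuming a recent pandas version, that also uses a plain list (in column order) as the right-hand side and shows the main pitfall: if the label already exists, the row is overwritten rather than appended.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 2), columns=['A', 'B'])

# New label 13: a row is appended, copying the values of row 1.
df.loc[13] = df.loc[1]

# A list whose length matches the number of columns also works.
df.loc[20] = [0.5, -0.5]

# Label 0 already exists, so this overwrites row 0 instead of adding a row.
df.loc[0] = [9.0, 9.0]
print(df)
```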

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow