Assume I have a pandas DataFrame with two columns, A and B. I'd like to modify this DataFrame (or create a copy) so that B is always NaN whenever A is 0. How would I achieve that?
I tried the following
df['A'==0]['B'] = np.nan
.loc for label based indexing:
df.loc[df.A==0, 'B'] = np.nan
df.A==0 expression creates a boolean series that indexes the rows,
'B' selects the column. You can also use this to transform a subset of a column, e.g.:
df.loc[df.A==0, 'B'] = df.loc[df.A==0, 'B'] / 2
I don't know enough about pandas internals to know exactly why that works, but the basic issue is that sometimes indexing into a DataFrame returns a copy of the result, and sometimes it returns a view on the original object. According to documentation here, this behavior depends on the underlying numpy behavior. I've found that accessing everything in one operation (rather than [one][two]) is more likely to work for setting.
Here is from pandas docs on advanced indexing:
The section will explain exactly what you need! Turns out
df.loc (as .ix has been deprecated -- as many have pointed out below) can be used for cool slicing/dicing of a dataframe. And. It can also be used to set things.
df.loc[selection criteria, columns I want] = value
So Bren's answer is saying 'find me all the places where
df.A == 0, select column
B and set it to