Modifying a subset of rows in a pandas dataframe

StackOverflow

Assume I have a pandas DataFrame with two columns, A and B. I'd like to modify this DataFrame (or create a copy) so that B is always NaN whenever A is 0. How would I achieve that?

I tried the following:

df["A"==0]["B"] = np.nan

and

df["A"==0]["B"].values.fill(np.nan)

without success.

Answer rating: 277

Use .loc for label-based indexing:

df.loc[df.A==0, "B"] = np.nan

The df.A==0 expression creates a boolean Series that selects the rows, and "B" selects the column. You can also use this to transform a subset of a column, e.g.:

df.loc[df.A==0, "B"] = df.loc[df.A==0, "B"] / 2
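For anyone who wants to try this end to end, here is a minimal sketch of the two .loc patterns above, plus one way to get the "create a copy" variant from the question. The sample values and the variable names (masked_copy, halved, rows) are made up for illustration; only the column names A and B come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names A and B come from the question.
df = pd.DataFrame({"A": [0, 1, 0, 2], "B": [10.0, 20.0, 30.0, 40.0]})

# Copy variant: build a new DataFrame and leave df untouched.
# Series.mask replaces values with NaN wherever the condition is True.
masked_copy = df.assign(B=df["B"].mask(df["A"] == 0))

# Transform variant: halve B only in the rows where A is 0.
halved = df.copy()
rows = halved["A"] == 0
halved.loc[rows, "B"] = halved.loc[rows, "B"] / 2

# In-place variant: set B to NaN in the rows where A is 0.
df.loc[df["A"] == 0, "B"] = np.nan
```

Storing the boolean mask in a variable, as in the transform variant, avoids computing the same comparison twice.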

I don't know enough about pandas internals to know exactly why that works, but the basic issue is that sometimes indexing into a DataFrame returns a copy of the result, and sometimes it returns a view on the original object. According to the documentation, this behavior depends on the underlying NumPy behavior. I've found that accessing everything in one operation (rather than [one][two]) is more likely to work for setting.
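The difference between [one][two] and a single .loc call can be seen in a small experiment. This is only a sketch with made-up data and helper names (lookup_failed, chained_changed_df, loc_changed_df), and the exact warnings pandas emits for chained assignment vary between versions, so they are silenced here. It also shows a separate problem with the attempts in the question: "A"==0 is evaluated by Python before pandas sees it, so it never builds a row mask.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"A": [0, 1, 0, 2], "B": [10.0, 20.0, 30.0, 40.0]})

# "A" == 0 compares a string with an integer, so it is simply False.
# df[False] then looks for a column named False, which does not exist.
try:
    df["A" == 0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Chained indexing: df[mask] produces a new object, and the assignment
# lands on that temporary object rather than on df.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas warns about chained assignment
    df[df["A"] == 0]["B"] = np.nan
chained_changed_df = bool(df["B"].isna().any())

# Single .loc call: rows and column are resolved together on df itself.
df.loc[df["A"] == 0, "B"] = np.nan
loc_changed_df = bool(df["B"].isna().any())
```

Here the chained form leaves df as it was because selecting rows with a boolean mask hands back a copy, while the .loc form writes into df directly.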




