Pandas how can 'replace' work after 'loc'?

I have tried many times, but seems the 'replace' can NOT work well after use 'loc'. For example I want to replace the 'conlumn_b' with an regex for the row that the 'conlumn_a' value is 'apple'.

Here is my sample code :

df.loc[df['conlumn_a'] == 'apple', 'conlumn_b'].replace(r'^11*', 'XXX',inplace=True, regex=True)

Example:

conlumn_a       conlumn_b
apple           123
banana          11
apple           11
orange          33

The result that I expected for the 'df' is:

conlumn_a       conlumn_b
apple           123
banana          11
apple           XXX
orange          33

Anyone has meet this issue that needs 'replace' with regex after 'loc' ?

OR you guys has some other good solutions ?

Thank you so much for your help!


Solution 1:

inplace=True works on the object that it was applied on.

When you call .loc, you're slicing your dataframe object to return a new one.

>>> id(df)
4587248608

And,

>>> id(df.loc[df['conlumn_a'] == 'apple', 'conlumn_b'])
4767716968

Now, calling an in-place replace on this new slice will apply the replace operation, updating the new slice itself, and not the original.


Now, note that you're calling replace on a column of int, and nothing is going to happen, because regular expressions work on strings.

Here's what I offer you as a workaround. Don't use regex at all.

m = df['conlumn_a'] == 'apple'
df.loc[m, 'conlumn_b'] = df.loc[m, 'conlumn_b'].replace(11, 'XXX')

df

  conlumn_a conlumn_b
0     apple       123
1    banana        11
2     apple       XXX
3    orange        33

Or, if you need regex based substitution, then -

df.loc[m, 'conlumn_b'] = df.loc[m, 'conlumn_b']\
           .astype(str).replace('^11$', 'XXX', regex=True)

Although, this converts your column to an object column.

Solution 2:

I'm going to borrow from a recent answer of mine. This technique is a general purpose strategy for updating a dataframe in place:

df.update(
    df.loc[df['conlumn_a'] == 'apple', 'conlumn_b']
      .replace(r'^11$', 'XXX', regex=True)
)

df

  conlumn_a conlumn_b
0     apple       123
1    banana        11
2     apple       XXX
3    orange        33

Note that all I did was remove the inplace=True and instead wrapped it in the pd.DataFrame.update method.