Understanding inplace=True in pandas

In the pandas library, there is often an option to change the object in place, as in the following statement:

df.dropna(axis='index', how='all', inplace=True)

I am curious what is being returned, as well as how the object is handled, when inplace=True is passed vs. when inplace=False is passed.

Are all operations modifying self when inplace=True? And when inplace=False, is a new object created immediately, such as new_df = self, and then new_df returned?

Solution 1:

When inplace=True is passed, the data is modified in place (the method returns None), so you'd use:

df.an_operation(inplace=True)
When inplace=False is passed (this is the default value, so it isn't necessary to pass it), the operation is performed and a new object is returned, so you'd use:

df = df.an_operation(inplace=False) 
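A minimal sketch of the two return conventions, using dropna as a stand-in for the placeholder an_operation (the small DataFrame is made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy frame: the second row is entirely NaN
df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [4.0, np.nan, 6.0]})

# inplace=False (the default): a new DataFrame is returned, df is untouched
new_df = df.dropna(how='all')
print(len(df), len(new_df))  # 3 2

# inplace=True: df itself is modified and the method returns None
returned = df.dropna(how='all', inplace=True)
print(returned, len(df))  # None 2
```

This is why writing df = df.dropna(inplace=True) is a classic bug: it rebinds df to None.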

Solution 2:

In pandas, is inplace = True considered harmful, or not?

TL;DR: Yes, yes it is.

  • inplace, contrary to what the name implies, often does not prevent copies from being created, and (almost) never offers any performance benefits
  • inplace does not work with method chaining
  • inplace can lead to SettingWithCopyWarning if used on a DataFrame column, and may prevent the operation from going through, leading to hard-to-debug errors in code
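The chaining point can be checked directly: an in-place call returns None, so there is nothing to chain the next method onto. A small sketch with a made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'a': [3, None, 1]})

# Out-of-place calls return a new DataFrame, so the next method can be chained on
chained = df.dropna().reset_index(drop=True)

# An in-place call returns None, so chaining onto it raises AttributeError
# (df itself has already been modified by the time the error is raised)
chaining_failed = False
try:
    df.dropna(inplace=True).reset_index(drop=True)
except AttributeError as exc:
    chaining_failed = True
    print('chaining failed:', exc)
```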

The pain points above are common pitfalls for beginners, so removing this option will simplify the API.

I don't advise setting this parameter, as it serves little purpose. See this GitHub issue, which proposes that the inplace argument be deprecated API-wide.

It is a common misconception that using inplace=True will lead to more efficient or optimized code. In reality, there are (almost) no performance benefits to using inplace=True. For most operations, both the in-place and out-of-place versions create a copy of the data anyway, with the in-place version automatically assigning the copy back to the original object.
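The observable difference between the two styles is therefore not speed but who sees the result: an in-place call writes the new data back into the existing object, so every reference to it sees the change, while reassignment only rebinds one name. A sketch with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, None, 3.0]})
alias = df  # a second reference to the same DataFrame object

# Out-of-place: the name df is rebound to a new object; alias still has 3 rows
df = df.dropna()
print(len(df), len(alias))  # 2 3

# In-place: the shared object itself is updated, so alias2 sees the change too
df2 = pd.DataFrame({'a': [1.0, None, 3.0]})
alias2 = df2
df2.dropna(inplace=True)
print(len(df2), len(alias2))  # 2 2
```

This says nothing about memory use; it only illustrates the "compute a copy, then assign it back" behavior described above.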

inplace=True is a common pitfall for beginners. For example, it can trigger the SettingWithCopyWarning:

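import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})

df2 = df[df['a'] > 1]  # df2 may be a copy of a slice of df
df2['b'].replace({'x': 'abc'}, inplace=True)
# SettingWithCopyWarning:
# A value is trying to be set on a copy of a slice from a DataFrame
# (the exact warning text varies by pandas version)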
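A warning-free version of the same example, sketched under the assumption that df2 is meant to be an independent DataFrame: take an explicit copy and assign the result back instead of using inplace=True.

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})

# Take an explicit copy so df2 is clearly independent of df
df2 = df[df['a'] > 1].copy()

# Assign the result back instead of calling replace(..., inplace=True)
df2['b'] = df2['b'].replace({'x': 'abc'})
print(df2['b'].tolist())  # ['abc', 'y']
print(df['b'].tolist())   # ['x', 'y', 'z']
```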

Calling a method on a DataFrame column with inplace=True may or may not modify the underlying DataFrame. This is especially true when chained indexing is involved.
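If the intent is to change the original DataFrame, a single .loc assignment sidesteps both chained indexing and inplace=True. A minimal sketch, reusing the same toy values as the example above:

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})

# Chained form (avoid): df[df['a'] > 1]['b'].replace({'x': 'abc'}, inplace=True)
# operates on an intermediate object, so df may or may not change.

# Single indexing step: select rows and column together, then assign
mask = df['a'] > 1
df.loc[mask, 'b'] = df.loc[mask, 'b'].replace({'x': 'abc'})
print(df['b'].tolist())  # ['abc', 'y', 'z']
```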

As if the problems described above aren't enough, inplace=True also hinders method chaining. Contrast

result = df.some_function1().reset_index().some_function2()

As opposed to

temp = df.some_function1()
temp.reset_index(inplace=True)
result = temp.some_function2()

The former lends itself to better code organization and readability.
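The same contrast with real pandas methods standing in for the placeholders some_function1 and some_function2 (assign and rename are arbitrary choices for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 1, 2]}, index=['r1', 'r2', 'r3'])

# Chained, out-of-place style: each step returns a new DataFrame
result = (
    df.assign(b=lambda d: d['a'] * 10)
      .reset_index(drop=True)
      .rename(columns={'a': 'value'})
)

# Equivalent in-place style: one statement per step on a temporary
temp = df.assign(b=lambda d: d['a'] * 10)
temp.reset_index(drop=True, inplace=True)
temp.rename(columns={'a': 'value'}, inplace=True)

print(result.equals(temp))  # True
```

Both produce the same frame; the chained form simply reads as one pipeline with no intermediate name.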

Another supporting argument is that the API for set_axis was recently changed so that the default value of inplace switched from True to False. See GH27600. Great job devs!
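A quick check of the out-of-place set_axis behavior, assuming a reasonably recent pandas release (the column names are made up):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# set_axis returns a relabelled DataFrame; df keeps its original column names
renamed = df.set_axis(['x', 'y'], axis='columns')
print(list(renamed.columns), list(df.columns))  # ['x', 'y'] ['a', 'b']
```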