Understanding inplace=True in pandas
In the pandas library, many times there is an option to change the object in place, such as with the following statement:
df.dropna(axis='index', how='all', inplace=True)
I am curious what is being returned, as well as how the object is handled when inplace=True is passed vs. when inplace=False.
Are all operations modifying self when inplace=True? And when inplace=False, is a new object created immediately, such as new_df = self, and then new_df is returned?
Solution 1:
When inplace=True is passed, the data is modified in place (the method returns None), so you'd use:
df.an_operation(inplace=True)
When inplace=False is passed (this is the default value, so it isn't necessary), the method performs the operation and returns a copy of the object, so you'd use:
df = df.an_operation(inplace=False)
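To make the difference concrete, here is a minimal sketch using the dropna call from the question on a small made-up DataFrame (the data is illustrative only). It shows that the default returns a new object and leaves the original alone, while inplace=True mutates the original and returns None:

```python
import numpy as np
import pandas as pd

# Row 1 is entirely NaN, so how='all' will drop it
df = pd.DataFrame({'a': [1.0, np.nan], 'b': [2.0, np.nan]})

# inplace=False (the default): a new DataFrame is returned, df is unchanged
new_df = df.dropna(axis='index', how='all')
print(len(df), len(new_df))  # 2 1

# inplace=True: df itself is modified and the method returns None
ret = df.dropna(axis='index', how='all', inplace=True)
print(ret, len(df))  # None 1
```

A common bug follows directly from this: writing df = df.dropna(inplace=True) rebinds df to None.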
Solution 2:
In pandas, is inplace=True considered harmful, or not?
TL;DR: Yes, yes it is.
- inplace, contrary to what the name implies, often does not prevent copies from being created, and (almost) never offers any performance benefits
- inplace does not work with method chaining
- inplace can lead to SettingWithCopyWarning if used on a DataFrame column, and may prevent the operation from going through, leading to hard-to-debug errors in code
The pain points above are common pitfalls for beginners, so removing this option will simplify the API.
I don't advise setting this parameter, as it serves little purpose. See this GitHub issue, which proposes that the inplace argument be deprecated API-wide.
It is a common misconception that using inplace=True will lead to more efficient or optimized code. In reality, there are (almost) no performance benefits to using inplace=True. Both the in-place and out-of-place versions typically create a copy of the data anyway, with the in-place version automatically assigning the copy back.
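The sketch below (made-up data, fillna chosen only as an example) does not measure speed or memory; it only illustrates that df.op(inplace=True) and df = df.op() compute the same result, so the difference is how the result reaches your variable, not what gets computed:

```python
import numpy as np
import pandas as pd

data = {'a': [1.0, np.nan, 3.0]}

# Out-of-place: fillna returns a new DataFrame and the name is rebound to it
df1 = pd.DataFrame(data)
df1 = df1.fillna(0)

# In-place: df2 is updated and fillna returns None
df2 = pd.DataFrame(data)
df2.fillna(0, inplace=True)

print(df1.equals(df2))  # True
```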
inplace=True is a common pitfall for beginners. For example, it can trigger the SettingWithCopyWarning:
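import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})
df2 = df[df['a'] > 1]  # boolean filtering may give a copy of a slice of df
df2['b'].replace({'x': 'abc'}, inplace=True)  # in-place call on a column of that slice
# SettingWithCopyWarning:
# A value is trying to be set on a copy of a slice from a DataFrame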
Calling a function on a DataFrame column with inplace=True may or may not work. This is especially true when chained indexing is involved.
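One way to sidestep both the warning and the uncertain in-place behavior is to take an explicit copy and assign the result back, instead of calling an in-place method on a column. A minimal sketch on the same toy data as the SettingWithCopyWarning example:

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})

# Take an explicit copy so df2 is clearly independent of df
df2 = df[df['a'] > 1].copy()

# Assign the result back rather than using inplace=True on a column
df2['b'] = df2['b'].replace({'x': 'abc'})

print(df2['b'].tolist())  # ['abc', 'y']
print(df['b'].tolist())   # ['x', 'y', 'z']
```

The explicit .copy() makes the intent obvious: df2 is meant to be modified, and df is meant to stay untouched.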
As if the problems described above aren't enough, inplace=True also hinders method chaining. Contrast the chained version:
result = df.some_function1().reset_index().some_function2()
with the in-place version:
temp = df.some_function1()
temp.reset_index(inplace=True)
result = temp.some_function2()
The former lends itself to better code organization and readability.
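Here is the same contrast with real pandas methods standing in for the placeholder some_function1/some_function2 (dropna and sort_values are just example choices, and the data is made up). Because every inplace=True call returns None, the in-place version has to be split into separate statements on a temporary:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [3.0, np.nan, 1.0], 'b': ['x', 'y', 'z']})

# Chained, out-of-place: each method returns a new DataFrame
result = df.dropna().reset_index(drop=True).sort_values('a')

# In-place: each call returns None, so the steps cannot be chained
temp = df.copy()
ret = temp.dropna(inplace=True)  # ret is None
temp.reset_index(drop=True, inplace=True)
temp.sort_values('a', inplace=True)

print(result['b'].tolist())  # ['z', 'x']
print(result.equals(temp))   # True
```

Both produce the same DataFrame, but only the chained form reads as a single pipeline, and it leaves df untouched without needing an explicit copy.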
Another supporting point is that the API for set_axis was recently changed so that the default value of inplace switched from True to False. See GH27600. Great job devs!
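With that default, set_axis behaves like the other out-of-place methods: it returns a new object and leaves the original alone. A minimal sketch, assuming pandas 1.0 or later (as far as I know, pandas 2.0 removed the inplace keyword from set_axis altogether, so the out-of-place form is the only one there):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# set_axis returns a relabeled DataFrame; df keeps its original column labels
renamed = df.set_axis(['A', 'B'], axis=1)

print(list(df.columns))       # ['a', 'b']
print(list(renamed.columns))  # ['A', 'B']
```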