Delete a column from a Pandas DataFrame
Solution 1:
The best way to do this in Pandas is to use drop:
df = df.drop('column_name', axis=1) # axis must be passed by keyword in pandas 2.0 and later
where 1 is the axis number (0 for rows and 1 for columns).
To delete the column without having to reassign df, you can do:
df.drop('column_name', axis=1, inplace=True)
Finally, to drop by column number instead of by column label, try this to delete, e.g. the 1st, 2nd and 4th columns:
df = df.drop(df.columns[[0, 1, 3]], axis=1) # df.columns is zero-based pd.Index
This also works with a list of column labels:
df.drop(['column_nameA', 'column_nameB'], axis=1, inplace=True)
Note: Introduced in v0.21.0 (October 27, 2017), the drop() method accepts index/columns keywords as an alternative to specifying the axis.
So we can now just do:
df = df.drop(columns=['column_nameA', 'column_nameB'])
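As a minimal sketch of the two spellings side by side (the frame and its column names below are made up for illustration), the following shows that drop returns a new DataFrame and that the columns keyword matches the axis=1 form:

```python
import pandas as pd

# Hypothetical frame used only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# drop() returns a new DataFrame; the original is left untouched.
trimmed = df.drop(columns=["a", "b"])

# Passing the labels with axis=1 is the equivalent older spelling.
trimmed_axis = df.drop(["a", "b"], axis=1)

print(list(df.columns))
print(list(trimmed.columns))
print(trimmed.equals(trimmed_axis))
```

Since df itself is unchanged here, the result has to be assigned to a name (or back to df) to be kept.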
Solution 2:
As you've guessed, the right syntax is
del df['column_name']
It's difficult to make del df.column_name work simply as the result of syntactic limitations in Python. del df[name] gets translated to df.__delitem__(name) under the covers by Python.
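To make that translation concrete, here is a small sketch (with a made-up frame) that removes one column with the del statement and another by calling __delitem__ directly. Calling the dunder method by hand is only for illustration, not something you would normally write:

```python
import pandas as pd

# Hypothetical frame used only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

del df["a"]          # statement form
df.__delitem__("b")  # the method call that the statement form resolves to

print(list(df.columns))
```

Both forms modify df in place, so only column c should remain.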
Solution 3:
Use:
columns = ['Col1', 'Col2', ...]
df.drop(columns, inplace=True, axis=1)
This will delete one or more columns in-place. Note that inplace=True was added in pandas v0.13 and won't work on older versions. You'd have to assign the result back in that case:
df = df.drop(columns, axis=1)
Solution 4:
Drop by index
Delete first, second and fourth columns:
df.drop(df.columns[[0,1,3]], axis=1, inplace=True)
Delete first column:
df.drop(df.columns[[0]], axis=1, inplace=True)
There is an optional parameter inplace so that the original data can be modified without creating a copy.
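The positional form works because df.columns[[...]] first turns the positions into labels, and drop then removes those labels. A minimal sketch, assuming a made-up single-row frame with five uniquely named columns:

```python
import pandas as pd

# Hypothetical frame used only for illustration.
df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=["a", "b", "c", "d", "e"])

positions = [0, 1, 3]               # first, second and fourth columns
labels = df.columns[positions]      # positions converted to column labels
remaining = df.drop(columns=labels)

print(list(labels))
print(list(remaining.columns))
```

Because the drop itself happens by label, a frame with duplicate column names would lose every column sharing one of those labels, not just the ones at the given positions.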
Popped
Column selection, addition, deletion
Delete column column-name:
df.pop('column-name')
Examples:
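import pandas as pd
# DataFrame.from_items was removed in pandas 1.0; from_dict with orient='index' builds the same frame
df = pd.DataFrame.from_dict({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}, orient='index', columns=['one', 'two', 'three'])
print(df)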
   one  two  three
A    1    2      3
B    4    5      6
C    7    8      9
df.drop(df.columns[[0]], axis=1, inplace=True)
print(df)
   two  three
A    2      3
B    5      6
C    8      9
three = df.pop('three')
print(df)
   two
A    2
B    5
C    8
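Unlike drop and del, pop also hands back the removed column. A short sketch, rebuilding the two-column frame from the example above by hand, shows what ends up in three:

```python
import pandas as pd

# Same data as the two remaining columns in the example, built directly.
df = pd.DataFrame({"two": [2, 5, 8], "three": [3, 6, 9]}, index=["A", "B", "C"])

three = df.pop("three")  # removes the column from df and returns it

print(type(three).__name__)
print(three.tolist())
print(list(df.columns))
```

This makes pop convenient when the column is still needed afterwards, for example as a target variable split off from the remaining features.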