How to drop a list of rows from Pandas dataframe?
I have a dataframe df:
>>> df
                  sales  discount  net_sales    cogs
STK_ID RPT_Date
600141 20060331   2.709       NaN      2.709   2.245
       20060630   6.590       NaN      6.590   5.291
       20060930  10.103       NaN     10.103   7.981
       20061231  15.915       NaN     15.915  12.686
       20070331   3.196       NaN      3.196   2.710
       20070630   7.907       NaN      7.907   6.459
I want to drop the rows at the positions given in a list, say [1, 2, 4], so that the following rows are left:
                  sales  discount  net_sales    cogs
STK_ID RPT_Date
600141 20060331   2.709       NaN      2.709   2.245
       20061231  15.915       NaN     15.915  12.686
       20070630   7.907       NaN      7.907   6.459
What function or method can do that?
Use DataFrame.drop and pass it the index labels of the rows to drop (df.index[...] turns row positions into labels):
In [65]: df
Out[65]:
       one  two
one      1    4
two      2    3
three    3    2
four     4    1
In [66]: df.drop(df.index[[1,3]])
Out[66]:
       one  two
one      1    4
three    3    2
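For the frame in the question, the same call would look roughly like this. This is only a sketch: the DataFrame is rebuilt by hand from the printed output, so the construction code and the name rows_to_drop are assumptions, not the asker's actual code.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame by hand (assumed construction, for illustration).
index = pd.MultiIndex.from_product(
    [[600141], [20060331, 20060630, 20060930, 20061231, 20070331, 20070630]],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame(
    {
        "sales": [2.709, 6.590, 10.103, 15.915, 3.196, 7.907],
        "discount": [np.nan] * 6,
        "net_sales": [2.709, 6.590, 10.103, 15.915, 3.196, 7.907],
        "cogs": [2.245, 5.291, 7.981, 12.686, 2.710, 6.459],
    },
    index=index,
)

# Row positions to remove, as given in the question.
rows_to_drop = [1, 2, 4]

# df.index[rows_to_drop] converts positions to index labels, which drop expects.
result = df.drop(df.index[rows_to_drop])
print(result)
```

Keep in mind that drop works on labels, so if the index contains duplicate labels, every row carrying one of the selected labels is removed, not just the rows at those positions.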
Note that you may need the inplace=True argument if you want to modify the DataFrame in place:
df.drop(df.index[[1,3]], inplace=True)
By default, drop returns a new DataFrame and leaves df unchanged, so use this form (or assign the result back to df) if you want df itself to change. See http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.drop.html
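A small sketch of the difference, reusing the toy frame from the answer above (the variable names dropped, rows_before and returned are just for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1, 2, 3, 4], "two": [4, 3, 2, 1]},
    index=["one", "two", "three", "four"],
)

# Without inplace, drop returns a new frame and df keeps all four rows.
dropped = df.drop(df.index[[1, 3]])
rows_before = len(df)

# With inplace=True, df itself is modified and the call returns None.
returned = df.drop(df.index[[1, 3]], inplace=True)
rows_after = len(df)
```

Assigning the result back, as in df = df.drop(...), ends up with the same rows as the inplace form.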
If the DataFrame is huge and the number of rows to drop is large as well, then a simple drop by index, df.drop(df.index[]), takes too much time.
In my case, I have a multi-indexed DataFrame of floats with 100M rows x 3 cols, and I need to remove 10k rows from it. The fastest method I found is, quite counterintuitively, to take the remaining rows.
Let indexes_to_drop be an array of positional indexes to drop ([1, 2, 4] in the question).
indexes_to_keep = set(range(df.shape[0])) - set(indexes_to_drop)
df_sliced = df.take(list(indexes_to_keep))
In my case this took 20.5s, while the simple df.drop took 5min 27s and consumed a lot of memory. The resulting DataFrame is the same.
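One caveat about the set-based version: Python does not guarantee the iteration order of a set, so the kept rows are not guaranteed to come out in their original order. A possible variant of the same "keep the rest" idea uses a NumPy boolean mask, which always preserves row order. This is a sketch on a small made-up frame, not the method timed above, and I have not benchmarked it.

```python
import numpy as np
import pandas as pd

# Small made-up frame standing in for the large one.
df = pd.DataFrame({"a": np.arange(10.0), "b": np.arange(10.0) * 2})
indexes_to_drop = [1, 2, 4]

# Boolean mask over row positions: True means "keep this row".
keep_mask = np.ones(len(df), dtype=bool)
keep_mask[indexes_to_drop] = False

# Positional selection with the mask; rows stay in their original order.
df_sliced = df.iloc[keep_mask]

# Reference result from the plain drop, for comparison on this small frame.
df_dropped = df.drop(df.index[indexes_to_drop])
print(df_sliced.equals(df_dropped))
```

Because the mask works purely on positions, it also behaves predictably when the index has duplicate labels.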