How to drop a list of rows from Pandas dataframe?

I have a dataframe df :

>>> df
                  sales  discount  net_sales    cogs
STK_ID RPT_Date                                     
600141 20060331   2.709       NaN      2.709   2.245
       20060630   6.590       NaN      6.590   5.291
       20060930  10.103       NaN     10.103   7.981
       20061231  15.915       NaN     15.915  12.686
       20070331   3.196       NaN      3.196   2.710
       20070630   7.907       NaN      7.907   6.459

Then I want to drop rows with certain sequence numbers which indicated in a list, suppose here is [1,2,4], then left:

                  sales  discount  net_sales    cogs
STK_ID RPT_Date                                     
600141 20060331   2.709       NaN      2.709   2.245
       20061231  15.915       NaN     15.915  12.686
       20070630   7.907       NaN      7.907   6.459

How or what function can do that ?

Use DataFrame.drop and pass it a Series of index labels:

In [65]: df
Out[65]: 
       one  two
one      1    4
two      2    3
three    3    2
four     4    1


In [66]: df.drop(df.index[[1,3]])
Out[66]: 
       one  two
one      1    4
three    3    2

Note that it may be important to use the "inplace" command when you want to do the drop in line.

df.drop(df.index[[1,3]], inplace=True)

Because your original question is not returning anything, this command should be used. http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.drop.html

If the DataFrame is huge, and the number of rows to drop is large as well, then simple drop by index df.drop(df.index[]) takes too much time.

In my case, I have a multi-indexed DataFrame of floats with 100M rows x 3 cols, and I need to remove 10k rows from it. The fastest method I found is, quite counterintuitively, to take the remaining rows.

Let indexes_to_drop be an array of positional indexes to drop ([1, 2, 4] in the question).

indexes_to_keep = set(range(df.shape[0])) - set(indexes_to_drop)
df_sliced = df.take(list(indexes_to_keep))

In my case this took 20.5s, while the simple df.drop took 5min 27s and consumed a lot of memory. The resulting DataFrame is the same.

How to drop a list of rows from Pandas dataframe?

Related

Recent Posts