df.drop if it exists

Solution 1:

Add parameter errors to DataFrame.drop:

errors : {'ignore', 'raise'}, default 'raise'

If 'ignore', suppress error and only existing labels are dropped.

df = df.drop(['row_num','start_date','end_date','symbol'], axis=1, errors='ignore')

Sample:

df = pd.DataFrame({'row_num':[1,2], 'w':[3,4]})
df = df.drop(['row_num','start_date','end_date','symbol'], axis=1, errors='ignore')
print (df)
   w
0  3
1  4

Solution 2:

In my tests the following was at least as fast as any of the given answers:

candidates=['row_num','start_date','end_date','symbol']
df = df.drop([x for x in candidates if x in df.columns], axis=1)

It has the benefit of readability and (with a small tweak to the code) the ability to record exactly which columns existed/were dropped when.

Some reasons this might be more desireable than the previous solutions:

  • Looping over the items and dropping each column individually if it exists is functional, but quite slow (see benchmarks below).
  • jezrael's answer is very nice, but made me nervous at first (ignoring errors feels bad!). Further looking at the documentation makes it sounds like this is OK though, and simply ignores the error of the column not existing (not other errors that might be undesireable to ignore). My solution may be more readable, especially for those less familiar with optional kwargs in pandas.

Benchmark Results:

![benchmark results

Code for benchmark tests (credit to an answer in this question for how to create this sort of benchmark):

import math
from simple_benchmark import benchmark
import pandas as pd

# setting up the toy df:
def df_creator(length):
    c1=list(range(0,10))
    c2=list('a,b,c,d,e'.split(','))
    c3=list(range(0,5))
    c4=[True,False]
    lists=[c1,c2,c3,c4]
    df=pd.DataFrame()
    count=0
    for x in lists:
        count+=1
        df['col'+str(count)]=x*math.floor(length/len(x))
    return df

# setting up benchmark test:
def list_comp(df,candidates=['col1','col2','col5','col8']):
    return df.drop([x for x in candidates if x in df.columns], axis=1)

def looper(df,candidates=['col1','col2','col5','col8']):
    for col in candidates:
        if col in df.columns:
            out = df.drop(columns=col, axis=1)
    return out

def ignore_error(df,candidates=['col1','col2','col5','col8']):
    return df.drop(candidates, axis=1, errors='ignore')

functions=[list_comp,looper,ignore_error]

args={n : df_creator(n) for n in [10,100,1000,10000,100000]}
argname='df_length'
b=benchmark(functions,args,argname)
b.plot()

Solution 3:

Just use Pandas Filter, the Pythonic Way

Oddly, No answers use the pandas dataframe filter method

thisFilter = df.filter(drop_list)
df.drop(thisFilter, inplace=True, axis=1)

This will create a filter from the drop_list that exists in df, then drop thisFilter from the df inplace on axis=1

i.e., drop the columns that match the drop_list and don't error if they are nonexistent