Remove rows not .isin('X') [duplicate]
Sorry, I'm just getting into pandas, and this seems like it should be a very straightforward question. How can I use isin('X') to remove rows that are in the list X? In R I would write !which(a %in% b).
Solution 1:
You have many options. Collating some of the other answers and the accepted answer from this post, you can do:
1. df[-df["column"].isin(["value"])]
2. df[~df["column"].isin(["value"])]
3. df[df["column"].isin(["value"]) == False]
4. df[np.logical_not(df["column"].isin(["value"]))]
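Note: option 1 only works on older versions; on current pandas and NumPy, unary minus on a boolean Series raises a TypeError, so prefer ~ (option 2). For option 4 you'll need to import numpy as np.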
Update: You can also use the .query method for this, which allows for method chaining:
5. df.query("column not in @values")
where values is a list of the values that you don't want to include.
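As a minimal sketch of how these options compare, the following builds a small made-up frame and checks that options 2 to 5 select the same rows. The sample data, the column name column, and the list values are illustrative assumptions, not taken from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"column": ["a", "b", "c", "a", "d"], "n": range(5)})
values = ["a", "c"]  # rows with these values should be dropped

# True where the row's value is in `values`.
mask = df["column"].isin(values)

# Option 1 (unary minus) is omitted: it raises TypeError on recent versions.
by_tilde = df[~mask]                          # option 2
by_eq_false = df[mask == False]               # option 3
by_logical_not = df[np.logical_not(mask)]     # option 4
by_query = df.query("column not in @values")  # option 5

print(by_tilde)
```

All four results keep only the rows whose value is "b" or "d". The ~ form is the one most pandas code uses; .query is convenient in the middle of a method chain.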
Solution 2:
You can use numpy.logical_not to invert the boolean array returned by isin:
In [63]: s = pd.Series(np.arange(10.0))
In [64]: x = range(4, 8)
In [65]: mask = np.logical_not(s.isin(x))
In [66]: s[mask]
Out[66]:
0    0
1    1
2    2
3    3
8    8
9    9
As Wes McKinney points out in a comment, you can also use:
s[~s.isin(x)]
Solution 3:
All you have to do is create a subset of your dataframe where the isin method evaluates to False:
df = df[df['Column Name'].isin(['Value']) == False]
Solution 4:
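You can use the DataFrame.select method. It was deprecated in pandas 0.21 and removed in 1.0, so this only works on older versions. Note that the example below keeps the rows whose index label is in L; to remove them instead, negate the condition (lambda x: x not in L):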
In [1]: df = pd.DataFrame([[1,2],[3,4]], index=['A','B'])
In [2]: df
Out[2]:
   0  1
A  1  2
B  3  4
In [3]: L = ['A']
In [4]: df.select(lambda x: x in L)
Out[4]:
   0  1
A  1  2
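On pandas 1.0 and later, where DataFrame.select no longer exists, a rough equivalent is a boolean mask built from df.index.isin. This is a sketch that assumes the goal is to filter on index labels; it reuses the frame from the session above, and the names kept and removed are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=["A", "B"])
L = ["A"]

# Rows whose index label is in L (what the select call above returned).
kept = df.loc[df.index.isin(L)]

# Rows whose index label is NOT in L (the removal the question asks about).
removed = df.loc[~df.index.isin(L)]

print(kept)
print(removed)
```

Here kept holds row A and removed holds row B.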