Pandas How to filter a Series
I have a Series like this after doing groupby('name') and used mean() function on other column
name
383 3.000000
663 1.000000
726 1.000000
737 9.000000
833 8.166667
Could anyone please show me how to filter out the rows with 1.000000 mean values? Thank you and I greatly appreciate your help.
Solution 1:
In [5]:
import pandas as pd
test = {
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
}
s = pd.Series(test)
s = s[s != 1]
s
Out[0]:
383 3.000000
737 9.000000
833 8.166667
dtype: float64
Solution 2:
From pandas version 0.18+ filtering a series can also be done as below
test = {
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
}
pd.Series(test).where(lambda x : x!=1).dropna()
Checkout: http://pandas.pydata.org/pandas-docs/version/0.18.1/whatsnew.html#method-chaininng-improvements
Solution 3:
As DACW pointed out, there are method-chaining improvements in pandas 0.18.1 that do what you are looking for very nicely.
Rather than using .where
, you can pass your function to either the .loc
indexer or the Series indexer []
and avoid the call to .dropna
:
test = pd.Series({
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
})
test.loc[lambda x : x!=1]
test[lambda x: x!=1]
Similar behavior is supported on the DataFrame and NDFrame classes.
Solution 4:
A fast way of doing this is to reconstruct using numpy
to slice the underlying arrays. See timings below.
mask = s.values != 1
pd.Series(s.values[mask], s.index[mask])
0
383 3.000000
737 9.000000
833 8.166667
dtype: float64
naive timing