Create a boolean column in pandas datafame based on percentile values of another column
I have a dataframe with multiple columns. I want to create boolean column, flagging if the value belongs to 90th percentile and above.
My data frame also contains multiple zeros.
Example:
Name Value
Val1 1000
Val2 910
Val3 800
Val4 700
Val5 600
Val6 500
Val7 400
Val8 300
Val9 200
Val10 100
Val11 0
Expected output
Name Value 90thper
Val1 1000 1
Val2 910 1
Val3 800 0
Val4 700 0
Val5 600 0
Val6 500 0
Val7 400 0
Val8 300 0
Val9 200 0
Val10 100 0
Val11 0 0
Solution 1:
You could use pd.Series.quantile
to find the 90th percentile value and include all values above it.
val = df['Value'].quantile(.9, interpolation="lower") # val -> 910
df['90thper'] = df['Value'].ge(val).astype(int)
# Name Value 90thper
# 0 Val1 1000 1
# 1 Val2 910 1
# 2 Val3 800 0
# 3 Val4 700 0
# 4 Val5 600 0
# 5 Val6 500 0
# 6 Val7 400 0
# 7 Val8 300 0
# 8 Val9 200 0
# 9 Val10 100 0
Solution 2:
Try quantile
df['new'] = df.Value.ge(df.Value.quantile(0.9)).astype(int)
df
Name Value new
0 Val1 1000 1
1 Val2 910 1
2 Val3 800 0
3 Val4 700 0
4 Val5 600 0
5 Val6 500 0
6 Val7 400 0
7 Val8 300 0
8 Val9 200 0
9 Val10 100 0
10 Val11 0 0