Keep only those columns in dataframe based on min value of each row
I have a dataframe:
In = pd.DataFrame([["W",13,23,45,65], ["X",23,45,12,78], ["Y",12,34,56,89]],columns=["A","B","C","D","E"])
Row W has min value 13, row X has min value 12, and row Y has min value 12. Keep only those columns that contain the min values of the rows.
Expected output:
Out = pd.DataFrame([["W",13,45], ["X",23,12], ["Y",12,56]],columns=["A","B","D"])
How can I do this?
Solution 1:
You could keep a column if it is either non-numeric or contains the global min.
This approach is efficient because it first computes the min
per column, then compares each column min to the global min, rather than comparing every cell.
from pandas.api.types import is_numeric_dtype
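# True for numeric columns, False for non-numeric ones
mask = In.apply(is_numeric_dtype)
# True for columns whose min equals the global min, or that are non-numeric
m = (m:=In.min()).eq(m[mask].min()) | ~mask
# If Python < 3.8 (no walrus operator):
# m = In.min()
# m = m.eq(m[mask].min()) | ~mask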
Out = In.loc[:,m]
Output:
   A   B   D
0  W  13  45
1  X  23  12
2  Y  12  56
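To make the intermediate values visible, here is a minimal self-contained sketch of the same idea on the question's data. It swaps in `select_dtypes(include="number")` for the `is_numeric_dtype` mask, which is an alternative, not the code above; the names `col_min`, `global_min` and `keep` are only illustrative.

```python
import pandas as pd

In = pd.DataFrame(
    [["W", 13, 23, 45, 65], ["X", 23, 45, 12, 78], ["Y", 12, 34, 56, 89]],
    columns=["A", "B", "C", "D", "E"],
)

# Minimum of each numeric column (column A is skipped)
col_min = In.select_dtypes(include="number").min()

# Smallest of the column minimums, i.e. the global minimum
global_min = col_min.min()

# Numeric columns whose own minimum equals the global minimum
has_min = set(col_min.index[col_min.eq(global_min)])

# Keep non-numeric columns plus the selected numeric ones, in original order
keep = [c for c in In.columns if c not in col_min.index or c in has_min]
Out = In[keep]
print(Out)
```

Because the filter is built from a list of column names, the original column order is preserved and `In` itself is not modified.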
Solution 2:
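Set A as the index first so that only the numeric columns remain. Then find the min value amongst the min values of each column (the global min), compare the df to it, and keep any column that contains it. Finally, reset the index to restore A as a column.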
In.set_index('A', inplace=True)
In.loc[:,(In==In.min().min()).any()].reset_index()
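Or use the following single expression if you do not want multiple lines of code. Run it on the original In (with A still a column), not after the in-place set_index above:
In.set_index('A').loc[:,(In==(In.select_dtypes(include='number').min().min())).any()].reset_index()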
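Both solutions compare against the single global min. If the intent is instead the reading suggested by the question's title, keeping every column that holds the min of at least one row, a row-wise comparison could look like the sketch below. This is an assumption about the intent, not something the question confirms; for the sample data both readings give the same columns. The names `num`, `row_min` and `hits` are illustrative.

```python
import pandas as pd

In = pd.DataFrame(
    [["W", 13, 23, 45, 65], ["X", 23, 45, 12, 78], ["Y", 12, 34, 56, 89]],
    columns=["A", "B", "C", "D", "E"],
)

# Move the label column into the index so only numeric columns remain
num = In.set_index("A")

# Minimum of each row: W -> 13, X -> 12, Y -> 12
row_min = num.min(axis=1)

# True where a cell equals the minimum of its own row
hits = num.eq(row_min, axis=0)

# Keep columns with at least one hit, then restore A as a column
Out = num.loc[:, hits.any()].reset_index()
print(Out)
```

The two readings differ when some row's min is larger than the global min and sits in a column that does not contain the global min: the row-wise version keeps that column, the global version drops it.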