Keep only those columns in dataframe based on min value of each row

I have a dataframe:

In = pd.DataFrame([["W",13,23,45,65], ["X",23,45,12,78], ["Y",12,34,56,89]],columns=["A","B","C","D","E"])

Row W has minimum 13, row X has minimum 12, and row Y has minimum 12. I want to keep only the columns that contain the minimum value of at least one row.

Expected output:

Out = pd.DataFrame([["W",13,45], ["X",23,12], ["Y",12,56]],columns=["A","B","D"])

How to do it?

Solution 1:

You could check whether each column is either non-numeric or contains the overall minimum.

This approach is efficient because it computes the minimum of each column once and then compares those column minima to the overall minimum, rather than comparing every cell.

from pandas.api.types import is_numeric_dtype

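# True for numeric columns, False for non-numeric ones such as A
mask = In.apply(is_numeric_dtype)
# True where the column's min equals the overall min of the numeric columns;
# non-numeric columns are always kept
m = (m:=In.min()).eq(m[mask].min()) | ~mask
# Without the walrus operator (Python < 3.8):
# m = In.min()
# m = m.eq(m[mask].min()) | ~mask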

Out = In.loc[:,m]

Output:

   A   B   D
0  W  13  45
1  X  23  12
2  Y  12  56
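To see why only B and D survive, it may help to print the intermediate values. The sketch below rebuilds the question's frame and spells Solution 1 out step by step; the names `col_min`, `global_min` and `keep_cols` are illustrative and not part of the answer above.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

In = pd.DataFrame([["W",13,23,45,65], ["X",23,45,12,78], ["Y",12,34,56,89]],
                  columns=["A","B","C","D","E"])

# True for the numeric columns B..E, False for the label column A
mask = In.apply(is_numeric_dtype)

# Minimum of each numeric column: B=12, C=23, D=12, E=65
col_min = In.loc[:, mask].min()
print(col_min)

# The overall minimum is the smallest of the column minima: 12
global_min = col_min.min()
print(global_min)

# Keep non-numeric columns, plus numeric columns whose minimum is the overall minimum
keep_cols = [c for c in In.columns if not mask[c] or col_min[c] == global_min]
print(In[keep_cols])
```

Working on the column minima means each column is reduced to one number before the comparison, which is the efficiency point made above.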

Solution 2:

Set A as the index so that only numeric columns remain. Then take the minimum of the column minima (the overall minimum), compare the whole frame with it, and keep every column where the comparison is True in at least one row.

df = In.set_index('A')  # avoids modifying In in place
Out = df.loc[:, (df == df.min().min()).any()].reset_index()

Or, if you prefer a one-liner:

In.set_index('A').loc[:,(In==(In.select_dtypes(include='number').min().min())).any()].reset_index()
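Both solutions compare against the single overall minimum (12) rather than each row's own minimum. On this data the results coincide: 12 appears in B and D, and row W's minimum (13) also sits in B. If the intent is literally the title (keep any column holding some row's minimum), a minimal sketch, assuming the label column is A, could compare each row against its own `min(axis=1)`:

```python
import pandas as pd

In = pd.DataFrame([["W",13,23,45,65], ["X",23,45,12,78], ["Y",12,34,56,89]],
                  columns=["A","B","C","D","E"])

# Move the label column into the index so only numbers remain
df = In.set_index("A")

# Each row's own minimum: W -> 13, X -> 12, Y -> 12
row_min = df.min(axis=1)

# True wherever a cell equals the minimum of its own row (align on the row index)
is_row_min = df.eq(row_min, axis=0)

# Keep every column that holds the minimum of at least one row
Out = df.loc[:, is_row_min.any()].reset_index()
print(Out)
```

On frames where some row's minimum is larger than the overall minimum and sits in a column that does not contain the overall minimum, this version keeps that extra column while the solutions above drop it.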
