Pandas: New column with values greater than 0 and operate with these values

Solution 1:

You can use apply with a function; you have to specify axis=1 to apply the function row-wise. I have added a get_diff function, though I am not 100% sure it is exactly what you need. I have also added an assign call to create a new dataframe with a new column named X that holds the needed value.

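import pandas as pd

def get_diff(in_: pd.Series) -> int | float:
    # Keep only the non-zero values of the row, sorted from largest to smallest
    res = in_[in_ != 0].sort_values(ascending=False)
    if len(res) == 0:
        return 0  # Not sure if this is what you want to do in that case
    # Use .iloc for positional access, since the row is indexed by column labels
    return res.iloc[-2] - res.iloc[-1] if len(res) > 1 else res.iloc[0]

df = df.assign(X=lambda df: df.apply(get_diff, axis=1))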
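
To make the roles of axis=1 and assign concrete, here is a minimal sketch. The two-row frame and the count_nonzero helper are hypothetical, made up only for illustration; they are not part of the question.

```python
import pandas as pd

# Hypothetical two-row frame, used only to illustrate the axis argument
df = pd.DataFrame({"A": [1, 0], "B": [0, 5], "C": [8, 0]})

def count_nonzero(values: pd.Series) -> int:
    """Count the entries of a row or column that are not 0."""
    return int((values != 0).sum())

# axis=1 passes each row to the function: one result per row
per_row = df.apply(count_nonzero, axis=1)
# axis=0 (the default) passes each column instead: one result per column
per_column = df.apply(count_nonzero, axis=0)

# assign returns a new frame with the extra column; df itself is unchanged
with_x = df.assign(X=per_row)

print(per_row.tolist())
print(per_column.tolist())
print(list(with_x.columns), list(df.columns))
```

If you would rather modify the frame in place, df['X'] = ... works as well; assign is just convenient when you want to keep the original untouched.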

Solution 2:

We can use nsmallest followed by np.ptp (the range, max minus min), with a condition for rows that have only one value not equal to 0

import numpy as np
import pandas as pd
df['new'] = df.apply(lambda x: np.ptp(pd.Series.nsmallest(x[x != 0], 2)) if sum(x != 0) != 1 else x[x != 0].iloc[0], axis=1)
Out[520]: 
0    7
1    5
2    1
3    1
dtype: int64

Or do it in two steps

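# Rows with more than one non-zero value: gap between the two smallest non-zero values
df['new'] = df[df.ne(0).sum(axis=1) > 1].apply(lambda x: np.ptp(pd.Series.nsmallest(x[x != 0], 2)), axis=1)
# Remaining rows have a single non-zero value, which is the row maximum (assuming it is positive)
df['new'] = df['new'].fillna(df.max(axis=1))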
df
Out[530]: 
   A  B  C  D  E  new
0  1  0  8  0  0  7.0
1  0  0  0  0  5  5.0
2  1  2  3  0  0  1.0
3  0  2  0  1  0  1.0
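
If apply turns out to be slow on a large frame, the same result can probably be obtained without a row-wise loop. The sketch below is an alternative technique, not what the answer above does: it masks zeros as NaN and sorts each row with NumPy. It assumes the frame is entirely numeric with at least two columns, and that a row of all zeros should give 0; the sample data is copied from the table above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table shown above
df = pd.DataFrame({
    "A": [1, 0, 1, 0],
    "B": [0, 0, 2, 2],
    "C": [8, 0, 3, 0],
    "D": [0, 0, 0, 1],
    "E": [0, 5, 0, 0],
})

# Work on a float copy so zeros can be replaced by NaN
values = df.to_numpy(dtype=float, copy=True)
values[values == 0] = np.nan

# np.sort places NaN last, so the first two columns of each row
# hold the two smallest non-zero values (or NaN if there are fewer)
ordered = np.sort(values, axis=1)
smallest, second = ordered[:, 0], ordered[:, 1]

# One non-zero value: keep it. Two or more: gap between the two smallest.
new = np.where(np.isnan(second), smallest, second - smallest)
# Assumption: a row with no non-zero value gets 0
new = np.nan_to_num(new, nan=0.0)

result = df.assign(new=new)
print(result)
```

The new column comes out as float because of the NaN masking; cast it with astype(int) if you need integers.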

Solution 3:

I think you can simply use apply() with axis=1, since you want to perform an operation on each row.

See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
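
As a rough sketch of what that could look like: the rule used here (difference between the two smallest non-zero values, or the value itself when there is only one) is inferred from the expected output discussed in this thread, and the smallest_gap name and the 0 returned for an all-zero row are assumptions.

```python
import pandas as pd

# Sample data matching the frame discussed in this thread
df = pd.DataFrame({
    "A": [1, 0, 1, 0],
    "B": [0, 0, 2, 2],
    "C": [8, 0, 3, 0],
    "D": [0, 0, 0, 1],
    "E": [0, 5, 0, 0],
})

def smallest_gap(row: pd.Series):
    """Hypothetical helper: gap between the two smallest non-zero values."""
    nonzero = row[row != 0]
    if nonzero.empty:
        return 0  # assumption: nothing to compare, so return 0
    if len(nonzero) == 1:
        return nonzero.iloc[0]
    low = nonzero.nsmallest(2)  # sorted ascending: smallest first
    return low.iloc[1] - low.iloc[0]

# axis=1 hands each row to the function as a Series
df["new"] = df.apply(smallest_gap, axis=1)
print(df)
```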