Pandas: New column with values greater than 0 and operate with these values
Solution 1:
You can use apply with a function, specifying axis=1 to apply the function row-wise. I have added a get_diff function, without being 100% sure that it is exactly what you need. I have also added an assign call to create a new dataframe with a new column named X that holds the needed value.
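import pandas as pd

def get_diff(in_: pd.Series) -> int | float:
    # Keep only the non-zero values, sorted from largest to smallest
    res = in_[in_ != 0].sort_values(ascending=False)
    if len(res) == 0:
        return 0  # Not sure if this is what you want to do in that case
    # Positional access via .iloc: second smallest minus smallest, or the single value
    return res.iloc[-2] - res.iloc[-1] if len(res) > 1 else res.iloc[0]

df = df.assign(X=lambda df: df.apply(get_diff, axis=1))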
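To make the sorting and positional indexing in get_diff easier to follow, here is a small trace for a single row. It is only a sketch: the row values are taken from the first row of the sample frame shown in this thread, and the variable names row, res and diff are just for illustration.

```python
import pandas as pd

# First row of the sample frame (columns A..E); names here are illustrative only.
row = pd.Series({"A": 1, "B": 0, "C": 8, "D": 0, "E": 0})

# Drop the zeros, then sort descending so the two smallest values come last.
res = row[row != 0].sort_values(ascending=False)
print(res.tolist())  # [8, 1]

# .iloc indexes by position, so it does not depend on the string labels A..E.
diff = res.iloc[-2] - res.iloc[-1]
print(diff)  # 7
```

Using .iloc here matters because the row is indexed by column names, so plain integer indexing such as res[-1] would be treated as a label lookup in recent pandas versions.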
Solution 2:
We can take the two smallest non-zero values with nsmallest, then apply np.ptp to get the difference between them, with a condition for the rows that have only one value not equal to 0.
import numpy as np

df['new'] = df.apply(lambda x: np.ptp(x[x != 0].nsmallest(2)) if (x != 0).sum() != 1 else x[x != 0].iloc[0], axis=1)
Out[520]:
0 7
1 5
2 1
3 1
dtype: int64
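In case it helps to see what nsmallest and np.ptp each contribute, here is a rough single-row walk-through. The row is the third row of the sample frame; the names nonzero, two_smallest and gap are mine, and converting to a plain array with to_numpy() before calling np.ptp is my own choice rather than part of the one-liner above.

```python
import numpy as np
import pandas as pd

# Third row of the sample frame (columns A..E).
row = pd.Series({"A": 1, "B": 2, "C": 3, "D": 0, "E": 0})

# Ignore zeros, then keep the two smallest remaining values.
nonzero = row[row != 0]
two_smallest = nonzero.nsmallest(2)
print(two_smallest.tolist())  # [1, 2]

# np.ptp is "peak to peak": max minus min, here 2 - 1.
gap = np.ptp(two_smallest.to_numpy())
print(gap)  # 1
```

With only two values left, max minus min is exactly the difference between the two smallest non-zero entries.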
Or do it in two steps:
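# Step 1: for rows with more than one non-zero value, difference of the two smallest non-zero values
df['new'] = df[df.ne(0).sum(axis=1) > 1].apply(lambda x: np.ptp(x[x != 0].nsmallest(2)), axis=1)
# Step 2: remaining rows have a single non-zero value, which is the row maximum (assuming positive values)
df['new'] = df['new'].fillna(df.max(axis=1))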
df
Out[530]:
   A  B  C  D  E  new
0  1  0  8  0  0  7.0
1  0  0  0  0  5  5.0
2  1  2  3  0  0  1.0
3  0  2  0  1  0  1.0
Solution 3:
I think you can simply use apply() with axis=1, since you want to perform an operation on each row.
See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
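As a minimal sketch of what a row-wise apply() could look like here, the snippet below rebuilds the sample frame from this thread and applies a small function to each row. row_gap is a hypothetical helper name, and returning 0 for a row with no non-zero values is an assumption, since the question does not say what should happen in that case.

```python
import pandas as pd

# Sample frame as shown in this thread.
df = pd.DataFrame({
    "A": [1, 0, 1, 0],
    "B": [0, 0, 2, 2],
    "C": [8, 0, 3, 0],
    "D": [0, 0, 0, 1],
    "E": [0, 5, 0, 0],
})

def row_gap(row: pd.Series):
    """Hypothetical helper: gap between the two smallest non-zero values."""
    nonzero = row[row != 0].sort_values()
    if nonzero.empty:
        return 0  # assumption: the desired result for all-zero rows is not specified
    if len(nonzero) == 1:
        return nonzero.iloc[0]  # a single non-zero value is returned unchanged
    return nonzero.iloc[1] - nonzero.iloc[0]

# axis=1 passes each row (not each column) to row_gap.
df["new"] = df.apply(row_gap, axis=1)
print(df["new"].tolist())  # [7, 5, 1, 1]
```

Without axis=1, apply() would hand each column to the function instead, which is not what is wanted here.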