Pandas to add a column to indicate the 1st and 2nd places, according to row values

A data frame that I want to add a column to indicate, in each row, which "score" are ranked number 1 and number 2. enter image description here

import pandas as pd
from io import StringIO

csvfile = StringIO(
"""Name Department  A_score B_score C_score D_score
Jason   Finance 7   3   7   9
Jason   Sales   2   2   9   2
Molly   Operation   3   7   1   2
""")

df = pd.read_csv(csvfile, sep = '\t', engine='python')

# adding columns to indicate the ranks of A,B,C,D
df = df.join(df.rank(axis=1, ascending=False).astype(int).add_suffix('_rank'))

# returning the column headers that in [1, 2]
df_1 = df.apply(lambda x: x.isin([1,2]), axis=1).apply(lambda x: list(df.columns[x]), axis=1)

print (df_1)

# output as:
[A_score_rank, C_score_rank, D_score_rank]
[A_score, B_score, D_score, C_score_rank]
[C_score, D_score, A_score_rank, B_score_rank]

There are two problems

when checking which are the first and second places, it includes the "score" columns however I only want to run them by the "rank" columns
The df_1 comes as a separate data frame, not a part of the extended original data frame

How can I solve these? Any helps our appreciated. Thank you.

Solution 1:

We can do pd.Series.nlargest, then pull out the Not NaN one by notna and dot the column get the result

s = df.filter(like='score').apply(pd.Series.nlargest,n=2,keep='all',axis=1)
df['new'] = s.notna().dot(s.columns+',').str[:-1]
df
    Name Department  A_score  ...  C_score  D_score                      new
0  Jason    Finance        7  ...        7        9  A_score,C_score,D_score
1  Jason      Sales        3  ...        9        2          A_score,C_score
2  Molly  Operation        3  ...        1        2          A_score,B_score
[3 rows x 7 columns]

Pandas to add a column to indicate the 1st and 2nd places, according to row values

Solution 1:

Related

Recent Posts