Pandas to add a column to indicate the 1st and 2nd places, according to row values
A data frame that I want to add a column to indicate, in each row, which "score" are ranked number 1 and number 2.
import pandas as pd
from io import StringIO
csvfile = StringIO(
"""Name Department A_score B_score C_score D_score
Jason Finance 7 3 7 9
Jason Sales 2 2 9 2
Molly Operation 3 7 1 2
""")
df = pd.read_csv(csvfile, sep = '\t', engine='python')
# adding columns to indicate the ranks of A,B,C,D
df = df.join(df.rank(axis=1, ascending=False).astype(int).add_suffix('_rank'))
# returning the column headers that in [1, 2]
df_1 = df.apply(lambda x: x.isin([1,2]), axis=1).apply(lambda x: list(df.columns[x]), axis=1)
print (df_1)
# output as:
[A_score_rank, C_score_rank, D_score_rank]
[A_score, B_score, D_score, C_score_rank]
[C_score, D_score, A_score_rank, B_score_rank]
There are two problems
- when checking which are the first and second places, it includes the "score" columns however I only want to run them by the "rank" columns
- The df_1 comes as a separate data frame, not a part of the extended original data frame
How can I solve these? Any helps our appreciated. Thank you.
Solution 1:
We can do pd.Series.nlargest
, then pull out the Not NaN
one by notna
and dot
the column get the result
s = df.filter(like='score').apply(pd.Series.nlargest,n=2,keep='all',axis=1)
df['new'] = s.notna().dot(s.columns+',').str[:-1]
df
Name Department A_score ... C_score D_score new
0 Jason Finance 7 ... 7 9 A_score,C_score,D_score
1 Jason Sales 3 ... 9 2 A_score,C_score
2 Molly Operation 3 ... 1 2 A_score,B_score
[3 rows x 7 columns]