Python Pandas max value in a group as a new column
I am trying to calculate a new column which contains maximum values for each of several groups. I'm coming from a Stata background so I know the Stata code would be something like this:
by group, sort: egen max = max(odds)
For example:
import pandas as pd
data = {'group' : ['A', 'A', 'B','B'],
'odds' : [85, 75, 60, 65]}
df = pd.DataFrame(data)
Then I would like it to look like:
group odds max
A 85 85
A 75 85
B 60 65
B 65 65
Eventually I am trying to form a column that takes 1/(max-min) * odds, where max and min are computed for each group.
Solution 1:
Use groupby + transform:
df['max'] = df.groupby('group')['odds'].transform('max')
This is equivalent to the more verbose:
maxima = df.groupby('group')['odds'].max()
df['max'] = df['group'].map(maxima)
The transform method returns a result with the same index as the original DataFrame, broadcasting each group's value to every row of that group, so no explicit mapping is required.
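For the eventual goal in the question, 1/(max-min) * odds per group, the same pattern can be applied to the minimum as well. This is a minimal sketch, assuming the sample data from the question; the names group_max, group_min and the scaled column are my own choices for illustration, not from the original post:

```python
import pandas as pd

df = pd.DataFrame({'group': ['A', 'A', 'B', 'B'],
                   'odds': [85, 75, 60, 65]})

# Group once, then broadcast each group's max and min back to its rows
grouped = df.groupby('group')['odds']
group_max = grouped.transform('max')
group_min = grouped.transform('min')

df['max'] = group_max
# 1/(max-min) * odds, using each row's own group max and min
df['scaled'] = df['odds'] / (group_max - group_min)
```

Note that a group with only one row, or with all odds equal, has max-min of 0, so the division would give inf for that group and may need separate handling.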
Solution 2:
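The approach from jpp above works, but it can also raise a "SettingWithCopyWarning", typically when df is itself a slice of another DataFrame. While this may not be an issue, I believe the code below avoids that warning, because assign returns a new DataFrame instead of setting a column in place: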
df = df.assign(max=df.groupby('group')['odds'].transform('max'))
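As a quick check of how assign behaves, here is a small sketch assuming the sample data from the question; the name result is only for illustration. It shows that assign builds a new DataFrame and leaves the original untouched:

```python
import pandas as pd

df = pd.DataFrame({'group': ['A', 'A', 'B', 'B'],
                   'odds': [85, 75, 60, 65]})

# assign returns a new DataFrame with the extra column; df is not modified
result = df.assign(max=df.groupby('group')['odds'].transform('max'))
```

The result should still be a DataFrame. Appending .values to the call would turn it into a NumPy array and drop the column names.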