Pandas: Difference between largest and smallest value within group

Solution 1:

Using @unutbu 's df

per timing
unutbu's solution is best over large data sets

import pandas as pd
import numpy as np

df = pd.DataFrame({'GROUP': [1, 2, 1, 2, 1], 'VALUE': [5, 2, 10, 20, 7]})

df.groupby('GROUP')['VALUE'].agg(np.ptp)

GROUP
1     5
2    18
Name: VALUE, dtype: int64

np.ptp docs returns the range of an array


timing
small df

enter image description here

large df
df = pd.DataFrame(dict(GROUP=np.arange(1000000) % 100, VALUE=np.random.rand(1000000)))

enter image description here

large df
many groups
df = pd.DataFrame(dict(GROUP=np.arange(1000000) % 10000, VALUE=np.random.rand(1000000)))

enter image description here

Solution 2:

groupby/agg generally performs best when you take advantage of the built-in aggregators such as 'max' and 'min'. So to obtain the difference, first compute the max and min and then subtract:

import pandas as pd
df = pd.DataFrame({'GROUP': [1, 2, 1, 2, 1], 'VALUE': [5, 2, 10, 20, 7]})
result = df.groupby('GROUP')['VALUE'].agg(['max','min'])
result['diff'] = result['max']-result['min']
print(result[['diff']])

yields

       diff
GROUP      
1         5
2        18

Solution 3:

Note: this will get the job done, but @piRSquared's answer has faster methods.

You can use groupby(), min(), and max():

df.groupby('GROUP')['VALUE'].apply(lambda g: g.max() - g.min())