pandas - get most recent value of a particular column indexed by another column (get maximum value of a particular column indexed by another column)

Solution 1:

If the number of "obj_id"s is very high you'll want to sort the entire dataframe and then drop duplicates to get the last element.

# sort_values replaces the removed sort_index(by=...); avoid shadowing the builtin "sorted"
df_sorted = df.sort_values('data_date')
result = df_sorted.drop_duplicates('obj_id', keep='last').values

This should be faster (sorry, I didn't test it) because you don't need a custom agg function, which is slow when there is a large number of keys since it is called once per key. You might think it's worse to sort the entire dataframe, but in practice sorts run in fast compiled code, while loops executed in Python itself are slow.
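Here is a small runnable sketch of the sort-then-drop-duplicates idea. The sample data and the "value" column are made up for illustration; only "obj_id" and "data_date" come from the question.

```python
import pandas as pd

# Hypothetical sample data; "value" is an invented payload column.
df = pd.DataFrame({
    "obj_id": [1, 2, 1, 2, 3],
    "data_date": pd.to_datetime(
        ["2024-01-01", "2024-01-05", "2024-02-01", "2024-01-03", "2024-01-10"]
    ),
    "value": [10, 20, 30, 40, 50],
})

# Sort by date so the newest row of each obj_id comes last,
# then keep only that last row per obj_id.
latest = df.sort_values("data_date").drop_duplicates("obj_id", keep="last")
```

Note that the result stays in date order, not obj_id order. Add .sort_values("obj_id") at the end if that matters.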

Solution 2:

This is another possible solution. I don't know if it is the fastest (I doubt it), since I have not benchmarked it against other approaches.

df.loc[df.groupby('obj_id').data_date.idxmax(),:]
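To show what the one-liner does step by step, here is a sketch on made-up data (the "value" column is hypothetical). One assumption worth stating: idxmax returns index labels, so the lookup with .loc only picks the intended rows if the dataframe's index is unique.

```python
import pandas as pd

# Hypothetical sample data with the default unique integer index.
df = pd.DataFrame({
    "obj_id": [1, 2, 1, 2, 3],
    "data_date": pd.to_datetime(
        ["2024-01-01", "2024-01-05", "2024-02-01", "2024-01-03", "2024-01-10"]
    ),
    "value": [10, 20, 30, 40, 50],
})

# For each obj_id, the index label of the row holding the latest data_date.
newest_labels = df.groupby("obj_id")["data_date"].idxmax()

# Select those rows by label; the original index is preserved.
latest = df.loc[newest_labels]
```

If the index is not unique, calling df.reset_index(drop=True) first is one way to make the labels safe to look up.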

Solution 3:

I like crewbum's answer, but this is probably faster (sorry, I haven't tested it yet, but it avoids sorting everything):

df.groupby('obj_id').apply(lambda g: g.iloc[g['data_date'].values.argmax()])

It uses NumPy's argmax function to find the position of the row in which the maximum data_date appears within each group.
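As a sketch of the per-group argmax idea on made-up data: the helper name latest_row and the "value" column are hypothetical. It is written with apply, on the assumption that recent pandas versions hand agg one column at a time, so a function that needs the whole group has to go through apply instead.

```python
import pandas as pd

# Hypothetical sample data; "value" is an invented payload column.
df = pd.DataFrame({
    "obj_id": [1, 2, 1, 2, 3],
    "data_date": pd.to_datetime(
        ["2024-01-01", "2024-01-05", "2024-02-01", "2024-01-03", "2024-01-10"]
    ),
    "value": [10, 20, 30, 40, 50],
})

def latest_row(group):
    # argmax gives the position (not the label) of the newest date in the group.
    pos = group["data_date"].to_numpy().argmax()
    return group.iloc[pos]

# Select the non-key columns explicitly so each group is passed as a DataFrame.
latest = df.groupby("obj_id")[["data_date", "value"]].apply(latest_row)
```

The result is indexed by obj_id. Because latest_row runs once per group in Python, this is the kind of per-key call that gets slow when there are many distinct obj_id values.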