pandas - get most recent value of a particular column indexed by another column (get maximum value of a particular column indexed by another column)
Solution 1:
If the number of "obj_id"s is very high you'll want to sort the entire dataframe and then drop duplicates to get the last element.
sorted_df = df.sort_values('data_date')
result = sorted_df.drop_duplicates('obj_id', keep='last').values
This should be faster (sorry, I didn't test it) because you don't need a custom agg function, which is slow when there are many keys. You might think sorting the entire dataframe is worse, but in practice sorts in Python are fast and native loops are slow.
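A minimal sketch of the sort-then-drop-duplicates idea on a small frame. The sample rows and the "value" column are made up for illustration; only "obj_id" and "data_date" come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names obj_id and data_date
# are taken from the question.
df = pd.DataFrame({
    "obj_id": [1, 2, 1, 2, 3],
    "data_date": pd.to_datetime(
        ["2024-01-01", "2024-01-05", "2024-02-01", "2024-01-03", "2024-01-10"]
    ),
    "value": [10, 20, 30, 40, 50],
})

# Sort by date, then keep only the last (most recent) row per obj_id.
latest = df.sort_values("data_date").drop_duplicates("obj_id", keep="last")

# Map each obj_id to the value from its most recent row.
latest_value = latest.set_index("obj_id")["value"].to_dict()
print(latest_value)
```

Dropping the trailing .values keeps the result as a DataFrame, so the column names stay available for later steps.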
Solution 2:
This is another possible solution. I don't know whether it is the fastest (I doubt it), since I have not benchmarked it against other approaches.
df.loc[df.groupby('obj_id').data_date.idxmax(),:]
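As a rough illustration of the idxmax approach: idxmax returns index labels, not positions, so this sketch assumes the index is unique and resets it first to be safe. The sample data and the "value" column are invented for the example.

```python
import pandas as pd

# Hypothetical sample data; only obj_id and data_date come from the question.
df = pd.DataFrame({
    "obj_id": [1, 2, 1, 2, 3],
    "data_date": pd.to_datetime(
        ["2024-01-01", "2024-01-05", "2024-02-01", "2024-01-03", "2024-01-10"]
    ),
    "value": [10, 20, 30, 40, 50],
})

# idxmax returns index labels, so make sure the labels are unique;
# with duplicate labels, df.loc would return extra rows.
df = df.reset_index(drop=True)

# One label per obj_id: the row holding that group's latest data_date.
idx = df.groupby("obj_id")["data_date"].idxmax()

# Select those rows; the result is ordered by obj_id.
latest = df.loc[idx]
print(latest)
```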
Solution 3:
I like crewbum's answer, but this is probably faster (sorry, I haven't tested it yet, but it avoids sorting everything):
# recent pandas versions pass agg one column at a time, so use apply to get each whole group
df.groupby('obj_id').apply(lambda g: g.iloc[g['data_date'].values.argmax()])
It uses NumPy's "argmax" function to find the row index at which the maximum appears.
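One way to spell out the argmax idea explicitly, as a sketch rather than a tuned implementation: loop over the groups, find the position of the latest date with NumPy, and keep that row. The loop is a variant chosen for readability and does not depend on how agg or apply pass data to the function. The sample data and the "value" column are assumptions.

```python
import pandas as pd

# Hypothetical sample data; only obj_id and data_date come from the question.
df = pd.DataFrame({
    "obj_id": [1, 2, 1, 2, 3],
    "data_date": pd.to_datetime(
        ["2024-01-01", "2024-01-05", "2024-02-01", "2024-01-03", "2024-01-10"]
    ),
    "value": [10, 20, 30, 40, 50],
})

parts = []
for _, group in df.groupby("obj_id"):
    # Position (not label) of the latest date within this group.
    pos = group["data_date"].to_numpy().argmax()
    # Double brackets keep a one-row DataFrame, which preserves column dtypes.
    parts.append(group.iloc[[pos]])

# Stack the per-group rows back into a single frame, ordered by obj_id.
latest = pd.concat(parts)
print(latest)
```

This is a plain Python loop over the groups, so it is not expected to be fast when there are many distinct obj_id values.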