Python: Random selection per group
Solution 1:
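import numpy as np

size = 2        # number of rows to draw from each group
replace = True  # sample with replacement
# Draw `size` index labels from the group, then select those rows.
fn = lambda obj: obj.loc[np.random.choice(obj.index, size, replace), :]
df.groupby('Group_Id', as_index=False).apply(fn)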
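For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The sample data, the helper name sample_rows and the seeded generator are assumptions for illustration only, and it loops over the groups explicitly instead of calling apply:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names mirror the snippet above.
df = pd.DataFrame({
    "Name": ["ABC", "ABD", "XYZ", "XYA", "DEF"],
    "Group_Id": [1, 1, 2, 2, 3],
})

rng = np.random.default_rng(0)  # seeded only so that a run is repeatable


def sample_rows(group, size=2, replace=True):
    """Draw `size` index labels from one group and return the matching rows."""
    picked = rng.choice(group.index.to_numpy(), size=size, replace=replace)
    return group.loc[picked]


# Sample each group separately, then stitch the pieces back together.
sampled = pd.concat([sample_rows(g) for _, g in df.groupby("Group_Id")])
print(sampled)
```

Because replace is True, a group with fewer rows than size (group 3 here) still yields size rows, with repeats. With replace=False the same call would raise an error for that group.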
Solution 2:
From pandas 0.16.x onwards, pd.DataFrame.sample provides a way to return a random sample of items from an axis of the object.
In [664]: df.groupby('Group_Id').apply(lambda x: x.sample(1)).reset_index(drop=True)
Out[664]:
  Name  Group_Id
0  ABC         1
1  XYZ         2
2  DEF         3
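On newer pandas releases (1.1 or later, to the best of my knowledge) the groupby object has a sample method of its own, so the lambda is not needed. A small sketch, with made-up data and a random_state that is there only to make the run repeatable:

```python
import pandas as pd

# Hypothetical sample data with the same column names as the output above.
df = pd.DataFrame({
    "Name": ["ABC", "ABD", "XYZ", "XYA", "DEF"],
    "Group_Id": [1, 1, 2, 2, 3],
})

# Draw one row from every group, then discard the original index.
one_per_group = (
    df.groupby("Group_Id")
      .sample(n=1, random_state=42)
      .reset_index(drop=True)
)
print(one_per_group)
```

Passing frac instead of n should draw a fraction of each group rather than a fixed number of rows.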
Solution 3:
There are two very simple ways to do this. The first uses nothing but basic pandas syntax:
df[['x','y']].groupby('x').agg(pd.DataFrame.sample)
This takes 14.4 ms on a 50k-row dataset.
The other, slightly faster method uses numpy:
df[['x','y']].groupby('x').agg(np.random.choice)
This takes 10.9 ms on the same 50k-row dataset.
Generally speaking, when using pandas it's preferable to stick with its native syntax, especially for beginners.
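One caveat that is easy to miss with the agg approach: agg calls the function once per column within each group, so if more than one value column is selected, the sampled values can come from different rows. The sketch below tries to make that visible with a hypothetical frame in which column z is an exact copy of y. The data and the names z and picked are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two groups of 50 rows each.
df = pd.DataFrame({
    "x": np.repeat(["a", "b"], 50),
    "y": np.arange(100),
})
df["z"] = df["y"]  # z equals y in every source row

# agg applies np.random.choice to each column of each group separately.
picked = df.groupby("x").agg(np.random.choice)
print(picked)
```

Since y and z are drawn separately, picked['y'] and picked['z'] will usually differ even though they are identical in every source row. With a single value column, as in df[['x','y']], this does not matter.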
Solution 4:
Using groupby and random.choice in an elegant one-liner:
import random
df.groupby('Group_Id').apply(lambda x: x.iloc[random.choice(range(len(x)))])
Solution 5:
To randomly select just one row per group, try:
df.sample(frac=1.0).groupby('Group_Id').head(1)
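The reason this works: sample(frac=1.0) returns every row in shuffled order, and head(1) then keeps the first row of each group in that shuffled order, which amounts to one random row per group. A minimal sketch, where the sample data and the random_state are assumptions added only for repeatability:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Name": ["ABC", "ABD", "XYZ", "XYA", "DEF"],
    "Group_Id": [1, 1, 2, 2, 3],
})

# Step 1: shuffle all rows (frac=1.0 keeps every row, in random order).
shuffled = df.sample(frac=1.0, random_state=0)

# Step 2: keep the first row that appears for each group.
one_per_group = shuffled.groupby("Group_Id").head(1)
print(one_per_group)
```

The result keeps the original index labels and comes back in shuffled order rather than group order, so a sort_values('Group_Id') may be wanted afterwards.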