Python: Random selection per group

Solution 1:

import numpy as np

size = 2        # sample size per group
replace = True  # sample with replacement
fn = lambda obj: obj.loc[np.random.choice(obj.index, size, replace=replace), :]
df.groupby('Group_Id', as_index=False).apply(fn)
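A self-contained sketch of Solution 1 on a small hypothetical frame (the Name values and group sizes are made up for illustration). It uses group_keys=False instead of as_index=False so the result keeps the original row labels, which makes it easy to check which group each sampled row came from. Because replace=True, group 3, which has a single row, can still return size rows.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # seed the global RNG that np.random.choice draws from

# Hypothetical example data: three groups of different sizes
df = pd.DataFrame({
    "Name": ["ABC", "ABD", "XYZ", "XYA", "XYB", "DEF"],
    "Group_Id": [1, 1, 2, 2, 2, 3],
})

size = 2        # sample size per group
replace = True  # with replacement, so the one-row group 3 still yields 2 rows

def fn(obj):
    # Draw `size` index labels from this group's index, then select those rows
    picked = np.random.choice(obj.index, size, replace=replace)
    return obj.loc[picked, :]

sampled = df.groupby("Group_Id", group_keys=False).apply(fn)
print(sampled)
```

With replace=False instead, np.random.choice raises a ValueError for any group smaller than size, so that combination only suits groups known to be large enough.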

Solution 2:

From pandas 0.16.x onwards, pd.DataFrame.sample returns a random sample of items from an axis of the object, so it can be applied to each group directly:

In [664]: df.groupby('Group_Id').apply(lambda x: x.sample(1)).reset_index(drop=True)
Out[664]:
  Name  Group_Id
0  ABC         1
1  XYZ         2
2  DEF         3
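A runnable sketch of Solution 2 on hypothetical data. Passing one shared np.random.RandomState object to every sample call keeps the whole result reproducible while still giving each group an independent draw (a fixed integer seed inside the lambda would reseed every group identically). Since pandas 1.1 the groupby object also has its own sample method, which does the same job without apply; the second call assumes pandas 1.1 or later.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Group_Id 1..3 with two names each
df = pd.DataFrame({
    "Name": ["ABC", "ABD", "XYZ", "XYA", "DEF", "DEG"],
    "Group_Id": [1, 1, 2, 2, 3, 3],
})

# One shared generator: independent draws per group, reproducible overall
rng = np.random.RandomState(42)

# Solution 2: sample one row per group, then drop the group-key index level
one_per_group = (
    df.groupby("Group_Id")
      .apply(lambda x: x.sample(1, random_state=rng))
      .reset_index(drop=True)
)

# pandas >= 1.1: DataFrameGroupBy.sample, no apply needed
one_per_group_native = df.groupby("Group_Id").sample(n=1, random_state=42)

print(one_per_group)
print(one_per_group_native)
```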

Solution 3:

There are two simple ways to do this. The first uses nothing but basic pandas syntax:

df[['x','y']].groupby('x').agg(pd.DataFrame.sample)

This takes 14.4 ms on a 50k-row dataset.

The other method, which is slightly faster, uses NumPy:

df[['x','y']].groupby('x').agg(np.random.choice)

This takes 10.9 ms on the same 50k-row dataset.

Generally speaking, it's preferable to stick with native pandas syntax, especially for beginners.
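A small sketch of the NumPy variant on a hypothetical x/y frame (the column names come from the answer; the data is made up). Note that agg calls the function separately on each selected column, so if more than one non-grouping column were selected, the values in one output row could come from different input rows. With only y selected, as here, that does not matter.

```python
import numpy as np
import pandas as pd

np.random.seed(1)  # np.random.choice draws from the global RNG

# Hypothetical data: x is the grouping key, y holds the values to sample
df = pd.DataFrame({
    "x": ["a", "a", "a", "b", "b", "c"],
    "y": [10, 11, 12, 20, 21, 30],
})

# For each group of x, np.random.choice receives that group's y values
# and returns one of them as a scalar
picked = df[["x", "y"]].groupby("x").agg(np.random.choice)
print(picked)
```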

Solution 4:

Using groupby and random.choice in an elegant one-liner:

import random
df.groupby('Group_Id').apply(lambda x: x.iloc[random.choice(range(0, len(x)))])
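A runnable sketch of the one-liner on hypothetical data. Here random.randrange(len(x)) stands in for random.choice(range(0, len(x))); both pick a random position, which iloc turns into a single row (a Series). apply then stacks those rows into one frame indexed by Group_Id.

```python
import random
import pandas as pd

random.seed(3)  # seed Python's stdlib RNG used by random.randrange

# Hypothetical example data
df = pd.DataFrame({
    "Name": ["ABC", "ABD", "XYZ", "XYA", "XYB", "DEF"],
    "Group_Id": [1, 1, 2, 2, 2, 3],
})

# Each group x is a DataFrame; return the row at one random position
picked = df.groupby("Group_Id").apply(lambda x: x.iloc[random.randrange(len(x))])
print(picked)
```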

Solution 5:

To randomly select just one row per group, shuffle the whole frame and keep the first row of each group:

df.sample(frac=1.0).groupby('Group_Id').head(1)
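Because the frame is shuffled first, the first row of each group is a random one, and head(n) generalizes this to up to n distinct rows per group (sampling without replacement). A sketch on hypothetical data; the random_state argument is optional and only makes the shuffle reproducible. The rows come back in shuffled order, so add .sort_values('Group_Id') if group order matters.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "Name": ["ABC", "ABD", "XYZ", "XYA", "XYB", "DEF"],
    "Group_Id": [1, 1, 2, 2, 2, 3],
})

# Shuffle all rows, then keep the first row of each group in shuffled order
one_each = df.sample(frac=1.0, random_state=7).groupby("Group_Id").head(1)

# head(2) keeps up to two distinct rows per group; group 3 has only one row
two_each = df.sample(frac=1.0, random_state=7).groupby("Group_Id").head(2)

print(one_each)
print(two_each)
```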