dplyr - Group by and select TOP x %

Solution 1:

One option with dplyr is to keep the rows whose wt is above the 80th percentile of their gear group:

library(dplyr)

mtcars %>% 
  select(gear, wt) %>% 
  group_by(gear) %>% 
  arrange(gear, desc(wt)) %>% 
  filter(wt > quantile(wt, .8))

Source: local data frame [7 x 2]
Groups: gear [3]

   gear    wt
  (dbl) (dbl)
1     3 5.424
2     3 5.345
3     3 5.250
4     4 3.440
5     4 3.440
6     4 3.190
7     5 3.570
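
Note that this filters on a value rather than on a row count, so the number of rows kept per group need not be exactly 20% of the group size (gear 4 keeps 3 of its 12 rows here). A quick way to inspect this is to compute each group's cutoff. This is only a sketch, and the column names n_rows, cutoff and n_above are my own, purely for illustration:

```r
library(dplyr)

# Per-group size, interpolated 80th-percentile cutoff,
# and the number of rows strictly above that cutoff.
cutoffs <- mtcars %>%
  group_by(gear) %>%
  summarise(
    n_rows  = n(),
    cutoff  = quantile(wt, 0.8),
    n_above = sum(wt > quantile(wt, 0.8))
  )
print(cutoffs)
```

n_above should line up with the per-group row counts in the output above.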

Solution 2:

Here's another way: sort by wt within gear, then use slice() to keep the first 20% of rows in each group.

mtcars %>% 
  select(gear, wt) %>% 
  arrange(gear, desc(wt)) %>% 
  group_by(gear) %>% 
  slice(seq(n()*.2))

   gear    wt
  (dbl) (dbl)
1     3 5.424
2     3 5.345
3     3 5.250
4     4 3.440
5     4 3.440
6     5 3.570

I take "top" to mean "having the highest value for wt" and so used desc().
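
To see why gear 4 gets two rows here: n() * .2 is 2.4 for that group, and seq(2.4) yields 1, 2, so for groups where n() * .2 is at least 1 the row count is effectively rounded down. A small check of the counts, as a sketch (the names target and rows_kept are just mine):

```r
library(dplyr)

# For each gear group: its size, 20% of that size,
# and how many row indices seq() actually produces.
kept <- mtcars %>%
  group_by(gear) %>%
  summarise(
    n_rows    = n(),
    target    = n() * 0.2,
    rows_kept = length(seq(n() * 0.2))
  )
print(kept)
```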

Solution 3:

I believe this gets to the answer you're looking for.

library(dplyr)

mtcars %>% 
  select(gear, wt) %>% 
  group_by(gear) %>% 
  arrange(gear, desc(wt)) %>%   # heaviest first, so the first rows are the top ones
  filter(row_number() / n() <= .2)

Solution 4:

I know this is coming late, but it might help someone now. dplyr has a new function, top_frac:

library(dplyr)

mtcars %>%
  select(gear, wt) %>%
  group_by(gear) %>%
  arrange(gear, wt) %>%
  top_frac(n = 0.2, wt = wt)

Here n is the fraction of rows to return and wt is the variable to be used for ordering.

The output is as below.

   gear    wt
      3 5.250
      3 5.345
      3 5.424
      4 3.440
      4 3.440
      5 3.570
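
For anyone on dplyr 1.0.0 or later: top_frac() is marked as superseded there in favour of slice_max(), which takes a prop argument. A minimal sketch of the same selection, assuming that version or newer (top20 is just my name for the result):

```r
library(dplyr)

# Keep the 20% of rows with the largest wt in each gear group.
# with_ties = TRUE is the default, so tied values are kept together.
top20 <- mtcars %>%
  select(gear, wt) %>%
  group_by(gear) %>%
  slice_max(wt, prop = 0.2)
print(top20)
```

On mtcars this should pick the same six rows as the top_frac call, though the ordering within each group may differ.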