Applying group_by and summarise on data while keeping all the columns' info
Here are two options from dplyr, using a) filter and b) slice. In this case no group has a duplicated minimum value in column c, so a) and b) return the same result. If there were duplicated minima, approach a) would return every row tied for the minimum in each group, while b) would return only one row (the first minimum) per group.
a)
> data %>% group_by(b) %>% filter(c == min(c))
#Source: local data frame [4 x 4]
#Groups: b
#
# a b c d
#1 1 a 1.2 small
#2 4 b 1.7 larg
#3 6 c 3.1 med
#4 10 d 2.2 med
Or similarly
> data %>% group_by(b) %>% filter(min_rank(c) == 1L)
#Source: local data frame [4 x 4]
#Groups: b
#
# a b c d
#1 1 a 1.2 small
#2 4 b 1.7 larg
#3 6 c 3.1 med
#4 10 d 2.2 med
b)
> data %>% group_by(b) %>% slice(which.min(c))
#Source: local data frame [4 x 4]
#Groups: b
#
# a b c d
#1 1 a 1.2 small
#2 4 b 1.7 larg
#3 6 c 3.1 med
#4 10 d 2.2 med
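To see how a) and b) differ when minima are tied, here is a minimal sketch. The data frame `ties` is hypothetical (it is not the `data` from the question) and is built so that group "a" has two rows sharing the minimum of c. The last call uses `slice_min()`, which assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical data: group "a" has two rows tied for the minimum of c
ties <- data.frame(
  a = 1:5,
  b = c("a", "a", "a", "b", "b"),
  c = c(1.2, 1.2, 3.0, 1.7, 2.5),
  d = c("small", "med", "large", "large", "med")
)

# a) filter() keeps every row equal to the group minimum, ties included
res_filter <- ties %>% group_by(b) %>% filter(c == min(c)) %>% ungroup()

# b) slice(which.min()) keeps only the first minimum row in each group
res_slice <- ties %>% group_by(b) %>% slice(which.min(c)) %>% ungroup()

# Assumes dplyr >= 1.0.0: with_ties chooses between the two behaviours
res_slice_min <- ties %>%
  group_by(b) %>%
  slice_min(c, n = 1, with_ties = FALSE) %>%
  ungroup()

nrow(res_filter)     # 3 rows: both tied rows of group "a" plus one of "b"
nrow(res_slice)      # 2 rows: one per group
nrow(res_slice_min)  # 2 rows: one per group
```

With `with_ties = TRUE` (the default), `slice_min()` would keep the tied rows, like approach a).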
You can use group_by without summarize. This keeps every row and all columns, and adds the group minimum as a new column:
data %>%
  group_by(b) %>%
  mutate(min_values = min(c)) %>%
  ungroup()
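As a rough illustration of what the mutate approach returns, and how it could be narrowed to only the minimum rows afterwards, consider the sketch below. `example_data` is a hypothetical stand-in for `data`, and the final `filter()` step is an added assumption, not part of the answer above.

```r
library(dplyr)

# Hypothetical stand-in for `data`
example_data <- data.frame(
  a = 1:4,
  b = c("a", "a", "b", "b"),
  c = c(1.2, 2.0, 1.7, 3.4),
  d = c("small", "med", "large", "med")
)

# All four rows are retained; min_values repeats the minimum of its group
with_min <- example_data %>%
  group_by(b) %>%
  mutate(min_values = min(c)) %>%
  ungroup()

# Optional extra step (an assumption): keep only rows that hold the minimum
min_rows <- with_min %>% filter(c == min_values)
```

Here `with_min` still has four rows, while `min_rows` has one row per group.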
Using sqldf:
library(sqldf)
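# Two options. Both rely on SQLite, sqldf's default backend: when min() is
# used in an aggregate query, the bare (non-aggregated) columns take their
# values from the row that holds the minimum. Other databases may reject these.
# Note: HAVING min(c) also drops any group whose minimum is 0 or NULL.
sqldf('SELECT * FROM data GROUP BY b HAVING min(c)')
sqldf('SELECT a, b, min(c) min, d FROM data GROUP BY b')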
Output of the first query (the second labels the third column min instead of c):
a b c d
1 1 a 1.2 small
2 4 b 1.7 larg
3 6 c 3.1 med
4 10 d 2.2 med