Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?
Note: The title of this question has been edited to make it the canonical question for issues where plyr functions mask their dplyr counterparts. The rest of the question remains unchanged.
Suppose I have the following data:
dfx <- data.frame(
  group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54)
)
With the good old plyr I can create a little table summarizing my data with the following code:
require(plyr)
ddply(dfx, .(group, sex), summarize,
      mean = round(mean(age), 2),
      sd = round(sd(age), 2))
The output looks like this:
group sex mean sd
1 A F 49.68 5.68
2 A M 32.21 6.27
3 B F 31.87 9.80
4 B M 37.54 9.73
5 C F 40.61 15.21
6 C M 36.33 11.33
I'm trying to move my code to dplyr and the %>% operator. My code takes the data frame, groups it by group and sex, and then summarises it. That is:
dfx %>% group_by(group, sex) %>%
summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
But my output is:
mean sd
1 35.56 9.92
What am I doing wrong?
The problem here is that you are loading dplyr first and then plyr, so plyr's function summarise is masking dplyr's function summarise. When that happens you get this warning:
library(plyr)
Loading required package: plyr
------------------------------------------------------------------------------------------
You have loaded plyr after dplyr - this is likely to cause problems.
If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
library(plyr); library(dplyr)
------------------------------------------------------------------------------------------
Attaching package: ‘plyr’
The following objects are masked from ‘package:dplyr’:
arrange, desc, failwith, id, mutate, summarise, summarize
So in order for your code to work, either detach plyr with detach(package:plyr), or restart R and load plyr first and then dplyr (or load only dplyr):
library(dplyr)
dfx %>% group_by(group, sex) %>%
summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
Source: local data frame [6 x 4]
Groups: group
group sex mean sd
1 A F 41.51 8.24
2 A M 32.23 11.85
3 B F 38.79 11.93
4 B M 31.00 7.92
5 C F 24.97 7.46
6 C M 36.17 9.11
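If restarting R is inconvenient, the detach route can be sketched roughly as follows. This is only a minimal illustration, assuming both packages are installed; the variable name resolved_pkg is made up for the example, and the packages are loaded in the problematic order on purpose:

```r
suppressPackageStartupMessages({
  library(dplyr)
  library(plyr)   # loaded second on purpose, to reproduce the masking
})

# Remove plyr from the search path. Its namespace stays loaded, so
# plyr functions remain reachable with the explicit `plyr::` prefix.
if ("package:plyr" %in% search()) {
  detach("package:plyr", character.only = TRUE)
}

# Report which package the bare name `summarise` resolves to now
# (resolved_pkg is a hypothetical helper variable).
resolved_pkg <- environmentName(environment(summarise))
print(resolved_pkg)
```

After the detach, an unqualified summarise() in a pipeline should dispatch to dplyr again, while ddply() would have to be written as plyr::ddply().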
Or you can explicitly call dplyr's summarise in your code, so the right function will be called no matter how you load the packages:
dfx %>% group_by(group, sex) %>%
dplyr::summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
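If you are unsure which version a bare summarise currently refers to, you can ask R directly. The snippet below is a small sketch, assuming both packages are installed and loaded in the problematic order; bare_owner and path_pos are names invented for the illustration:

```r
suppressPackageStartupMessages({
  library(dplyr)
  library(plyr)   # attached last, so it sits earlier on the search path
})

# Namespace that owns the function the bare name resolves to.
bare_owner <- environmentName(environment(summarise))

# Positions on the search path; the smaller position is searched first.
path_pos <- match(c("package:plyr", "package:dplyr"), search())
names(path_pos) <- c("plyr", "dplyr")

print(bare_owner)
print(path_pos)
```

The package attached most recently comes first on the search path, which is why the load order decides which summarise wins.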
Your code is calling plyr::summarise instead of dplyr::summarise due to the order in which you have loaded "plyr" and "dplyr".
Demo:
library(dplyr) ## I'm guessing this is the order you loaded
library(plyr)
dfx %>% group_by(group, sex) %>%
summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
# mean sd
# 1 36.88 9.76
dfx %>% group_by(group, sex) %>%
dplyr::summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
# Source: local data frame [6 x 4]
# Groups: group
#
# group sex mean sd
# 1 A F 32.17 6.30
# 2 A M 30.98 7.37
# 3 B F 38.20 7.67
# 4 B M 33.12 12.24
# 5 C F 43.91 10.31
# 6 C M 47.53 8.25
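Because dfx is generated randomly, the numbers differ between runs. A deterministic variant makes the difference easier to compare. The following is a sketch only: dfx_fixed and its values are made up for illustration (they are not the asker's data), and both packages are assumed to be installed:

```r
suppressPackageStartupMessages({
  library(dplyr)
  library(plyr)   # same problematic order as in the demo
})

# Hypothetical fixed data: same shape as dfx, but without randomness.
dfx_fixed <- data.frame(
  group = c(rep("A", 8), rep("B", 15), rep("C", 6)),
  sex   = rep(c("M", "F"), length.out = 29),
  age   = seq(18, 54, length.out = 29)
)

# Bare name: resolves to plyr::summarise, which ignores the grouping.
masked <- dfx_fixed %>% group_by(group, sex) %>%
  summarise(mean = round(mean(age), 2))

# Qualified name: dplyr::summarise, which returns one row per group.
explicit <- dfx_fixed %>% group_by(group, sex) %>%
  dplyr::summarise(mean = round(mean(age), 2))

print(nrow(masked))
print(nrow(explicit))
```

The masked call collapses everything into a single overall row, whereas the qualified call keeps one row for each group/sex combination.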