Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?

Note: The title of this question has been edited to make it the canonical question for issues when plyr functions mask their dplyr counterparts. The rest of the question remains unchanged.


Suppose I have the following data:

dfx <- data.frame(
  group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54)
)

With the good old plyr I can create a little table summarizing my data with the following code:

require(plyr)
ddply(dfx, .(group, sex), summarize,
      mean = round(mean(age), 2),
      sd = round(sd(age), 2))

The output looks like this:

  group sex  mean    sd
1     A   F 49.68  5.68
2     A   M 32.21  6.27
3     B   F 31.87  9.80
4     B   M 37.54  9.73
5     C   F 40.61 15.21
6     C   M 36.33 11.33

I'm trying to move my code to dplyr and the %>% operator. My code takes dfx, groups it by group and sex, and then summarises it:

dfx %>% group_by(group, sex) %>% 
  summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))

But my output is:

  mean   sd
1 35.56 9.92

What am I doing wrong?


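The problem here is that you loaded dplyr first and then plyr, so plyr's summarise is masking dplyr's summarise. plyr's version knows nothing about the grouping set by group_by, so it collapses the whole data frame into a single row. When you load the packages in that order you get this warning: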

library(plyr)
    Loading required package: plyr
------------------------------------------------------------------------------------------
You have loaded plyr after dplyr - this is likely to cause problems.
If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
library(plyr); library(dplyr)
------------------------------------------------------------------------------------------

Attaching package: ‘plyr’

The following objects are masked from ‘package:dplyr’:

    arrange, desc, failwith, id, mutate, summarise, summarize

So for your code to work, either detach plyr with detach(package:plyr), or restart R and load plyr first and then dplyr (or load only dplyr):

library(dplyr)
dfx %>% group_by(group, sex) %>% 
  summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
Source: local data frame [6 x 4]
Groups: group

  group sex  mean    sd
1     A   F 41.51  8.24
2     A   M 32.23 11.85
3     B   F 38.79 11.93
4     B   M 31.00  7.92
5     C   F 24.97  7.46
6     C   M 36.17  9.11

Or you can explicitly call dplyr's summarise in your code, so the right function will be called no matter how you load the packages:

dfx %>% group_by(group, sex) %>% 
  dplyr::summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
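
If you are not sure which summarise a bare call resolves to in your session, a quick check along these lines may help. This is only a sketch: it assumes both packages are installed, it loads them in the problematic order on purpose, and the variable names before and after are made up for illustration.

```r
library(dplyr)
library(plyr)   # loaded second on purpose, to reproduce the masking

# Name of the namespace that a bare `summarise` resolves to right now
before <- environmentName(environment(summarise))
print(before)

# Every attached package that provides `summarise`, in search-path order;
# the first entry is the one a bare call picks up
print(find("summarise"))

# Detach plyr and check again
detach("package:plyr")
after <- environmentName(environment(summarise))
print(after)
```

Once plyr is detached, the bare call should resolve to dplyr again without restarting R.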

Your code is calling plyr::summarise instead of dplyr::summarise due to the order in which you have loaded "plyr" and "dplyr".

Demo:

library(dplyr) ## I'm guessing this is the order you loaded
library(plyr)
dfx %>% group_by(group, sex) %>% 
  summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
#    mean   sd
# 1 36.88 9.76
dfx %>% group_by(group, sex) %>% 
  dplyr::summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
# Source: local data frame [6 x 4]
# Groups: group
# 
#   group sex  mean    sd
# 1     A   F 32.17  6.30
# 2     A   M 30.98  7.37
# 3     B   F 38.20  7.67
# 4     B   M 33.12 12.24
# 5     C   F 43.91 10.31
# 6     C   M 47.53  8.25
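
The same masking applies to mutate, which is also in the list of masked objects printed when plyr is attached. Here is a minimal sketch of the namespaced form for a grouped mutate, assuming dplyr is installed. The seed and the group_mean column are not from the question; they are only there to make the example reproducible and checkable.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the random columns are reproducible
dfx <- data.frame(
  group = c(rep("A", 8), rep("B", 15), rep("C", 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54)
)

# Namespaced calls pick dplyr's verbs regardless of package load order
res <- dfx %>%
  dplyr::group_by(group, sex) %>%
  dplyr::mutate(group_mean = mean(age)) %>%   # one mean per group/sex combination
  dplyr::ungroup()

head(res)
```

If mutate resolved to plyr::mutate here instead, the grouping would be ignored and every row would get the overall mean of age.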