Calculating statistics on subsets of data

Solution 1:

Use the base R function ave(), which, despite its confusing name, can calculate a variety of per-group statistics, including the mean:

within(mydata, mean <- ave(measure, subject, FUN = mean))

  subject time measure      mean
1       1    0      10 10.000000
2       1    1      12 10.000000
3       1    2       8 10.000000
4       2    0       7  2.333333
5       2    1       0  2.333333
6       2    2       0  2.333333

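Note that I use within() only to keep the code short. It returns a modified copy, so assign the result if you want to keep the new column. Here is the equivalent without within(), which adds the column to mydata directly: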

mydata$mean <- ave(mydata$measure, mydata$subject, FUN = mean)
mydata
  subject time measure      mean
1       1    0      10 10.000000
2       1    1      12 10.000000
3       1    2       8 10.000000
4       2    0       7  2.333333
5       2    1       0  2.333333
6       2    2       0  2.333333
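
If you want to run this from scratch, here is a minimal sketch. The data frame is rebuilt from the printed output above, so its construction is my assumption rather than the asker's original code, and the extra column names (sd, n) are only illustrative. It also shows that FUN can be swapped for other summary functions:

```r
# Assumed reconstruction of mydata from the printed output above
mydata <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  time    = c(0, 1, 2, 0, 1, 2),
  measure = c(10, 12, 8, 7, 0, 0)
)

# Per-subject mean, repeated on every row belonging to that subject
mydata$mean <- ave(mydata$measure, mydata$subject, FUN = mean)

# Same pattern with other summary functions (illustrative column names)
mydata$sd <- ave(mydata$measure, mydata$subject, FUN = sd)
mydata$n  <- ave(mydata$measure, mydata$subject, FUN = length)

print(mydata)
```

One thing to watch: FUN has to be passed by name. In ave(x, ..., FUN = mean) every unnamed argument after x is treated as a grouping variable, so ave(x, g, mean) does not do what you might expect.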

Solution 2:

Alternatively, with the data.table package:

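library(data.table)
# data.table() copies mydata; keying by subject also sorts the rows by subject
dt <- data.table(mydata, key = "subject")
# := adds the column by reference; the trailing [] prints the updated table
dt[, mn_measure := mean(measure), by = subject][]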

#    subject time measure mn_measure
# 1:       1    0      10  10.000000
# 2:       1    1      12  10.000000
# 3:       1    2       8  10.000000
# 4:       2    0       7   2.333333
# 5:       2    1       0   2.333333
# 6:       2    2       0   2.333333
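
A self-contained sketch of the same idea, assuming data.table is installed. The data frame is again rebuilt from the printed output, and sd_measure and n_obs are column names I made up for illustration. It adds several statistics in one call using the functional form of := and shows that the original data frame is left untouched:

```r
library(data.table)

# Assumed reconstruction of mydata from the printed output above
mydata <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  time    = c(0, 1, 2, 0, 1, 2),
  measure = c(10, 12, 8, 7, 0, 0)
)

# data.table() makes a copy, so mydata itself is not modified below
dt <- data.table(mydata, key = "subject")

# Add the per-subject mean by reference
dt[, mn_measure := mean(measure), by = subject]

# Add several per-subject statistics at once (illustrative column names);
# .N is the number of rows in the current group
dt[, `:=`(sd_measure = sd(measure), n_obs = .N), by = subject]

print(dt)
```

Because := works by reference, there is no need to assign the result back to dt.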

Solution 3:

You can use ddply from the plyr package:

library(plyr)
res <- ddply(mydata, .(subject), mutate, mn_measure = mean(measure))
res
  subject time measure mn_measure
1       1    0      10  10.000000
2       1    1      12  10.000000
3       1    2       8  10.000000
4       2    0       7   2.333333
5       2    1       0   2.333333
6       2    2       0   2.333333
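
For comparison, a small sketch (assuming plyr is installed, with mydata rebuilt from the printed output and per_row / per_group being names I picked for illustration) of the difference between mutate, which keeps every row, and summarise, which collapses each subject to a single row:

```r
library(plyr)

# Assumed reconstruction of mydata from the printed output above
mydata <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  time    = c(0, 1, 2, 0, 1, 2),
  measure = c(10, 12, 8, 7, 0, 0)
)

# mutate: keeps all rows and appends the per-subject mean to each
per_row <- ddply(mydata, .(subject), mutate, mn_measure = mean(measure))

# summarise: returns one row per subject with only the statistic
per_group <- ddply(mydata, .(subject), summarise, mn_measure = mean(measure))

print(per_group)
```

Use mutate when you need the statistic alongside the original rows, as in the question, and summarise when a one-row-per-group table is enough.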