Calculating statistics on subsets of data
Solution 1:
Use the base R function ave(), which, despite its confusing name, can calculate a variety of per-group statistics, including the mean:
within(mydata, mean <- ave(measure, subject, FUN = mean))
  subject time measure      mean
1       1    0      10 10.000000
2       1    1      12 10.000000
3       1    2       8 10.000000
4       2    0       7  2.333333
5       2    1       0  2.333333
6       2    2       0  2.333333
Note that I use within() just for the sake of shorter code. Here is the equivalent without within():
mydata$mean <- ave(mydata$measure, mydata$subject, FUN=mean)
mydata
  subject time measure      mean
1       1    0      10 10.000000
2       1    1      12 10.000000
3       1    2       8 10.000000
4       2    0       7  2.333333
5       2    1       0  2.333333
6       2    2       0  2.333333
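Since mydata itself is never shown, here is a minimal sketch that rebuilds it from the printed output (so its exact contents are an assumption) and uses ave() with statistics other than the mean. The column names max_measure and sd_measure are hypothetical, chosen only for illustration.

```r
# Rebuilt from the printed output above; the original data frame is assumed to look like this
mydata <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  time    = c(0, 1, 2, 0, 1, 2),
  measure = c(10, 12, 8, 7, 0, 0)
)

# ave() returns a vector as long as its input, with each group's
# result repeated for every row of that group
mydata$mean        <- ave(mydata$measure, mydata$subject, FUN = mean)
mydata$max_measure <- ave(mydata$measure, mydata$subject, FUN = max)  # hypothetical column name
mydata$sd_measure  <- ave(mydata$measure, mydata$subject, FUN = sd)   # hypothetical column name

mydata
```

FUN has to be passed by name: ave() treats every unnamed argument after the first as a grouping variable, so writing ave(x, g, max) would group by max instead of applying it.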
Solution 2:
Alternatively, with the data.table package:
require(data.table)
dt <- data.table(mydata, key = "subject")
dt[, mn_measure := mean(measure), by = subject]
#    subject time measure mn_measure
# 1:       1    0      10  10.000000
# 2:       1    1      12  10.000000
# 3:       1    2       8  10.000000
# 4:       2    0       7   2.333333
# 5:       2    1       0   2.333333
# 6:       2    2       0   2.333333
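If more than one statistic is needed, the same grouped assignment can add several columns at once. The following is a sketch only: mydata is rebuilt from the printed output (an assumption about the original data), and sd_measure is a hypothetical column name.

```r
library(data.table)

# Rebuilt from the printed output above (assumed shape of the original data)
mydata <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  time    = c(0, 1, 2, 0, 1, 2),
  measure = c(10, 12, 8, 7, 0, 0)
)

dt <- data.table(mydata, key = "subject")

# The functional form of := adds several columns by reference in one grouped pass
dt[, `:=`(mn_measure = mean(measure),
          sd_measure = sd(measure)),   # hypothetical column name
   by = subject]

# := returns its result invisibly, so print explicitly to see the table
print(dt)
```

Because := modifies dt in place, there is no need to assign the result back to dt.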
Solution 3:
You can use ddply() from the plyr package:
library(plyr)
res = ddply(mydata, .(subject), mutate, mn_measure = mean(measure))
res
  subject time measure mn_measure
1       1    0      10  10.000000
2       1    1      12  10.000000
3       1    2       8  10.000000
4       2    0       7   2.333333
5       2    1       0   2.333333
6       2    2       0   2.333333
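For readers without plyr installed, the split-apply-combine idea behind the ddply() call can be sketched in base R. This is an illustration of the concept, not plyr's actual implementation, and mydata is again rebuilt from the printed output (an assumption about the original data).

```r
# Rebuilt from the printed output above (assumed shape of the original data)
mydata <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  time    = c(0, 1, 2, 0, 1, 2),
  measure = c(10, 12, 8, 7, 0, 0)
)

# Split: one data frame per subject
pieces <- split(mydata, mydata$subject)

# Apply: add the group mean as a new column to each piece
pieces <- lapply(pieces, function(d) {
  d$mn_measure <- mean(d$measure)
  d
})

# Combine: stack the pieces back into a single data frame
res <- do.call(rbind, pieces)
rownames(res) <- NULL  # drop the group-prefixed row names that rbind creates
res
```

The rows come back ordered by subject, which matches the input here only because mydata is already sorted that way.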