How to calculate mean/median per group in a data frame in R [duplicate]
Solution 1:
library(dplyr)
dat %>%
  group_by(custid) %>%
  summarise(Mean = mean(value), Max = max(value), Min = min(value),
            Median = median(value), Std = sd(value))
#  custid     Mean Max Min Median      Std
#1      1 2.666667   5   1    2.5 1.632993
#2      2 5.500000  10   1    5.5 6.363961
#3      3 2.666667   5   1    2.0 2.081666
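The input data frame is not shown in this answer. As a rough sketch, the data below is an assumption: it was picked only because it reproduces the statistics printed above, and the real `dat` may well differ. With it, the per-group mean and median can be checked using base R alone:

```r
# Hypothetical sample data, reconstructed so that the per-group statistics
# match the printed output; the original `dat` is not shown in this thread.
dat <- data.frame(
  custid = c(1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3),
  value  = c(1, 1, 2, 3, 4, 5, 1, 10, 1, 2, 5)
)

# Per-group statistics with base R only: one named element per custid
group_means   <- tapply(dat$value, dat$custid, mean)
group_medians <- tapply(dat$value, dat$custid, median)
group_sds     <- tapply(dat$value, dat$custid, sd)

print(group_means)
print(group_medians)
print(group_sds)
```

Any data set with the same group sizes and values would do; the point is only to have something to run the snippets against.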
For bigger datasets, data.table is usually faster:
library(data.table)
# as.numeric() keeps Median a double in every group
# (median() can return an integer for integer input)
setDT(dat)[, list(Mean = mean(value), Max = max(value), Min = min(value),
                  Median = as.numeric(median(value)), Std = sd(value)),
           by = custid]
#   custid     Mean Max Min Median      Std
#1:      1 2.666667   5   1    2.5 1.632993
#2:      2 5.500000  10   1    5.5 6.363961
#3:      3 2.666667   5   1    2.0 2.081666
Solution 2:
To add to the alternatives, here's summaryBy from the "doBy" package, with which you can specify a list of functions to apply:
library(doBy)
summaryBy(value ~ custid, data = mydf,
          FUN = list(mean, max, min, median, sd))
#   custid value.mean value.max value.min value.median value.sd
# 1      1   2.666667         5         1          2.5 1.632993
# 2      2   5.500000        10         1          5.5 6.363961
# 3      3   2.666667         5         1          2.0 2.081666
Of course, you can also stick with base R:
myFun <- function(x) {
  c(min = min(x), max = max(x),
    mean = mean(x), median = median(x),
    std = sd(x))
}
tapply(mydf$value, mydf$custid, myFun)
# $`1`
#      min      max     mean   median      std
# 1.000000 5.000000 2.666667 2.500000 1.632993
#
# $`2`
#       min       max      mean    median       std
#  1.000000 10.000000  5.500000  5.500000  6.363961
#
# $`3`
#      min      max     mean   median      std
# 1.000000 5.000000 2.666667 2.000000 2.081666
# tapply() returns the groups in sorted order, so sort the ids to match
cbind(custid = sort(unique(mydf$custid)),
      do.call(rbind, tapply(mydf$value, mydf$custid, myFun)))
#   custid min max     mean median      std
# 1      1   1   5 2.666667    2.5 1.632993
# 2      2   1  10 5.500000    5.5 6.363961
# 3      3   1   5 2.666667    2.0 2.081666
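The cbind() result above is a matrix. If a regular data frame is preferred, one base R variation (a sketch, not part of the original answer) is to pass the same myFun to aggregate() and flatten the matrix column it produces with do.call(data.frame, ...). The contents of `mydf` below are an assumption made up for illustration.

```r
# Hypothetical sample data; the thread does not show the real `mydf`.
mydf <- data.frame(
  custid = c(1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3),
  value  = c(1, 1, 2, 3, 4, 5, 1, 10, 1, 2, 5)
)

# Same summary function as in the answer above
myFun <- function(x) {
  c(min = min(x), max = max(x),
    mean = mean(x), median = median(x),
    std = sd(x))
}

# aggregate() stores the five statistics in a single matrix column `value`
agg <- aggregate(value ~ custid, data = mydf, FUN = myFun)

# do.call(data.frame, ...) expands that matrix into ordinary columns
# named value.min, value.max, value.mean, value.median and value.std
res <- do.call(data.frame, agg)
print(res)
```

The group ids come from aggregate() itself here, so they cannot get out of step with the statistics.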
Solution 3:
If you want to apply a larger number of functions to all or the same column(s) with dplyr, I recommend summarise_each or mutate_each (both are deprecated in recent dplyr releases in favour of across()):
require(dplyr)
dat %>%
  group_by(custid) %>%
  summarise_each(funs(max, min, mean, median, sd), value)
#Source: local data frame [3 x 6]
#
#  custid max min     mean median       sd
#1      1   5   1 2.666667    2.5 1.632993
#2      2  10   1 5.500000    5.5 6.363961
#3      3   5   1 2.666667    2.0 2.081666
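On current dplyr versions the same idea is usually written with across() inside summarise(). The following is a sketch assuming dplyr 1.0 or later; the sample `dat` is an assumption, since the thread does not show the real data.

```r
library(dplyr)

# Hypothetical sample data; the original `dat` is not shown in this thread.
dat <- data.frame(
  custid = c(1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3),
  value  = c(1, 1, 2, 3, 4, 5, 1, 10, 1, 2, 5)
)

# across() applies every function in the named list to the selected column(s);
# by default the result columns are named <column>_<function>, e.g. value_max
res <- dat %>%
  group_by(custid) %>%
  summarise(across(value,
                   list(max = max, min = min, mean = mean,
                        median = median, sd = sd)))
print(res)
```

To summarise several columns at once, replace `value` with a selection such as `c(value, other_col)` or `where(is.numeric)`; `other_col` is only a placeholder name.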
Or another option with base R's aggregate:
aggregate(value ~ custid, data = dat, summary)
#  custid value.Min. value.1st Qu. value.Median value.Mean value.3rd Qu. value.Max.
#1      1      1.000         1.250        2.500      2.667         3.750      5.000
#2      2      1.000         3.250        5.500      5.500         7.750     10.000
#3      3      1.000         1.500        2.000      2.667         3.500      5.000
(This doesn't include standard deviation but I think it's a nice approach for the other descriptive stats.)
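If the standard deviation is wanted as well, one possible variation (a sketch, not from the original answer) is to wrap summary() in a small anonymous function that appends sd(). The sample `dat` and the column label `Std` are assumptions for illustration.

```r
# Hypothetical sample data; the original `dat` is not shown in this thread.
dat <- data.frame(
  custid = c(1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3),
  value  = c(1, 1, 2, 3, 4, 5, 1, 10, 1, 2, 5)
)

# c() turns the summary() result into a plain named numeric vector,
# so the standard deviation can be appended as a seventh element
res <- aggregate(value ~ custid, data = dat,
                 FUN = function(x) c(summary(x), Std = sd(x)))
print(res)

# The statistics live in a matrix column; pick one out by name
print(res$value[, "Std"])
```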