Adding a column of means by group to original data
I want to add a column of means, grouped by a factor column, to an R data.frame, like this:
df1 <- data.frame(X = rep(x = LETTERS[1:2], each = 3), Y = 1:6)
df2 <- aggregate(data = df1, Y ~ X, FUN = mean)
df3 <- merge(x = df1, y = df2, by = "X", suffixes = c(".Old",".New"))
df3
# X Y.Old Y.New
# 1 A 1 2
# 2 A 2 2
# 3 A 3 2
# 4 B 4 5
# 5 B 5 5
# 6 B 6 5
To accomplish this, I have to create two unnecessary data.frames. I'd like to know a way to append a column of means by factor column to my original data.frame without creating any extra data.frames. Thanks for your time and help.
This is what the ave function is for.
# ave() applies FUN (mean by default) within each level of X, keeping the original row order
df1$Y.New <- ave(df1$Y, df1$X)
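As a rough sketch of what happens inside ave with its default FUN = mean: it splits the vector by the grouping factor, applies FUN to each piece, and writes each result back into that group's positions, recycling the single value across the group. The helper group_mean_sketch below is hypothetical (not from any package) and only mirrors that idea; ave's own FUN argument lets you swap in another summary such as max.

```r
# Same example data as in the question
df1 <- data.frame(X = rep(x = LETTERS[1:2], each = 3), Y = 1:6)

# Hypothetical helper, not from any package: mirrors the
# split / apply / write-back idea behind ave(x, g, FUN = mean)
group_mean_sketch <- function(x, g, FUN = mean) {
  out <- x
  # split(x, g) gives one vector per group; FUN collapses each to one value,
  # and `split<-` recycles that value back into the group's original positions
  split(out, g) <- lapply(split(x, g), FUN)
  out
}

sketch  <- group_mean_sketch(df1$Y, df1$X)
builtin <- ave(df1$Y, df1$X)
print(sketch)                      # 2 2 2 5 5 5
print(identical(sketch, builtin))  # same result as ave()

# ave() accepts any summary function through FUN, e.g. the group maximum
df1$Y.Max <- ave(df1$Y, df1$X, FUN = max)
print(df1)
```

Because the results are written back by position, no intermediate data.frame or merge step is needed.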
Two alternative ways of doing this:
1. with the dplyr package:
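library(dplyr)
df1 <- df1 %>%
  group_by(X) %>%               # one group per level of X
  mutate(Y.new = mean(Y)) %>%   # group mean, repeated on every row of the group
  ungroup()                     # drop the grouping so later steps are not per-group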
2. with the data.table package:
library(data.table)
# setDT() converts df1 to a data.table in place; := adds Y.new by reference, per group of X
setDT(df1)[, Y.new := mean(Y), by = X]
Both give the same values; printed as a data.table, the result is:
> df1
   X Y Y.new
1: A 1     2
2: A 2     2
3: A 3     2
4: B 4     5
5: B 5     5
6: B 6     5
ddply and transform to the rescue (although I'm sure you'll get at least four different ways to do this):
library(plyr)
# split df1 by X, add Y.New to each piece with transform(), then recombine
ddply(df1, .(X), transform, Y.New = mean(Y))
X Y Y.New
1 A 1 2
2 A 2 2
3 A 3 2
4 B 4 5
5 B 5 5
6 B 6 5
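If you'd rather stay in base R, the same pattern can be written as transform() plus ave(). One difference worth keeping in mind, as far as I know: ddply recombines the pieces group by group, so its output comes back ordered by X, whereas ave writes each mean back into the original row positions. The interleaved data frame df_unsorted below is made up purely for illustration.

```r
# Base-R counterpart of ddply(df1, .(X), transform, Y.New = mean(Y))
df1 <- data.frame(X = rep(x = LETTERS[1:2], each = 3), Y = 1:6)
df1_new <- transform(df1, Y.New = ave(Y, X))
print(df1_new)

# Hypothetical data with interleaved groups, to show row order is preserved
df_unsorted <- data.frame(X = c("B", "A", "B", "A"), Y = c(4, 1, 6, 3))
df_unsorted <- transform(df_unsorted, Y.New = ave(Y, X))
print(df_unsorted)  # rows stay in B, A, B, A order; Y.New is 5, 2, 5, 2
```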
Joran answered beautifully. This is not an answer to your question but an extension of the conversation: if you're looking for a table of means showing how two categorical variables relate to a dependent variable, here's Hadley's function for that:
library(reshape)
cast(CO2, Type ~ Treatment, value = "uptake", fun.aggregate = mean, margins = TRUE)
Here's the head of the CO2 data, followed by the resulting means table:
> head(CO2)
Plant Type Treatment conc uptake
1 Qn1 Quebec nonchilled 95 16.0
2 Qn1 Quebec nonchilled 175 30.4
3 Qn1 Quebec nonchilled 250 34.8
4 Qn1 Quebec nonchilled 350 37.2
5 Qn1 Quebec nonchilled 500 35.3
6 Qn1 Quebec nonchilled 675 39.2
> library(reshape)
> cast(CO2, Type ~ Treatment, mean, margins=TRUE)
Type nonchilled chilled (all)
1 Quebec 35.33333 31.75238 33.54286
2 Mississippi 25.95238 15.81429 20.88333
3 (all) 30.64286 23.78333 27.21310
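For comparison, here is a base-R sketch of the same table using tapply(), without the reshape package. This is my own swap-in, not what cast does internally: cell means come from tapply() over both factors, and the "(all)" margins are computed from the raw data (per Type, per Treatment, and overall) rather than by averaging cell means, so they should stay correct even when groups are unbalanced.

```r
# CO2 ships with R's datasets package
cell_means <- tapply(CO2$uptake, list(CO2$Type, CO2$Treatment), mean)

# Margins from the raw data, not means of cell means
type_means  <- tapply(CO2$uptake, CO2$Type, mean)
treat_means <- tapply(CO2$uptake, CO2$Treatment, mean)
overall     <- mean(CO2$uptake)

# Append the Type margin as a column, then the Treatment margin as a row
means_table <- rbind(
  cbind(cell_means, "(all)" = as.vector(type_means)),
  "(all)" = c(as.vector(treat_means), overall)
)
print(round(means_table, 5))
```

The result is a plain numeric matrix rather than a data.frame, which is often enough for printing or further arithmetic.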