Display weighted mean by group in the data.frame
Issues regarding the command by
and weighted.mean
already exist but none was able to help solving my problem. I am new to R and am more used to data mining language than programming.
I have a data frame with for each individual (observation/row) the income, education level and sample weight. I want to calculate the weighted mean of income by education level, and I want the result to be associated to each individual in a new column of my original data frame, like this:
obs income education weight incomegroup
1. 1000 A 10 --> display weighted mean of income for education level A
2. 2000 B 1 --> display weighted mean of income for education level B
3. 1500 B 5 --> display weighted mean of income for education level B
4. 2000 A 2 --> display weighted mean of income for education level A
I tried:
data$incomegroup=by(data$education, function(x) weighted.mean(data$income, data$weight))
It does not work. The weighted mean is calculated somehow and appears in column "incomegroup" but for the whole set instead of by group or for one group only, I don't know. I read things regarding packages plyr
or aggregate
but it does not seem to do what I am interested in.
The ave{stats}
command gives exactly what I am looking for but only for simple mean:
data$incomegroup=ave(data$income,data$education,FUN = mean)
It cannot be used with weights.
Thanking you in advance for your help!
If we use mutate
, then we can avoid the left_join
library(dplyr)
df %>%
group_by(education) %>%
mutate(weighted_income = weighted.mean(income, weight))
# obs income education weight weighted_income
# <int> <int> <fctr> <int> <dbl>
#1 1 1000 A 10 1166.667
#2 2 2000 B 1 1583.333
#3 3 1500 B 5 1583.333
#4 4 2000 A 2 1166.667
Try using the dplyr package as follows:
df <- read.table(text = 'obs income education weight
1 1000 A 10
2 2000 B 1
3 1500 B 5
4 2000 A 2',
header = TRUE)
library(dplyr)
df_summary <-
df %>%
group_by(education) %>%
summarise(weighted_income = weighted.mean(income, weight))
df_summary
# education weighted_income
# A 1166.667
# B 1583.333
df_final <- left_join(df, df_summary, by = 'education')
df_final
# obs income education weight weighted_income
# 1 1000 A 10 1166.667
# 2 2000 B 1 1583.333
# 3 1500 B 5 1583.333
# 4 2000 A 2 1166.667