Display weighted mean by group in the data.frame

r weighted-average

Questions about by and weighted.mean already exist, but none of them helped me solve my problem. I am new to R and more used to data-mining languages than to programming.

I have a data frame that holds, for each individual (one observation per row), the income, education level and sample weight. I want to calculate the weighted mean of income by education level, and I want the result attached to each individual in a new column of my original data frame, like this:

obs income education weight incomegroup
1.   1000      A       10    --> display weighted mean of income for education level A
2.   2000      B        1    --> display weighted mean of income for education level B
3.   1500      B        5    --> display weighted mean of income for education level B
4.   2000      A        2    --> display weighted mean of income for education level A

I tried:

data$incomegroup=by(data$education, function(x) weighted.mean(data$income, data$weight))

It does not work. A weighted mean is calculated somehow and appears in the column "incomegroup", but I cannot tell whether it is for the whole data set or for one group only; it is not computed by group. I read about the plyr package and the aggregate function, but they do not seem to do what I am interested in.

The ave{stats} command gives exactly what I am looking for, but only for a simple mean:

data$incomegroup=ave(data$income,data$education,FUN = mean)

It cannot be used with weights.

Thank you in advance for your help!

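If we use mutate instead of summarise, we can avoid the left_join, because mutate keeps every row and repeats the group value on each of them (df is the example data frame from the question):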

library(dplyr)
df %>%
   group_by(education) %>% 
   mutate(weighted_income = weighted.mean(income, weight))
#    obs income education weight weighted_income
#  <int>  <int>    <fctr>  <int>           <dbl>
#1     1   1000         A     10        1166.667
#2     2   2000         B      1        1583.333
#3     3   1500         B      5        1583.333
#4     4   2000         A      2        1166.667
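
To see where these numbers come from: within each group the weighted mean is sum(income * weight) / sum(weight). The sketch below rebuilds the four example rows by hand and checks the value for education level A. The names a, manual_a and builtin_a are only for illustration.

```r
# Rebuild the four example rows from the question
df <- data.frame(
  obs       = 1:4,
  income    = c(1000, 2000, 1500, 2000),
  education = c("A", "B", "B", "A"),
  weight    = c(10, 1, 5, 2)
)

# Rows for education level A
a <- df[df$education == "A", ]

# Weighted mean by hand: (1000 * 10 + 2000 * 2) / (10 + 2)
manual_a <- sum(a$income * a$weight) / sum(a$weight)

# Same quantity from weighted.mean()
builtin_a <- weighted.mean(a$income, a$weight)

# Compare the two values up to floating point tolerance
all.equal(manual_a, builtin_a)
```

Both values come out as 14000 / 12, which is the 1166.667 shown for group A above.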

Try using the dplyr package as follows:

df <- read.table(text = 'obs income education weight   
                          1   1000      A       10     
                          2   2000      B        1     
                          3   1500      B        5     
                          4   2000      A        2', 
                 header = TRUE)     

library(dplyr)

df_summary <- 
  df %>% 
  group_by(education) %>% 
  summarise(weighted_income = weighted.mean(income, weight))

df_summary
# education weighted_income
#     A        1166.667
#     B        1583.333

df_final <- left_join(df, df_summary, by = 'education')

df_final
# obs income education weight weighted_income
#  1   1000         A     10        1166.667
#  2   2000         B      1        1583.333
#  3   1500         B      5        1583.333
#  4   2000         A      2        1166.667
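
If you would rather stay in base R, one possible workaround is to give ave the row indices instead of the income column, and look up income and weight for those rows inside FUN. This is only a sketch on the example data; the column name incomegroup is taken from the question, and the data frame is rebuilt by hand here.

```r
# Rebuild the four example rows from the question
df <- data.frame(
  obs       = 1:4,
  income    = c(1000, 2000, 1500, 2000),
  education = c("A", "B", "B", "A"),
  weight    = c(10, 1, 5, 2)
)

# ave() passes a single vector to FUN, so pass row indices grouped by
# education and use them to pick the matching incomes and weights
df$incomegroup <- ave(
  seq_len(nrow(df)), df$education,
  FUN = function(i) weighted.mean(df$income[i], df$weight[i])
)

df
```

The new column should hold the same per-group values as weighted_income in the dplyr output, repeated on every row of the group.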
