Mutate subset average and add to all groups R
I'm trying to create mean values for a certain subset of my data, and I want to add these as a new variable to a different subset of the data.
Say I have the following data:
df <- data.frame(year = rep(c(2010,2012,2017), 6),
party = rep(c("A", "A", "A", "B", "B", "B", "C", "C", "C"), 2),
left = rep(c(1,1,1,0,0,0,1,1,1),2),
area = c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2),
pp = c(3,4,5,1,2,6,3,4,2,5,1,1,3,4,5,2,3,0))
Which looks like:
year party left area pp
2010 A 1 1 3
2012 A 1 1 4
2017 A 1 1 5
2010 B 0 1 1
2012 B 0 1 2
2017 B 0 1 6
2010 C 1 1 3
2012 C 1 1 4
2017 C 1 1 2
2010 A 1 2 5
2012 A 1 2 1
2017 A 1 2 1
2010 B 0 2 3
2012 B 0 2 4
2017 B 0 2 5
2010 C 1 2 2
2012 C 1 2 3
2017 C 1 2 0
Now what I want is to create a new variable within each "year" and "area" for left == 1, that is the average of "pp". But I also want to this to the subset where left == 0.
I tried:
df %>%
group_by(year, left, area) %>%
mutate(mean_left = mean(pp)) %>% ungroup()
But this gives me a value specific for left == 0 and a specific value for left == 1 (within each year,party).
year party left area pp mean_left
<dbl> <chr> <dbl> <dbl> <dbl> <dbl>
2010 A 1 1 3 3
2012 A 1 1 4 4
2017 A 1 1 5 3.5
2010 B 0 1 1 1
2012 B 0 1 2 2
2017 B 0 1 6 6
2010 C 1 1 3 3
2012 C 1 1 4 4
2017 C 1 1 2 3.5
2010 A 1 2 5 3.5
2012 A 1 2 1 2
2017 A 1 2 1 0.5
2010 B 0 2 3 3
2012 B 0 2 4 4
2017 B 0 2 5 5
2010 C 1 2 2 3.5
2012 C 1 2 3 2
2017 C 1 2 0 0.5
I also tried:
df %>%
group_by(year, left, area) %>%
mutate(mean_left = mean(df$pp[df$left==1])) %>% ungroup()
But here the group_by function seems to fail and I get 2.75 in all columns (which is the value of mean(df$pp[df$left==1])
)
How can I get a mean score per year & area for left == 1, adding this same number to this year & area for left == 2?
It should look like:
year party left area pp mean_left
<dbl> <chr> <dbl> <dbl> <dbl> <dbl>
2010 A 1 1 3 3
2012 A 1 1 4 4
2017 A 1 1 5 3.5
2010 B 0 1 1 3
2012 B 0 1 2 4
2017 B 0 1 6 3.5
2010 C 1 1 3 3
2012 C 1 1 4 4
2017 C 1 1 2 3.5
2010 A 1 2 5 3.5
2012 A 1 2 1 2
2017 A 1 2 1 0.5
2010 B 0 2 3 3.5
2012 B 0 2 4 2
2017 B 0 2 5 0.5
2010 C 1 2 2 3.5
2012 C 1 2 3 2
2017 C 1 2 0 0.5
Thanks in advance!
You were on the right track; you must not group_by left
but select it within your mean function.
df %>%
group_by(year, area) %>%
mutate(mean_left = mean(pp[left == 1])) %>%
ungroup()
output
# A tibble: 18 x 6
year party left area pp mean_left
<dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 2010 A 1 1 3 3
2 2012 A 1 1 4 4
3 2017 A 1 1 5 3.5
4 2010 B 0 1 1 3
5 2012 B 0 1 2 4
6 2017 B 0 1 6 3.5
7 2010 C 1 1 3 3
8 2012 C 1 1 4 4
9 2017 C 1 1 2 3.5
10 2010 A 1 2 5 3.5
11 2012 A 1 2 1 2
12 2017 A 1 2 1 0.5
13 2010 B 0 2 3 3.5
14 2012 B 0 2 4 2
15 2017 B 0 2 5 0.5
16 2010 C 1 2 2 3.5
17 2012 C 1 2 3 2
18 2017 C 1 2 0 0.5