Calculate group mean while excluding current observation using dplyr
Using dplyr (preferably), I am trying to calculate the group mean for each observation while excluding that observation from the group.
It seems that this should be doable with a combination of rowwise() and group_by(), but the two functions cannot be used simultaneously.
Given this data frame:
library(dplyr)
# data_frame() is deprecated in current dplyr; tibble() is its replacement
df <- data_frame(grouping = rep(LETTERS[1:5], 3),
                 value = 1:15) %>%
  arrange(grouping)
df
#> Source: local data frame [15 x 2]
#>
#> grouping value
#> (chr) (int)
#> 1 A 1
#> 2 A 6
#> 3 A 11
#> 4 B 2
#> 5 B 7
#> 6 B 12
#> 7 C 3
#> 8 C 8
#> 9 C 13
#> 10 D 4
#> 11 D 9
#> 12 D 14
#> 13 E 5
#> 14 E 10
#> 15 E 15
I'd like to get the group mean for each observation with that observation excluded from the group, resulting in:
#> grouping value special_mean
#> (chr) (int)
#> 1 A 1 8.5 # i.e. (6 + 11) / 2
#> 2 A 6 6 # i.e. (1 + 11) / 2
#> 3 A 11 3.5 # i.e. (1 + 6) / 2
#> 4 B 2 9.5
#> 5 B 7 7
#> 6 B 12 4.5
#> 7 C 3 ...
I've attempted nesting rowwise() inside a function called by do(), but haven't gotten it to work, along these lines:
special_avg <- function(chunk) {
  chunk %>%
    rowwise() #%>%
    # filter or something...?
}
df %>%
  group_by(grouping) %>%
  do(special_avg(.))
Solution 1:
There is no need to define a custom function. Instead, we can sum all elements of the group, subtract the current value, and divide by the number of elements in the group minus 1.
df %>% group_by(grouping) %>%
mutate(special_mean = (sum(value) - value)/(n()-1))
# grouping value special_mean
# (chr) (int) (dbl)
#1 A 1 8.5
#2 A 6 6.0
#3 A 11 3.5
#4 B 2 9.5
#5 B 7 7.0
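As a quick sanity check of the arithmetic, here is a minimal base R sketch for group A only. The vector names (shortcut, direct) are made up for illustration; it compares the sum-based shortcut with dropping each element explicitly.

```r
# Values of group A from the question
value <- c(1, 6, 11)

# Shortcut: (group sum - current value) / (group size - 1)
shortcut <- (sum(value) - value) / (length(value) - 1)

# Direct version: drop element i, then take the mean of the rest
direct <- sapply(seq_along(value), function(i) mean(value[-i]))

shortcut                    # 8.5 6.0 3.5
all.equal(shortcut, direct) # TRUE

# Caveat: a group with a single observation gives 0 / 0, i.e. NaN
single <- (sum(5) - 5) / (1 - 1)
```

Note that the shortcut relies on the aggregate being a mean; it does not carry over to functions such as max() or median().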
Solution 2:
I came across this old question by chance and wondered whether there is a general solution that would also work for aggregation functions other than mean(), e.g., max() (as requested by jlesuffleur) or median().
The idea is to omit the current row when computing the aggregate by looping over the rows within the current group:
library(dplyr)
df %>%
group_by(grouping) %>%
mutate(special_mean = sapply(1:n(), function(i) mean(value[-i])))
#>   grouping value special_mean
#>   <chr>    <int>        <dbl>
#> 1 A            1          8.5
#> 2 A            6          6
#> 3 A           11          3.5
#> 4 B            2          9.5
#> 5 B            7          7
#> ...
This works for max() as well:
df %>%
group_by(grouping) %>%
mutate(special_max = sapply(1:n(), \(i) max(value[-i])))
#>   grouping value special_max
#>   <chr>    <int>       <int>
#> 1 A            1          11
#> 2 A            6          11
#> 3 A           11           6
#> 4 B            2          12
#> 5 B            7          12
#> 6 B           12           7
#> ...
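The same pattern should carry over to median(). Below is a minimal sketch that wraps the loop in a small helper; leave_one_out() is a hypothetical name chosen for illustration, not a dplyr function, and the data frame is rebuilt with tibble() so the snippet runs on its own.

```r
library(dplyr)

# Same data as in the question
df <- tibble(grouping = rep(LETTERS[1:5], 3), value = 1:15) %>%
  arrange(grouping)

# Hypothetical helper: apply f to x with element i left out, for every i.
# as.numeric() keeps the result type stable for vapply().
leave_one_out <- function(x, f) {
  vapply(seq_along(x), function(i) as.numeric(f(x[-i])), numeric(1))
}

res <- df %>%
  group_by(grouping) %>%
  mutate(special_median = leave_one_out(value, median)) %>%
  ungroup()
```

With three observations per group, the leave-one-out median coincides with the leave-one-out mean (the median of two values is their average), so group A again gives 8.5, 6 and 3.5.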
For the sake of completeness, here is also a data.table solution:
library(data.table)
setDT(df)[, special_mean := sapply(1:.N, function(i) mean(value[-i])), by = grouping][]