Calculate group mean while excluding current observation using dplyr

Using dplyr (preferably), I am trying to calculate the group mean for each observation while excluding that observation from the group.

It seems this should be doable with a combination of rowwise() and group_by(), but the two functions cannot be used together.

Given this data frame:

library(dplyr)

# tibble() replaces the deprecated data_frame()
df <- tibble(grouping = rep(LETTERS[1:5], 3),
             value = 1:15) %>%
  arrange(grouping)
df
#> Source: local data frame [15 x 2]
#> 
#>    grouping value
#>       (chr) (int)
#> 1         A     1
#> 2         A     6
#> 3         A    11
#> 4         B     2
#> 5         B     7
#> 6         B    12
#> 7         C     3
#> 8         C     8
#> 9         C    13
#> 10        D     4
#> 11        D     9
#> 12        D    14
#> 13        E     5
#> 14        E    10
#> 15        E    15

I'd like to get the group mean for each observation with that observation excluded from the group, resulting in:

#>    grouping value special_mean
#>       (chr) (int)        (dbl)
#> 1         A     1          8.5  # i.e. (6 + 11) / 2
#> 2         A     6            6  # i.e. (1 + 11) / 2
#> 3         A    11          3.5  # i.e. (1 + 6) / 2
#> 4         B     2          9.5
#> 5         B     7            7
#> 6         B    12          4.5
#> 7         C     3          ...

I've tried nesting rowwise() inside a function called by do(), along these lines, but haven't gotten it to work:

special_avg <- function(chunk) {
  chunk %>%
    rowwise() #%>%
    # filter or something...?
}

df %>%
  group_by(grouping) %>%
  do(special_avg(.))

Solution 1:

There is no need to define a custom function. Instead, we can sum all elements of the group, subtract the current value, and divide by the number of elements in the group minus 1.

df %>%
  group_by(grouping) %>%
  mutate(special_mean = (sum(value) - value) / (n() - 1))
#   grouping value special_mean
#      (chr) (int)        (dbl)
#1         A     1          8.5
#2         A     6          6.0
#3         A    11          3.5
#4         B     2          9.5
#5         B     7          7.0

Solution 2:

I came across this old question by chance and wondered whether there is a general solution that also works for aggregation functions other than mean(), e.g., median() or max() (as requested by jlesuffleur).

The idea is to leave the current row out of the aggregate by looping over the row indices within each group:

library(dplyr)
df %>% 
  group_by(grouping) %>% 
  mutate(special_mean = sapply(1:n(), function(i) mean(value[-i])))
#>    grouping value special_mean
#>    <chr>    <int>        <dbl>
#>  1 A            1          8.5
#>  2 A            6          6  
#>  3 A           11          3.5
#>  4 B            2          9.5
#>  5 B            7          7  
#> ...

This works for max() as well (the \(i) shorthand for function(i) requires R 4.1 or later):

df %>% 
  group_by(grouping) %>% 
  mutate(special_max = sapply(1:n(), \(i) max(value[-i])))
#>    grouping value special_max
#>    <chr>    <int>       <int>
#>  1 A            1          11
#>  2 A            6          11
#>  3 A           11           6
#>  4 B            2          12
#>  5 B            7          12
#>  6 B           12           7
#> ...
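
If the same pattern is needed for several aggregation functions, the loop could be wrapped in a small helper. The sketch below assumes a made-up function name, leave_one_out(), which is not part of dplyr, and it assumes the aggregation function returns a single number. Because it declares numeric(1) as the return type, integer results such as those from max() are promoted to double.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): for every element i of x,
# apply `fun` to x with element i removed and return one number per element.
leave_one_out <- function(x, fun, ...) {
  vapply(seq_along(x), function(i) fun(x[-i], ...), numeric(1))
}

# Example data from the question
df <- tibble(grouping = rep(LETTERS[1:5], 3),
             value = 1:15) %>%
  arrange(grouping)

out <- df %>%
  group_by(grouping) %>%
  mutate(special_median = leave_one_out(value, median),
         special_max    = leave_one_out(value, max)) %>%
  ungroup()

out
```

For a group with a single row, fun() receives an empty vector, so the result depends on the function: mean() returns NaN, while max() returns -Inf with a warning.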

For completeness, here is a data.table solution as well (note that setDT() converts df to a data.table by reference):

library(data.table)
setDT(df)[, special_mean := sapply(1:.N, function(i) mean(value[-i])), by = grouping][]
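
When only the mean is needed, the arithmetic shortcut from Solution 1 should carry over to data.table directly. This is a minimal sketch that builds the example data as a data.table named dt (a name chosen here for illustration) rather than converting df:

```r
library(data.table)

# Example data from the question, built directly as a data.table
dt <- data.table(grouping = rep(LETTERS[1:5], 3), value = 1:15)
setorder(dt, grouping)

# Same arithmetic as Solution 1: group total minus own value,
# divided by the number of remaining rows in the group
dt[, special_mean := (sum(value) - value) / (.N - 1), by = grouping]
dt[]
```

This avoids the sapply() loop, at the cost of only working for aggregates that can be "un-done" arithmetically, such as the mean or the sum.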