grouping to aggregate values, but tripping up on NA's
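
The answers below all operate on a data frame `df` that is not shown on this page. As a minimal sketch, the input can be reconstructed from the 'ID', 'TIME' and 'VALUE' columns of the printed outputs; the construction itself is an assumption, not the asker's original code.

```r
library(dplyr)

# Reconstructed from the printed outputs below (assumed, not the original input)
df <- tibble(
  ID    = rep(c("A", "B", "C", "D"), each = 2),
  TIME  = rep(c(1, 2), times = 4),
  VALUE = c(8, 9, 10, NA, 12, 13, 14, 9)
)
```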

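If we want to use the same code, then coalesce the 'VALUE' where 'TIME' is 2 with the 'VALUE' where 'TIME' is 1, so that the 'TIME' 1 value is used whenever the 'TIME' 2 value is NA (assuming there is a single observation of each 'TIME' for each 'ID')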

library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(consistent = coalesce(VALUE[TIME == 2], VALUE[TIME == 1])) %>% 
  ungroup()

-output

# A tibble: 8 × 4
  ID     TIME VALUE consistent
  <chr> <dbl> <dbl>      <dbl>
1 A         1     8          9
2 A         2     9          9
3 B         1    10         10
4 B         2    NA         10
5 C         1    12         13
6 C         2    13         13
7 D         1    14          9
8 D         2     9          9
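
The coalesce approach relies on each 'ID' having exactly one row per 'TIME'; if an 'ID' had two rows with the same 'TIME', `VALUE[TIME == 2]` would no longer be a single value. A quick sketch of how that assumption could be checked, using the sample data as reconstructed from the output above (the name `dups` is only illustrative):

```r
library(dplyr)

# Sample data as reconstructed from the printed output (assumed)
df <- tibble(
  ID    = rep(c("A", "B", "C", "D"), each = 2),
  TIME  = rep(c(1, 2), times = 4),
  VALUE = c(8, 9, 10, NA, 12, 13, 14, 9)
)

# Any ID/TIME combination that appears more than once breaks the assumption
dups <- df %>%
  count(ID, TIME) %>%
  filter(n > 1)

dups
```

For this sample data `dups` has no rows, so the assumption holds.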

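Or another option is to arrange before doing the group_by, so that non-NA values come first and the later 'TIME' precedes the earlier one within each 'ID', and then get the first element of 'VALUE' (assuming no duplicated 'TIME' within an 'ID')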

df %>%
  arrange(ID, is.na(VALUE), desc(TIME)) %>% 
  group_by(ID) %>% 
  mutate(consistent = first(VALUE)) %>%
  ungroup()

-output

# A tibble: 8 × 4
  ID     TIME VALUE consistent
  <chr> <dbl> <dbl>      <dbl>
1 A         2     9          9
2 A         1     8          9
3 B         1    10         10
4 B         2    NA         10
5 C         2    13         13
6 C         1    12         13
7 D         2     9          9
8 D         1    14          9

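Another possible solution, using tidyr::fill. Note that this only fills the NA rows with the previous value in the group, so the non-NA rows keep their own 'VALUE' rather than sharing a single value per 'ID':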

library(tidyverse)

df %>%
  group_by(ID) %>%
  mutate(consistent = VALUE) %>%
  fill(consistent) %>%
  ungroup()

#> # A tibble: 8 × 4
#>   ID     TIME VALUE consistent
#>   <chr> <dbl> <dbl>      <dbl>
#> 1 A         1     8          8
#> 2 A         2     9          9
#> 3 B         1    10         10
#> 4 B         2    NA         10
#> 5 C         1    12         12
#> 6 C         2    13         13
#> 7 D         1    14         14
#> 8 D         2     9          9

You can also use ifelse with your condition. After arranging by TIME within each group, lag(VALUE) on the TIME 2 row is guaranteed to be the TIME 1 value, provided each group has only 2 members, one with TIME 1 and one with TIME 2.

df %>% 
  group_by(ID) %>% 
  arrange(TIME, .by_group = TRUE) %>%
  mutate(consistent = ifelse(is.na(VALUE) & TIME == 2, lag(VALUE), VALUE)) %>%
  ungroup()
# A tibble: 8 × 4
  ID     TIME VALUE consistent
  <chr> <dbl> <dbl>      <dbl>
1 A         1     8          8
2 A         2     9          9
3 B         1    10         10
4 B         2    NA         10
5 C         1    12         12
6 C         2    13         13
7 D         1    14         14
8 D         2     9          9
