How to generate a new variable by one column's value overriding the other's in R
Here is an option with pmax and cummax (assuming the '.' entries are missing values, NA). Grouped by 'group', invoke pmax across the columns whose names start with 'var' (selected with starts_with) to get the row-wise maximum, then take the cumulative maximum (cummax) within each group.
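To make the two steps concrete, here is a minimal sketch on three small vectors. The vectors x, y and z are made up for illustration and are not taken from the question's data.

```r
# Three hypothetical columns (illustration only)
x <- c(1L, 1L, 1L, 1L)
y <- c(NA, 2L, NA, 2L)
z <- c(NA, NA, 3L, NA)

# Step 1: element-wise ("parallel") maximum across the vectors, ignoring NA
row_max <- pmax(x, y, z, na.rm = TRUE)
row_max
# 1 2 3 2

# Step 2: running maximum, so a larger value seen earlier is carried forward
cummax(row_max)
# 1 2 3 3
```

Without na.rm = TRUE, pmax would return NA for every position where any input is NA, and cummax would then propagate that NA to all later positions.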
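library(dplyr)
library(purrr)  # provides invoke()

df1 %>%
  group_by(group) %>%
  # row-wise max over the var* columns (ignoring NA), then running max per group
  mutate(newvar = cummax(invoke(pmax,
    c(across(starts_with('var')), na.rm = TRUE)))) %>%
  ungroup()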
-output
# A tibble: 10 × 5
   group  var1  var2  var3 newvar
   <chr> <int> <int> <int>  <int>
 1 a         1    NA    NA      1
 2 a         1    NA    NA      1
 3 a         1     2    NA      2
 4 a         1     2     3      3
 5 a         1    NA    NA      3
 6 b         1    NA    NA      1
 7 b         1     2     3      3
 8 b         1     2    NA      3
 9 b         1     2     3      3
10 b         1     2    NA      3
data
df1 <- structure(list(group = c("a", "a", "a", "a", "a", "b", "b", "b",
"b", "b"), var1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
var2 = c(NA, NA, 2L, 2L, NA, NA, 2L, 2L, 2L, 2L), var3 = c(NA,
NA, NA, 3L, NA, NA, 3L, NA, 3L, NA)), row.names = c(NA, -10L
), class = "data.frame")
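As a possible variant, the same result can be sketched without purrr by swapping invoke for base R's do.call; recent purrr releases mark invoke() as deprecated, so this may avoid a warning. The data frame is rebuilt inline only so the snippet runs on its own, and the name out is an illustrative choice.

```r
library(dplyr)

# Same values as df1 above, rebuilt so this snippet is self-contained
df1 <- data.frame(
  group = rep(c("a", "b"), each = 5),
  var1  = rep(1L, 10),
  var2  = c(NA, NA, 2L, 2L, NA, NA, 2L, 2L, 2L, 2L),
  var3  = c(NA, NA, NA, 3L, NA, NA, 3L, NA, 3L, NA)
)

out <- df1 %>%
  group_by(group) %>%
  # do.call() passes each selected column to pmax() as a separate argument,
  # with na.rm = TRUE appended to the argument list
  mutate(newvar = cummax(
    do.call(pmax, c(across(starts_with("var")), na.rm = TRUE))
  )) %>%
  ungroup()

out$newvar
# 1 1 2 3 3 1 3 3 3 3
```

The only change from the answer's code is do.call in place of invoke; the grouping, column selection and cummax step are the same.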