Conditional rowwise sum of subset of columns in specific row dplyr
If you want to efficiently update a single row (or small subset of rows) I would use direct assignment, not dplyr
.
var_cols = grep(names(df), pattern = "var[0-9]+", value = T)
recalc_id = 2
df[df$ID %in% recalc_id, "var_total"] = apply(df[df$ID %in% recalc_id, var_cols], 1, \(x) sum(x[x > 0]))
As akrun points out in comments, if it is just a single row, the apply
can be skipped:
i = which(df$ID == recalc_id)
row = unlist(df[i, var_cols])
df$var_total[i] = sum(row[row > 0])
Here's the same thing with dplyr::case_when
, for a dplyr
solution:
df = df %>%
rowwise() %>%
mutate(var_total = case_when(
ID %in% 2 ~{
x <- c_across(starts_with('var[0-9]+'))
sum(x[x > 0])
},
TRUE ~ var_total
)
)
(Note that in both cases we need to change the column name pattern to not include var_total
in the sum.)
rowwise
breaks some vectorization and slows things down, so if you are so concerned about efficiency that recalculating the sum is "too slow", I'd strongly recommend the base
solution. You might even find a non-conditional base solution to be plenty fast enough for this row-wise operation.