Conditional rowwise sum of subset of columns in specific row dplyr

If you want to efficiently update a single row (or small subset of rows) I would use direct assignment, not dplyr.

var_cols = grep(names(df), pattern = "var[0-9]+", value = T)
recalc_id = 2
df[df$ID %in% recalc_id, "var_total"] = apply(df[df$ID %in% recalc_id, var_cols], 1, \(x) sum(x[x > 0]))

As akrun points out in comments, if it is just a single row, the apply can be skipped:

i = which(df$ID == recalc_id)
row = unlist(df[i, var_cols])
df$var_total[i] = sum(row[row > 0])

Here's the same thing with dplyr::case_when, for a dplyr solution:

df = df %>%
  rowwise() %>%
  mutate(var_total = case_when(
      ID %in% 2 ~{
        x <- c_across(starts_with('var[0-9]+'))
        sum(x[x > 0])
      },
      TRUE ~ var_total
    )
  )

(Note that in both cases we need to change the column name pattern to not include var_total in the sum.)

rowwise breaks some vectorization and slows things down, so if you are so concerned about efficiency that recalculating the sum is "too slow", I'd strongly recommend the base solution. You might even find a non-conditional base solution to be plenty fast enough for this row-wise operation.

Conditional rowwise sum of subset of columns in specific row dplyr

Related

Recent Posts