R data.table binary OR across specific columns
Maybe you are willing to try (data.table
version 1.14.3
for row-wise operation)
for (i in c('spr_1','spr_15','spr_99')) {
DT[,(i) := max(.SD), .SDcols=patterns(paste0(i,"\\.")),by = .I]
}
For stable data.table
version,
for (i in c('spr_1','spr_15','spr_99')) {
DT[,(i) := Reduce(pmax,.SD),.SDcols=patterns(paste0(i,"\\."))]
}
P.S: It's better to use do.call
to replace Reduce
here. see @r2evans 's answer.
Formalizing my comments above into a benchmark. This is not a new answer (other than recommending do.call
over Reduce
, but that's not very novel), just an extended comment. (I'll delete it if it is too far out of context.)
DTbig <- rbindlist(replicate(1e7, DT, simplify=FALSE))
nrow(DTbig)
# [1] 40000000
Proof of the do.call(pmax, .SD)
concept:
DTbig2 <- copy(DTbig)
for (i in c('spr_1','spr_15','spr_99')) {
DTbig2[,(i) := Reduce(pmax, .SD), .SDcols = patterns(paste0(i, "\\."))]
}
for (i in c('spr_1','spr_15','spr_99')) {
DTbig[,(i) := do.call(pmax, .SD), .SDcols = patterns(paste0(i, "\\."))]
}
identical(DTbig, DTbig2)
# [1] TRUE
Benchmark:
bench::mark(
orig = {
for (i in c('spr_1','spr_15','spr_99')) {
DTbig[,(i) := +Reduce(`|`, .SD), .SDcols = patterns(paste0(i, "\\."))]
}
},
Reduce = {
for (i in c('spr_1','spr_15','spr_99')) {
DTbig[,(i) := Reduce(pmax,.SD), .SDcols=patterns(paste0(i,"\\."))]
}
},
docall = {
for (i in c('spr_1','spr_15','spr_99')) {
DTbig[,(i) := do.call(pmax, .SD), .SDcols = patterns(paste0(i, "\\."))]
}
}, iterations = 20)
# # A tibble: 3 x 13
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
# 1 orig 956.22ms 1.08s 0.939 2.68GB 2.35 6 15 6.39s <NULL> <Rprofmem [109 x 3]> <bench_tm~ <tibble ~
# 2 Reduce 1.04s 1.07s 0.943 2.68GB 2.83 5 15 5.3s <NULL> <Rprofmem [306 x 3]> <bench_tm~ <tibble ~
# 3 docall 847.69ms 863.73ms 1.15 1.79GB 1.15 10 10 8.72s <NULL> <Rprofmem [97 x 3]> <bench_tm~ <tibble ~
(I'm really not sure how iterations=
is interpreted ...)
FYI, the +
in +Reduce(`|`, .SD)
is analogous to your follow-up as.numeric
without the extra step.