Fast comparison of strings within data.table
What about using stringdist?
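library(stringdist)

# For every pair of columns, append a column with the row-wise Hamming distance
DT[
  ,
  c(
    .SD,
    setNames(
      # one distance vector per column pair
      combn(.SD, 2, function(v) stringdist(v[[1]], v[[2]], method = "hamming"), simplify = FALSE),
      # matching column names: diff_V1_V2, diff_V1_V3, ...
      paste0("diff_", combn(names(.SD), 2, function(nms) paste(nms, collapse = "_")))
    )
  )
]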
which gives
       V1     V2     V3     V4     V5 diff_V1_V2 diff_V1_V3 diff_V1_V4
1: M01000 M01101 M01100 M11100 M01110          2          1          2
2: M01000 M01110 M01110 M01101 M11100          2          2          2
3: M01100 M01000 M00100 M01100 M01110          1          1          0
4: M01000 M01000 M11100 M01101 M01010          0          2          2
   diff_V1_V5 diff_V2_V3 diff_V2_V4 diff_V2_V5 diff_V3_V4 diff_V3_V5 diff_V4_V5
1:          2          1          2          2          1          1          2
2:          2          0          2          2          2          2          2
3:          1          2          1          2          1          2          1
4:          1          2          2          1          2          3          3
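
For anyone who wants to try this without the original data, here is a minimal reproducible sketch of the same idea. The sample DT is rebuilt from the four rows printed above. The helper names pairs, diff_cols and dists are only illustrative, and adding the columns by reference with := is a variation I am assuming is acceptable; it is not the exact call shown earlier. With method = "hamming", stringdist counts the positions at which two equal-length strings differ.

```r
library(data.table)
library(stringdist)

# Sample data copied from the V1..V5 columns of the printed output
DT <- data.table(
  V1 = c("M01000", "M01000", "M01100", "M01000"),
  V2 = c("M01101", "M01110", "M01000", "M01000"),
  V3 = c("M01100", "M01110", "M00100", "M11100"),
  V4 = c("M11100", "M01101", "M01100", "M01101"),
  V5 = c("M01110", "M11100", "M01110", "M01010")
)

# All unordered column pairs; each matrix column holds one pair of names
pairs <- combn(names(DT), 2)

# Names for the new columns: diff_V1_V2, diff_V1_V3, ...
diff_cols <- paste0("diff_", pairs[1, ], "_", pairs[2, ])

# One vectorised stringdist() call per column pair
dists <- Map(
  function(a, b) stringdist(DT[[a]], DT[[b]], method = "hamming"),
  pairs[1, ], pairs[2, ]
)

# Add all distance columns by reference (modifies DT in place)
DT[, (diff_cols) := dists]

print(DT)
```

Each pair costs a single vectorised stringdist call over the whole column, so the work grows with the number of column pairs rather than with the number of rows times pairs.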