find multiple strings using str_extract_all
I have a character vector of strings to find:
tofind <- c("aaa", "bbb", "ccc", "ddd")
I also have a vector of strings to search:
n <- c("aaabbb", "aaa", "aaacccddd", "eee")
For each element of n, I want to find every tofind string it contains,
so that the output is:
aaa,bbb
aaa
aaa,ccc,ddd
I think I can use str_extract_all,
but it doesn't give me the expected output:
library(stringr)
sapply(n, function(x) str_extract_all(n, tofind))
How do I get the expected output?
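For reference, the attempt above fails because str_extract_all() is vectorised over both its string and pattern arguments: it pairs n[i] with tofind[i] instead of trying every pattern on every string. The anonymous function also ignores x, so sapply() just repeats the same result once per element of n. A minimal sketch of what the inner call returns:

```r
library(stringr)

tofind <- c("aaa", "bbb", "ccc", "ddd")
n <- c("aaabbb", "aaa", "aaacccddd", "eee")

# Element i of `n` is searched only for pattern i of `tofind`:
# "aaabbb" ~ "aaa", "aaa" ~ "bbb", "aaacccddd" ~ "ccc", "eee" ~ "ddd"
pairwise <- str_extract_all(n, tofind)

# Number of matches per element: 1, 0, 1, 0
lengths(pairwise)
```

So "bbb" is never looked for in "aaabbb", and "aaa" is never looked for in "aaacccddd".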
Solution 1:
You could combine the patterns into a single regex using alternation (|) and call str_extract_all once:
tofind <- paste(c("aaa","bbb","ccc","ddd"), collapse="|")
str_extract_all(n, tofind)
#> [[1]]
#> [1] "aaa" "bbb"
#>
#> [[2]]
#> [1] "aaa"
#>
#> [[3]]
#> [1] "aaa" "ccc" "ddd"
#>
#> [[4]]
#> character(0)
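If you want the comma-separated form shown in the question rather than a list, one option (a sketch, not the only way) is to paste each element's matches together. An element with no matches, like "eee", becomes an empty string:

```r
library(stringr)

n <- c("aaabbb", "aaa", "aaacccddd", "eee")
pattern <- paste(c("aaa", "bbb", "ccc", "ddd"), collapse = "|")  # "aaa|bbb|ccc|ddd"

matches <- str_extract_all(n, pattern)

# Collapse each character vector of matches into one comma-separated string
out <- vapply(matches, paste, character(1), collapse = ",")

# One result per line: aaa,bbb / aaa / aaa,ccc,ddd / (empty line for "eee")
writeLines(out)
```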
Solution 2:
The str_detect function can help here: for each string in n, test it against every pattern in tofind and keep the patterns that match.
suppressPackageStartupMessages(library(tidyverse))
library(stringr)
tofind <- c("aaa", "bbb", "ccc", "ddd")
n <- c("aaabbb", "aaa", "aaacccddd", "eee")
sapply(n, function(x) tofind[str_detect(x, tofind)], USE.NAMES = FALSE)
#> [[1]]
#> [1] "aaa" "bbb"
#>
#> [[2]]
#> [1] "aaa"
#>
#> [[3]]
#> [1] "aaa" "ccc" "ddd"
#>
#> [[4]]
#> character(0)
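The two solutions agree on the example data, but they are not equivalent in general. Solution 1 returns matches in the order they occur in the string, repeats included; Solution 2 returns each matching pattern once, in the order of tofind. A small sketch using a hypothetical input string "bbbaaabbb":

```r
library(stringr)

tofind <- c("aaa", "bbb", "ccc", "ddd")
x <- "bbbaaabbb"  # hypothetical input containing a repeated pattern

# Solution 1 style: every non-overlapping match, in order of occurrence
by_occurrence <- str_extract_all(x, paste(tofind, collapse = "|"))[[1]]
by_occurrence  # "bbb" "aaa" "bbb"

# Solution 2 style: each pattern that occurs at least once, in `tofind` order
by_pattern <- tofind[str_detect(x, tofind)]
by_pattern     # "aaa" "bbb"
```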
# or the tidyverse alternative...
n %>%
map(function(x, y) y[str_detect(x, y)], tofind)
#> [[1]]
#> [1] "aaa" "bbb"
#>
#> [[2]]
#> [1] "aaa"
#>
#> [[3]]
#> [1] "aaa" "ccc" "ddd"
#>
#> [[4]]
#> character(0)
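Both solutions treat the entries of tofind as regular expressions. That is harmless for "aaa" or "bbb", but patterns containing metacharacters such as . or + can match too much or too little. With the str_detect approach you can wrap the patterns in stringr's fixed() to match them literally. A sketch with hypothetical patterns and strings:

```r
library(stringr)

# Hypothetical patterns containing regex metacharacters
patterns <- c("a.c", "x+y")
strings <- c("abc", "a.c and x+y")

# Default (regex): "a.c" also matches "abc", and "x+y" fails to match
# the literal text "x+y" because "+" means "one or more x"
as_regex <- lapply(strings, function(s) patterns[str_detect(s, patterns)])

# fixed(): each pattern is matched as a literal string
as_literal <- lapply(strings, function(s) patterns[str_detect(s, fixed(patterns))])
```

Here as_regex gives "a.c" for both strings, while as_literal gives nothing for "abc" and both patterns for "a.c and x+y".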