Matching multiple patterns

I want to check whether "001", "100", or "000" occurs in a 4-character string of 0s and 1s. For example, the string could be "1100", "0010", "1001", or "1111". How do I match several patterns against a string with a single command?

I know grep can be used for pattern matching, but with grep I can check only one pattern at a time. Is there a way to test several patterns at once, either with grep itself or with some other command?


Solution 1:

Yes, you can. In a regular expression, | means "or", so you can test for all three strings with the single pattern "001|100|000". grep is also vectorised, so all of this can be done in one step:

x <- c("1100", "0010", "1001", "1111")
pattern <- "001|100|000"

grep(pattern, x)
[1] 1 2 3

This returns the indices of the elements of your vector that contain a match (in this case, the first three).

Sometimes it is more convenient to have a logical vector that tells you which elements of your vector matched. For that, use grepl:

grepl(pattern, x)
[1]  TRUE  TRUE  TRUE FALSE

See ?regex for help about regular expressions in R.
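
If you need the matching strings themselves rather than their positions, grep also accepts value = TRUE, and regexpr combined with regmatches shows which alternative was hit first in each string. A minimal sketch, reusing x and pattern from above (the names matching and first_hit are only illustrative):

```r
x <- c("1100", "0010", "1001", "1111")
pattern <- "001|100|000"

# value = TRUE returns the matching elements instead of their indices
matching <- grep(pattern, x, value = TRUE)
matching
## [1] "1100" "0010" "1001"

# regexpr() locates the first match in each string; regmatches() extracts it.
# Elements without a match (here "1111") are dropped from the result.
first_hit <- regmatches(x, regexpr(pattern, x))
first_hit
## [1] "100" "001" "100"
```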


Edit: To avoid writing the pattern by hand, we can build it with paste:

myValues <- c("001", "100", "000")
pattern <- paste(myValues, collapse = "|")
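
As a quick check, the sketch below builds the pattern and applies it. It also shows one possible alternative for the case where the search strings should be treated literally (for example, if they contained regex metacharacters such as . or +): call grepl with fixed = TRUE once per string and combine the results with |. The names regex_hits and literal_hits are illustrative, not part of the original answer.

```r
myValues <- c("001", "100", "000")
x <- c("1100", "0010", "1001", "1111")

# Join the search strings with "|" to get a single alternation pattern
pattern <- paste(myValues, collapse = "|")
pattern
## [1] "001|100|000"

regex_hits <- grepl(pattern, x)

# Literal matching: one fixed-string grepl() per value, OR-ed element-wise
literal_hits <- Reduce(`|`, lapply(myValues, function(v) grepl(v, x, fixed = TRUE)))
literal_hits
## [1]  TRUE  TRUE  TRUE FALSE
```

For plain strings like these, both approaches give the same answer; they only differ once the values contain characters that are special in a regular expression.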

Solution 2:

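Here is one solution using the stringr package. str_locate returns a matrix with the start and end position of the first match in each string (NA where nothing matches):

library(stringr)
mylist <- c("1100", "0010", "1001", "1111")
# one row per element of mylist, with columns "start" and "end"
str_locate(mylist, "000|001|100")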
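
If a logical vector or the indices are what you are after, stringr also provides str_detect and str_which, which behave much like grepl and grep. A short sketch, assuming stringr is installed (detected and which_hit are illustrative names):

```r
library(stringr)

mylist <- c("1100", "0010", "1001", "1111")
pattern <- "000|001|100"

# Logical vector: does each string contain any of the alternatives?
detected <- str_detect(mylist, pattern)
detected
## [1]  TRUE  TRUE  TRUE FALSE

# Integer indices of the strings that contain a match
which_hit <- str_which(mylist, pattern)
which_hit
## [1] 1 2 3
```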

Solution 3:

With command-line grep (as opposed to R's grep function), use the -e option to supply additional patterns:

echo '1100' | grep -e '001' -e '110' -e '101'

Solution 4:

If you want a logical vector, have a look at the stri_detect functions from the stringi package. Here the pattern is a regular expression, so use stri_detect_regex (x and pattern are as defined in Solution 1):

library(stringi)
stri_detect_regex(x, pattern)
## [1]  TRUE  TRUE  TRUE FALSE

And a benchmark against grepl:

library(microbenchmark)
# 100,000 random 4-character strings of 0s and 1s
test <- stri_rand_strings(100000, 4, "[0-1]")
head(test)
## [1] "0001" "1111" "1101" "1101" "1110" "0110"
microbenchmark(stri_detect_regex(test, pattern), grepl(pattern, test))
Unit: milliseconds
                             expr      min       lq     mean   median       uq      max neval
 stri_detect_regex(test, pattern) 29.67405 30.30656 31.61175 30.93748 33.14948 35.90658   100
             grepl(pattern, test) 36.72723 37.71329 40.08595 40.01104 41.57586 48.63421   100