Matching multiple patterns
I want to check whether "001", "100" or "000" occurs in a string of 4 characters made up of 0 and 1. For example, a 4-character string could be "1100", "0010", "1001" or "1111". How do I match several patterns in a string with a single command?
I know grep can be used for pattern matching, but with grep I can only check one string at a time. Can multiple strings be checked with some other command, or with grep itself?
Solution 1:
Yes, you can. The | in a grep pattern means "or", so you can test for all three strings at once by using "001|100|000" as the pattern. grep is also vectorised, so every element of a character vector can be checked in one step:
x <- c("1100", "0010", "1001", "1111")
pattern <- "001|100|000"
grep(pattern, x)
[1] 1 2 3
This returns the indices of the elements of x that contain a match (here, the first three).
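If you want the matching strings themselves rather than their positions, grep also accepts value = TRUE. A small sketch reusing the same x and pattern:

```r
x <- c("1100", "0010", "1001", "1111")
pattern <- "001|100|000"

# value = TRUE makes grep() return the matching elements instead of their indices
matched <- grep(pattern, x, value = TRUE)
print(matched)
```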
Sometimes it is more convenient to have a logical vector telling you which elements of the vector matched. In that case, use grepl:
grepl(pattern, x)
[1] TRUE TRUE TRUE FALSE
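To see that the alternation really behaves like a logical "or", here is a sketch that tests each pattern on its own and combines the results element-wise with |. It should agree with the single "001|100|000" pattern:

```r
x <- c("1100", "0010", "1001", "1111")
patterns <- c("001", "100", "000")

# One logical vector per pattern (grepl's pattern argument is filled by lapply)
per_pattern <- lapply(patterns, grepl, x = x)

# OR the per-pattern results element-wise
combined <- Reduce(`|`, per_pattern)

# Same test with a single alternation pattern
single <- grepl("001|100|000", x)
print(combined)
print(identical(combined, single))
```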
See ?regex for help about regular expressions in R.
Edit:
To avoid writing the pattern by hand, you can build it with paste:
myValues <- c("001", "100", "000")
pattern <- paste(myValues, collapse = "|")
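The paste step can be wrapped in a small helper. matches_any is a hypothetical name used only for this sketch; note that the values are joined into a regular expression, so any regex metacharacters they contain (".", "+", "(" and so on) are treated as regex syntax rather than literal characters:

```r
# Hypothetical helper: does each element of x contain at least one of `values`?
# `values` are joined with "|" into a regular expression, so regex
# metacharacters in them are NOT matched literally.
matches_any <- function(x, values) {
  pattern <- paste(values, collapse = "|")
  grepl(pattern, x)
}

myValues <- c("001", "100", "000")
x <- c("1100", "0010", "1001", "1111")
print(matches_any(x, myValues))
```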
Solution 2:
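Here is one solution using the stringr package. str_locate returns the start and end positions of the first match in each string, with NA where nothing matches: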
library(stringr)
mylist <- c("1100", "0010", "1001", "1111")
str_locate(mylist, "000|001|100")
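If you would rather not add a package, base R's regexpr gives the same information: the start of the first match (-1 if there is none) plus a "match.length" attribute. This is a base-R substitute, not part of stringr; a sketch that turns it into a start/end matrix like the one str_locate produces:

```r
x <- c("1100", "0010", "1001", "1111")
pattern <- "001|100|000"

# Start of the first match in each string (-1 if none),
# with match lengths stored in the "match.length" attribute
m <- regexpr(pattern, x)
start <- as.vector(m)
len <- attr(m, "match.length")

# Build a start/end matrix, using NA where nothing matched
found <- start != -1L
locs <- cbind(start = ifelse(found, start, NA_integer_),
              end   = ifelse(found, start + len - 1L, NA_integer_))
print(locs)
```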
Solution 3:
Use grep's -e option once for each pattern:
echo '1100' | grep -e '001' -e '100' -e '000'
Solution 4:
If you want a logical vector, you can also use the stri_detect functions from the stringi package. Since your pattern is a regular expression, use stri_detect_regex (x and pattern are the ones defined in Solution 1):
library(stringi)
stri_detect_regex(x, pattern)
## [1] TRUE TRUE TRUE FALSE
And some benchmarks:
library(microbenchmark)
# 100,000 random 4-character strings made of 0s and 1s
test <- stri_rand_strings(100000, 4, "[0-1]")
head(test)
## [1] "0001" "1111" "1101" "1101" "1110" "0110"
microbenchmark(stri_detect_regex(test, pattern), grepl(pattern, test))
Unit: milliseconds
                             expr      min       lq     mean   median       uq      max neval
 stri_detect_regex(test, pattern) 29.67405 30.30656 31.61175 30.93748 33.14948 35.90658   100
             grepl(pattern, test) 36.72723 37.71329 40.08595 40.01104 41.57586 48.63421   100