grep using a character vector with multiple patterns
I am trying to use grep
to test whether a vector of strings are present in an another vector or not, and to output the values that are present (the matching patterns).
I have a data frame like this:
FirstName Letter
Alex A1
Alex A6
Alex A7
Bob A1
Chris A9
Chris A6
I have a vector of strings patterns to be found in the "Letter" columns, for example: c("A1", "A9", "A6")
.
I would like to check whether the any of the strings in the pattern vector is present in the "Letter" column. If they are, I would like the output of unique values.
The problem is, I don't know how to use grep
with multiple patterns. I tried:
matches <- unique (
grep("A1| A9 | A6", myfile$Letter, value=TRUE, fixed=TRUE)
)
But it gives me 0 matches which is not true, any suggestions?
Solution 1:
In addition to @Marek's comment about not including fixed==TRUE
, you also need to not have the spaces in your regular expression. It should be "A1|A9|A6"
.
You also mention that there are lots of patterns. Assuming that they are in a vector
toMatch <- c("A1", "A9", "A6")
Then you can create your regular expression directly using paste
and collapse = "|"
.
matches <- unique (grep(paste(toMatch,collapse="|"),
myfile$Letter, value=TRUE))
Solution 2:
Good answers, however don't forget about filter()
from dplyr:
patterns <- c("A1", "A9", "A6")
>your_df
FirstName Letter
1 Alex A1
2 Alex A6
3 Alex A7
4 Bob A1
5 Chris A9
6 Chris A6
result <- filter(your_df, grepl(paste(patterns, collapse="|"), Letter))
>result
FirstName Letter
1 Alex A1
2 Alex A6
3 Bob A1
4 Chris A9
5 Chris A6