R: save indices of unknown number of matches of two vectors into a third
I'm looking for a way to save the indices of an unknown number of matches of two vectors into a third. The problem occurs for example here:
#create large number of letters:
alphabet_soup<-rep(letters[1:10],times=50)
#sample to mix up letters:
alphabet_soup<-sample(alphabet_soup,size=100)
#vector to match
test_vector<-c("a","b","c")
In other words: what are the indices of the matches for "a", "b" and "c" in alphabet_soup?
As there might be more than one match for each of "a", "b" and "c", the R functions match() and %in% don't seem to work.
As I don't know test_vector in advance, or rather it isn't as simple/short as in my example, the following solution isn't practical either:
as<-which(alphabet_soup==test_vector[1])
bs<-which(alphabet_soup==test_vector[2])
cs<-which(alphabet_soup==test_vector[3])
matches<-c(as,bs,cs)
There might be a solution using a loop, but my attempts have failed so far.
I think doing it in a loop/function is the most controlled approach, but there is also an option with "grep".
First with a loop/function (I like to use functions, as these are usually faster and make it easier to structure the output, but the principle is the same). I have structured the output as a data frame so that you know which match belongs to which letter, but this is easy to change.
alphabet_soup<-rep(letters[1:10],times=50)
alphabet_soup<-sample(alphabet_soup,size=100)
test_vector<-c("a","b","c")
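# return a data frame of all match positions for the i-th element of test_vector
fun <- function(i) {
  matches <- which(alphabet_soup == test_vector[i])
  # rep() keeps both columns the same length, even when there are no matches
  data.frame(vector = rep(test_vector[i], length(matches)), match = matches)
}
# apply to every element of test_vector and stack the results
dat <- do.call("rbind", lapply(seq_along(test_vector), fun))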
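If a data frame is more than you need, the same idea can be sketched as a named list with one element per entry of test_vector, which can then be flattened into a single vector. This is only a minimal sketch: set.seed(1) and the name matches_by_letter are my own additions for illustration and are not part of the question.

```r
set.seed(1)  # assumption: fixed seed only so the example is reproducible
alphabet_soup <- sample(rep(letters[1:10], times = 50), size = 100)
test_vector <- c("a", "b", "c")

# one integer vector of indices per element of test_vector
# (matches_by_letter is a hypothetical name chosen for this sketch)
matches_by_letter <- lapply(setNames(test_vector, test_vector),
                            function(x) which(alphabet_soup == x))

# flatten into a single vector, grouped by letter like c(as, bs, cs)
matches <- unlist(matches_by_letter, use.names = FALSE)
```

matches_by_letter[["a"]] then holds the positions of "a", and matches has the same grouping as the hand-written c(as, bs, cs) from the question.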
The second option is with "grep". Note that the output is returned in ascending index order rather than grouped by letter, and I don't know how to avoid this; on the other hand, it is much simpler.
grep("a|b|c", alphabet_soup, value = FALSE)
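Since test_vector is not known in advance, the pattern probably should not be typed by hand. A minimal sketch of building it from test_vector follows, assuming the entries contain no regular-expression metacharacters; the ^ and $ anchors are my addition so that only whole elements match (without them, "a" would also match a longer string such as "ab"). For comparison, which() combined with %in% should give the same indices without any regular expression. The names pattern, idx_grep and idx_in are hypothetical.

```r
set.seed(1)  # assumption: fixed seed only so the example is reproducible
alphabet_soup <- sample(rep(letters[1:10], times = 50), size = 100)
test_vector <- c("a", "b", "c")

# build "^(a|b|c)$" from test_vector; the anchors force whole-string matches
pattern <- paste0("^(", paste(test_vector, collapse = "|"), ")$")
idx_grep <- grep(pattern, alphabet_soup)

# the same indices without a regular expression
idx_in <- which(alphabet_soup %in% test_vector)
```

Both results are in ascending index order. To see which letter each index belongs to, look it up with alphabet_soup[idx_in].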