Finding the indexes of multiple/overlapping matching substrings
I have a string, s="CCCGTGCC", and a substring ss="CC". I want to get all the indexes in s at which an occurrence of ss starts. In my example I would want to get back the array c(1,2,7).
Is there any string function that achieves this? Notice that my string is in the form "CCCGTGCC", and not c("C","C","C","G","T","G","C","C").
grep only returns whether there is a match anywhere in the string, and not the indexes of the matches within the string, unless I'm missing something.
Solution 1:
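Try gregexpr with perl=TRUE and use Perl regular expressions with look-ahead assertions (see ?regex). A look-ahead matches a position without consuming any characters, so overlapping occurrences are all reported and every match has length 0: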
gregexpr("(?=CC)","CCCGTGCC",perl=TRUE)
[[1]]
[1] 1 2 7
attr(,"match.length")
[1] 0 0 0
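If you want just the positions as a plain integer vector, the call above can be wrapped in a small helper. This is only a sketch: the name find_overlapping is made up for illustration, it assumes s is a single string, and it treats the pattern as a regular expression, so any regex metacharacters in it would need escaping first.

```r
# Hypothetical helper (not part of base R): 1-based start positions of every
# occurrence of `pattern` in the single string `s`, overlapping ones included.
find_overlapping <- function(pattern, s) {
  # Wrap the pattern in a zero-width look-ahead so matches consume no characters
  m <- gregexpr(paste0("(?=", pattern, ")"), s, perl = TRUE)[[1]]
  # as.integer() drops the match.length and other attributes
  pos <- as.integer(m)
  # gregexpr() signals "no match" with a single -1
  if (length(pos) == 1L && pos == -1L) integer(0) else pos
}

find_overlapping("CC", "CCCGTGCC")
```

When there is no match, the helper returns an empty integer vector rather than the -1 that gregexpr reports.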