Finding the indexes of multiple/overlapping matching substrings

I have a string, s="CCCGTGCC", and a substring, ss="CC". I want to get all the indexes in s at which ss starts. In my example I would want to get back the array c(1,2,7).

Is there any string function that achieves this? Notice that my string is in the form "CCCGTGCC", and not c("C","C","C","G","T","G","C","C").

grep only reports which elements of a vector contain a match, not the positions of the matches within a string, unless I'm missing something.


Solution 1:

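Try gregexpr with perl=TRUE and a Perl regular expression containing a look-ahead assertion (see ?regex). A look-ahead matches at a position without consuming any characters, so overlapping occurrences are all reported (which is also why every match.length is 0):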

gregexpr("(?=CC)","CCCGTGCC",perl=TRUE)
[[1]]
[1] 1 2 7
attr(,"match.length")
[1] 0 0 0
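
If you only need a plain vector of start positions, the result can be unwrapped along these lines. This is a minimal sketch rather than a definitive implementation: find_all is a hypothetical helper name, not a base R function, and it assumes the substring should be matched literally and does not itself contain "\E".

```r
# Hypothetical helper (not part of base R): start positions of all,
# possibly overlapping, occurrences of a literal substring.
find_all <- function(pattern, text) {
  # \Q...\E makes PCRE treat the pattern literally, so characters such as
  # "." or "*" are not interpreted as regex syntax
  # (assumption: the pattern does not contain "\E")
  rx <- paste0("(?=\\Q", pattern, "\\E)")
  # gregexpr returns a list with one element per input string;
  # as.integer drops the match.length and other attributes
  pos <- as.integer(gregexpr(rx, text, perl = TRUE)[[1]])
  # gregexpr signals "no match" with -1, so keep only real positions
  pos[pos > 0]
}

find_all("CC", "CCCGTGCC")
# [1] 1 2 7
```

When there is no match, the helper returns integer(0) instead of -1, which tends to be easier to handle downstream.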