C++ regex for overlapping matches
I have a string 'CCCC' and I want to match 'CCC' in it, with overlap.
My code:
...
std::string input_seq = "CCCC";
std::regex re("CCC");
std::sregex_iterator next(input_seq.begin(), input_seq.end(), re);
std::sregex_iterator end;
while (next != end) {
    std::smatch match = *next;
    std::cout << match.str() << "\t" << "\t" << match.position() << "\t" << "\n";
    ++next;
}
...
However, this only returns
CCC 0
and skips the overlapping match
CCC 1
which I also need.
I read about non-greedy '?' matching, but I could not make it work.
Solution 1:
Wrap your pattern in a capturing group and put that group inside a positive lookahead, so that the pattern CCC becomes (?=(CCC)) and the lookahead itself consumes no characters. The next match can then start one position later and overlap the previous one.
To make it work on Mac, too, make sure the regex matches (and thus consumes) a single character at each match by adding a . after the lookahead (or [\s\S] to also match line break characters).
Then amend the code to read the first capturing group, match.str(1), instead of the whole match:
#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string input_seq = "CCCC";
    std::regex re("(?=(CCC))."); // <-- PATTERN MODIFICATION
    std::sregex_iterator next(input_seq.begin(), input_seq.end(), re);
    std::sregex_iterator end;
    while (next != end) {
        std::smatch match = *next;
        // Group 1 holds the text captured inside the lookahead
        std::cout << match.str(1) << "\t" << "\t" << match.position() << "\t" << "\n"; // <-- SEE HERE
        ++next;
    }
    return 0;
}
Output:
CCC 0
CCC 1
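If you need this in more than one place, the same lookahead trick can be wrapped in a small helper. This is only a sketch: the name find_overlapping and its return type are my own choices, not part of the standard library, and it assumes the pattern is a valid ECMAScript regex (the std::regex default). It uses [\s\S] instead of . so that line breaks are consumed as well.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original answer): returns every match of
// `pattern` in `text`, including overlapping ones, as (position, matched text).
std::vector<std::pair<std::size_t, std::string>>
find_overlapping(const std::string& text, const std::string& pattern) {
    // Wrap the pattern in a capturing group inside a positive lookahead, then
    // consume exactly one character so every match is non-empty.
    const std::regex re("(?=(" + pattern + "))[\\s\\S]");

    std::vector<std::pair<std::size_t, std::string>> hits;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        // Group 1 is the outer group added above, i.e. the full pattern match.
        hits.emplace_back(static_cast<std::size_t>(it->position()), it->str(1));
    }
    return hits;
}
```

Calling find_overlapping("CCCC", "CCC") should give two entries, (0, "CCC") and (1, "CCC"). Note that the regex is rebuilt on every call, so for hot loops you would probably want to construct it once and pass it in.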