.NET Regex not stopping at the first match [duplicate]

When your left- and right-hand delimiters are single characters, the problem is easily solved with negated character classes. So, if your match is between a and c and should not contain b (literally), you may use

a[^abc]*c

This is the same technique you use when you want to make sure there is a b in between the closest a and c:

a[^abc]*b[^ac]*c
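
As a quick check of the two single-character patterns above, here is a minimal Python sketch. The sample string is made up for illustration and is not from the question:

```python
import re

# Hypothetical sample text, chosen only to exercise both patterns
text = "a--c a-b-c aa__c a-a-b-c"

# a, then any run of characters other than a, b or c, then c:
# matches the closest a...c pair that has no b inside
no_b = re.findall(r"a[^abc]*c", text)

# same idea, but a b is required somewhere between the closest a and c
with_b = re.findall(r"a[^abc]*b[^ac]*c", text)

print(no_b)    # ['a--c', 'a__c']
print(with_b)  # ['a-b-c', 'a-b-c']
```

Because a is excluded from the character class, a match can never start at an earlier a than the one closest to the c.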

When your left- and right-hand delimiters are multi-character strings, you need a tempered greedy token:

abc(?:(?!abc|xyz|123).)*123(?:(?!abc|xyz).)*xyz


To make sure it matches across lines, use the re.DOTALL flag when compiling the regex (in .NET, the equivalent option is RegexOptions.Singleline).

Note that to get better performance from such a heavy pattern, you should consider unrolling it. This can be done with negated character classes and negative lookaheads.

Pattern details:

  • abc - match abc
  • (?:(?!abc|xyz|123).)* - match any character that is not the starting point of an abc, xyz or 123 character sequence
  • 123 - a literal string 123
  • (?:(?!abc|xyz).)* - match any character that is not the starting point of an abc or xyz character sequence
  • xyz - a trailing substring xyz
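
The same structure can be assembled for other delimiters. The sketch below is only an illustration: the helper names tempered and build_between are hypothetical, and re.escape is used so that delimiters containing regex metacharacters are treated literally.

```python
import re

def tempered(stop_words):
    """Hypothetical helper: a tempered greedy token that consumes one
    character at a time as long as none of stop_words starts there."""
    alternation = "|".join(re.escape(w) for w in stop_words)
    return "(?:(?!" + alternation + ").)*"

def build_between(left, middle, right):
    """Hypothetical helper: left ... middle ... right with no nested
    left/right delimiters, mirroring the pattern described above."""
    pattern = (
        re.escape(left)
        + tempered([left, right, middle])   # stop before left, right or middle
        + re.escape(middle)
        + tempered([left, right])           # stop before left or right
        + re.escape(right)
    )
    return re.compile(pattern, re.DOTALL)   # DOTALL so . also matches newlines

p_built = build_between("abc", "123", "xyz")
sample = "abc 123 xyz\nabc abc 123 xyz\nabc text 123 xyz\nabc text xyz xyz"
built_matches = p_built.findall(sample)
print(built_matches)
```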

Note that if re.S is used, . will match any character, including a newline.

See the Python demo:

import re
p = re.compile(r'abc(?:(?!abc|xyz|123).)*123(?:(?!abc|xyz).)*xyz', re.DOTALL)
s = "abc 123 xyz\nabc abc 123 xyz\nabc text 123 xyz\nabc text xyz xyz"
print(p.findall(s))
# => ['abc 123 xyz', 'abc 123 xyz', 'abc text 123 xyz']
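
One possible unrolled form of the tempered greedy token pattern is sketched below. It is an assumption about how the unrolling could be written, not a pattern from the original answer: each tempered token becomes a negated character class over the first letters of the stop words (a, x, 1), plus alternatives that allow those letters when they do not begin a stop word.

```python
import re

# Unrolled sketch: [^ax1]* consumes "safe" characters in bulk, and
# a(?!bc), x(?!yz), 1(?!23) allow a, x, 1 when they do not start abc, xyz, 123.
# Negated classes already match newlines, so re.DOTALL is not needed here.
unrolled = re.compile(
    r"abc[^ax1]*(?:(?:a(?!bc)|x(?!yz)|1(?!23))[^ax1]*)*"
    r"123[^ax]*(?:(?:a(?!bc)|x(?!yz))[^ax]*)*xyz"
)

s2 = "abc 123 xyz\nabc abc 123 xyz\nabc text 123 xyz\nabc text xyz xyz"
unrolled_matches = unrolled.findall(s2)
print(unrolled_matches)
# => ['abc 123 xyz', 'abc 123 xyz', 'abc text 123 xyz']
```

On this sample it returns the same matches as the tempered version, while avoiding a lookahead check at every single character.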

Using PCRE, a solution would be the pattern below. It uses the m flag. If you want to match only from the start to the end of a line, add ^ and $ at the beginning and end respectively:

abc(?!.*(abc|xyz).*123).*123(?!.*(abc|xyz).*xyz).*xyz
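
The lookahead pattern above uses only syntax that Python's re module also supports, so it can be tried there as a rough check. This is a sketch run under Python rather than PCRE, on the same sample string as the Python demo. finditer is used instead of findall because the pattern contains capturing groups, and re.DOTALL is left off so each .* stays within one line.

```python
import re

# Same lookahead-based pattern, compiled with Python's re (an assumption:
# the answer targets PCRE, but no PCRE-only syntax is involved)
p_look = re.compile(r"abc(?!.*(abc|xyz).*123).*123(?!.*(abc|xyz).*xyz).*xyz")

s3 = "abc 123 xyz\nabc abc 123 xyz\nabc text 123 xyz\nabc text xyz xyz"

# group(0) is the whole match; findall would return the captured groups instead
look_matches = [m.group(0) for m in p_look.finditer(s3)]
print(look_matches)
# => ['abc 123 xyz', 'abc 123 xyz', 'abc text 123 xyz']
```

Note that this pattern is not equivalent to the tempered greedy token in general: the lookaheads scan to the end of the line, so several abc ... xyz blocks on one line can behave differently.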
