Get Block of lines between same pattern using regex [duplicate]

In general, an extraction regex looks like

(?s)pattern.*?(?=pattern|$)

Or, if the pattern is at the start of a line,

(?sm)^pattern.*?(?=\npattern|\Z)
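As a minimal sketch of the line-anchored form, assuming a hypothetical multi-line `text` in which every block begins with `chapter` plus a digit at the start of a line:

```python
import re

# Hypothetical sample text: each block starts with "chapter <digit>" at a line start.
text = "chapter 1\nfirst line\nsecond line\nchapter 2\nthird line"

# (?s) lets . match line breaks; (?m) lets ^ match at the start of every line.
# The lookahead stops each match right before the next "\nchapter <digit>" or at the end of the string.
blocks = re.findall(r'(?sm)^chapter [0-9].*?(?=\nchapter [0-9]|\Z)', text)
print(blocks)
# => ['chapter 1\nfirst line\nsecond line', 'chapter 2\nthird line']
```

Because the line break sits inside the lookahead, it is not part of any match, so the blocks come back without a trailing newline.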

Here, you could use

re.findall(r'(?s)chapter [0-9].*?(?=chapter [0-9]|\Z)', text)

See this regex demo. Details:

  • (?s) - makes . match line break chars as well, so a block can span several lines
  • chapter [0-9] - chapter + space and a digit
  • .*? - any zero or more chars, as few as possible
  • (?=chapter [0-9]|\Z) - a positive lookahead that matches a location immediately followed by chapter, space and a digit, or by the end of the whole string.
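A runnable sketch of the `re.findall` approach, assuming a short made-up `text` that contains a line break inside the first block; the `(?s)` flag is what lets `.` cross that line break:

```python
import re

# Made-up sample text; the first block spans two lines.
text = "chapter 1 Intro text\nmore intro. chapter 2 Body text. chapter 3 The end."

# Each match runs from a keyword up to (not including) the next keyword or the end of the string.
matches = re.findall(r'(?s)chapter [0-9].*?(?=chapter [0-9]|\Z)', text)

# The whitespace before the next keyword belongs to the preceding match, so strip it.
chunks = [m.strip() for m in matches]
print(chunks)
# => ['chapter 1 Intro text\nmore intro.', 'chapter 2 Body text.', 'chapter 3 The end.']
```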

Alternatively, since the text starts with the keyword, you may split the string instead:

import re
teststr = 'chapter 1 Here is a block of text from chapter one.  chapter 2 Here is another block of text from the second chapter.  chapter 3 Here is the third and final block of text.'
my_result = [x.strip() for x in re.split(r'(?!^)(?=chapter \d)', teststr)]
print(my_result)
# => ['chapter 1 Here is a block of text from chapter one.', 'chapter 2 Here is another block of text from the second chapter.', 'chapter 3 Here is the third and final block of text.']

See the Python demo. The (?!^)(?=chapter \d) regex means:

  • (?!^) - find a location that is not at the start of string and
  • (?=chapter \d) - is immediately followed with chapter, space and any digit.

The pattern is used to split the string at the found locations and does not consume any chars, so the whitespace in front of each keyword stays at the end of the preceding chunk. Hence, the results are stripped of whitespace in a list comprehension.
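The split approach could be wrapped in a small helper. This is only a sketch: `split_before` is a hypothetical name, and it assumes the caller passes a valid regex fragment for the keyword.

```python
import re

def split_before(keyword_pattern, text):
    """Split text right before each match of keyword_pattern, except at the very start.

    Hypothetical helper; keyword_pattern is inserted into the lookahead as-is.
    """
    # Splitting on empty (zero-width) matches requires Python 3.7 or newer.
    parts = re.split(rf'(?!^)(?={keyword_pattern})', text)
    # Drop the whitespace left at the chunk edges and skip empty chunks.
    return [p.strip() for p in parts if p.strip()]

print(split_before(r'chapter \d', 'chapter 1 A. chapter 2 B.'))
# => ['chapter 1 A.', 'chapter 2 B.']
```

If the text does not start with the keyword, the leading part is returned as the first item, for example `split_before(r'chapter \d', 'intro chapter 1 A')` gives `['intro', 'chapter 1 A']`.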