Using Perl look-ahead assertion to find individual list
With possible comments (/* ... */) that need to be omitted:
perl -0777 -wnE'say for m{(.*?::=.*?)\n (?: \n+ | (?:/\*.*?\*/) | \z)}gsx' bnf.txt
This captures a line with ::= and all that follows it, up to one of: more newlines, a /*...*/ comment, or the end of the string.
The modifier /s makes . match newlines as well, which it normally doesn't, so that .*? can match multiline text. With /x literal spaces in the pattern are ignored and can be used for readability.
Or, first remove comments and then split the input string on runs of more than one newline:
perl -0777 -wnE's{ (?: /\* .*? \*/ ) }{\n}gsx; say for split /\n\n+/;' bnf.txt
I don't see a need for lookaheads.
The original version of this post used paragraph mode, via -00, or a regex that splits the whole input on multiple newlines.
That was exceedingly simple and clean -- with the input from the original version of the question, that is, which had no comments. The comments that were then added may contain empty lines, so reading in paragraphs doesn't fly anymore since spurious paragraphs would be introduced.
I'm restoring it below since it's been deemed useful.
If there's always an empty line separating the chunks of interest then you can process the input in paragraphs:
perl -00 -wne'print' file
This retains the empty line, which you appear to want to keep anyway. If not, it can be removed.
(Then, curiously, you can even do simply perl -00 -pe'1' file)
Otherwise, you can break that string on more than one newline
perl -0777 -wnE'@chunks = split /\n\n+/; say for @chunks' file
or, if you indeed need to just output them
perl -0777 -wnE'say for split /\n\n+/' file
Empty lines between chunks are now removed.
I don't see a reason to go for a lookahead.
I realize that a "BNF definition" may extend over the line(s) after the one with ::=. In that case, one way:
perl -0777 -wnE'say for /(.+?::=.*?)\n(?:\n+|\z)/gs' file
However, with possible comments (/* ... */) that need to be omitted:
perl -0777 -wnE'say for m{(.*?::=.*?)\n (?: \n+ | (?:/\*.*?\*/) | \z)}gsx' bnf.txt
A reminder: all revisions to posts can be seen via the link which is right under a post, with the text of the last-edit timestamp.