Using Perl look-ahead assertion to find individual list

With possible comments (/* ... */) that need to be omitted:

perl -0777 -wnE'say for m{(.*?::=.*?)\n (?: \n+ | (?:/\*.*?\*/) | \z)}gsx' bnf.txt

This captures a line with ::= and all that follows it, up to a newline that is followed by one of: more newlines, a /*...*/ comment, or the end of the string.

The /s modifier makes . match newlines as well, which it normally doesn't, so that .*? can match multiline text. With /x, literal spaces in the pattern are ignored and can be used for readability.

Or, first remove the comments and then split the input string on runs of two or more newlines:

perl -0777 -wnE's{ (?: /\* .*? \*/ ) }{\n}gsx; say for split /\n\n+/;' bnf.txt

I don't see a need for lookaheads.


The original version of this post used a paragraph mode, via -00, or a regex that splits the whole input by multiple newlines.

That was exceedingly simple and clean -- with the input from the original version of the question, that is, which had no comments. The comments that were then added may have empty lines and reading in paragraphs doesn't fly anymore since spurious ones would be introduced.

I'm restoring it below since it's been deemed useful.

If there's always an empty line separating chunks of interest, then you can process the input in paragraphs:

perl -00 -wne'print' file

This retains the empty line, which you appear to want to keep anyway. If not, it can be removed.

(Then, curiously, you can even do simply perl -00 -pe'1' file)

Otherwise, you can break the whole string on runs of two or more newlines:

perl -0777 -wnE'@chunks = split /\n\n+/; say for @chunks' file

or, if you indeed need to just output them

perl -0777 -wnE'say for split /\n\n+/' file

Empty lines between chunks are now removed.

I don't see a reason to go for a lookahead.


I realize that a "BNF definition" may extend over the line(s) after the one with ::=. In that case, one way is

perl -0777 -wnE'say for /(.+?::=.*?)\n(?:\n+|\z)/gs' file

However, with possible comments (/* ... */) that need to be omitted:

perl -0777 -wnE'say for m{(.*?::=.*?)\n (?: \n+ | (?:/\*.*?\*/) | \z)}gsx' bnf.txt

 


A reminder: all revisions to posts can be seen via the link which is right under a post, with the text of the last-edit timestamp.