Parsing a newline-terminated programming language

Recently I've been trying to develop a custom programming language. The previous languages I (attempted to) make were semicolon-terminated, but the one I'm making now is terminated by a newline, just like Python.

The problem I've stumbled across is that, while every semicolon in e.g. C++ is treated as a terminator of sorts, a newline in Python does not always act as a terminator.

For example:

// incorrect in c++
myfunc();;;;otherfunc();

and

# completely fine in python
myfunc()



otherfunc()

So my question is: how do I parse this? What does the Backus-Naur form of a language like this look like?


Solution 1:

I don't know about C++, but in many semicolon-terminated languages, ;; is perfectly valid. PHP is one example.

A simple way to express this in the abstract grammar is to allow an empty statement - that is, one made up only of optional whitespace. The parser can then accept this as valid, but emit nothing.

In the PHP parser, one of the productions for statement is this:

';' /* empty statement */ { $$ = NULL; }

The same rule could be used (mutatis mutandis) in a grammar where newline was treated as a significant token, rather than grouped into whitespace.
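
To make that concrete, here is a minimal sketch in Python of a parser where the newline is its own token and a lone newline counts as an empty statement. The toy grammar (a program of zero-argument calls), the token names and the functions tokenize and parse are all made up for illustration; they are not taken from PHP, Python, or any real parser.

```python
import re

# Hypothetical toy grammar, written in a BNF-like notation:
#   program   ::= statement*
#   statement ::= call NEWLINE   (a call followed by its terminator)
#               | NEWLINE        (empty statement: accepted, emits nothing)
#   call      ::= NAME "(" ")"

TOKEN_RE = re.compile(
    r"(?P<NAME>[A-Za-z_]\w*)"
    r"|(?P<LPAREN>\()"
    r"|(?P<RPAREN>\))"
    r"|(?P<NEWLINE>\n)"      # newline is a real token, not whitespace
    r"|(?P<SKIP>[ \t\r]+)"   # only spaces, tabs and \r are skipped
)


def tokenize(source):
    """Turn source text into a list of (kind, text) pairs."""
    if not source.endswith("\n"):
        source += "\n"  # make sure the last statement is terminated
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse(source):
    """Return the names of the calls found in source, in order."""
    tokens = tokenize(source)
    calls = []
    i = 0

    def expect(kind):
        nonlocal i
        if i >= len(tokens) or tokens[i][0] != kind:
            found = tokens[i][0] if i < len(tokens) else "end of input"
            raise SyntaxError(f"expected {kind}, found {found}")
        text = tokens[i][1]
        i += 1
        return text

    while i < len(tokens):
        if tokens[i][0] == "NEWLINE":
            i += 1  # empty statement: consume the newline, emit nothing
            continue
        name = expect("NAME")
        expect("LPAREN")
        expect("RPAREN")
        expect("NEWLINE")  # every non-empty statement must end in a newline
        calls.append(name)
    return calls
```

With this sketch, the input from the question (myfunc(), three blank lines, otherfunc()) parses to two calls, because each blank line matches the empty-statement alternative. Two calls on the same line are rejected, since the parser expects a NEWLINE after the first one.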