Practical difference between parser rules and lexer rules in ANTLR?

Solution 1:

... what are the practical differences between these two statements in ANTLR ...

MY_RULE is a lexer rule (in ANTLR, lexer rule names start with an uppercase letter). It will be used to tokenize your input source, and it represents a fundamental building block of your language.

my_rule is a parser rule (parser rule names start with a lowercase letter). It is called from the parser and consists of zero or more other parser rules or tokens produced by the lexer.

That's the difference.

Do they result in different AST trees? Different performance? ...

The parser builds the AST using tokens produced by the lexer, so the questions make no sense (to me). A lexer merely "feeds" the parser a one-dimensional stream of tokens.
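To make the "lexer feeds the parser" idea concrete, here is a minimal hand-written sketch in Python rather than ANTLR-generated code. The token names NUMBER and PLUS, the rule `sum`, and the tuple-based tree shape are all assumptions made up for this illustration; ANTLR's real output looks different.

```python
import re

# Lexer stage: turn characters into a flat list of (type, text) tokens.
# Token names NUMBER, PLUS and WS are hypothetical, chosen for this sketch.
TOKEN_SPEC = [("NUMBER", r"\d+"), ("PLUS", r"\+"), ("WS", r"\s+")]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def lex(text):
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "WS":  # whitespace is skipped, not passed on
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

# Parser stage: consume the token list and build a nested tree.
# Implements the hypothetical rule  sum : NUMBER (PLUS NUMBER)* ;
def parse_sum(tokens):
    def expect(index, kind):
        if index >= len(tokens) or tokens[index][0] != kind:
            raise SyntaxError(f"expected {kind} at token {index}")
        return tokens[index][1]

    tree = ("num", expect(0, "NUMBER"))
    index = 1
    while index < len(tokens):
        expect(index, "PLUS")
        right = ("num", expect(index + 1, "NUMBER"))
        tree = ("add", tree, right)  # left-associative nesting
        index += 2
    return tree

tokens = lex("1 + 2 + 3")   # flat, one-dimensional
tree = parse_sum(tokens)    # nested
```

The lexer's result is a flat list, while the parser's result is nested. Any tree structure comes from the parser stage; the lexer only decides where one token ends and the next begins.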

Solution 2:

This post may be helpful:

The lexer is responsible for the first step, and its only job is to create a "token stream" from text. It is not responsible for understanding the semantics of your language; it is only interested in understanding the syntax of your language.

For example, syntax is the rule that an identifier may only use letters, digits, and underscores, as long as it doesn't start with a digit. The responsibility of the lexer is to understand this rule. In this case, the lexer would accept the sequence of characters "asd_123" but reject the characters "12dsadsa" (assuming that there isn't another rule in which this text is valid). On seeing the valid text, it may emit a token into the token stream such as IDENTIFIER(asd_123).
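The identifier rule described above can be sketched with a regular expression. This is an illustration in Python, not ANTLR grammar syntax, and the function name `lex_identifier` is made up for the example. It assumes "letters" means ASCII letters and that an identifier may start with an underscore.

```python
import re

# Hypothetical IDENTIFIER rule: a letter or underscore first,
# then any mix of letters, digits, and underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def lex_identifier(text):
    """Return an ("IDENTIFIER", lexeme) token if the whole text matches, else None."""
    if IDENTIFIER.fullmatch(text):
        return ("IDENTIFIER", text)
    return None

accepted = lex_identifier("asd_123")   # matches the rule
rejected = lex_identifier("12dsadsa")  # starts with a digit, so no match
```

A full lexer with a separate number rule might instead split "12dsadsa" into two tokens, which is the "assuming that there isn't another rule" caveat from the quoted text.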

Note that I said "identifier" which is the general term for things like variable names, function names, namespace names, etc. The parser would be the thing that would understand the context in which that identifier appears, so that it would then further specify that token as being a certain thing's name.

(sidenote: the token is just a unique name given to an element of the token stream. The lexeme is the text that the token was matched from. I write the lexeme in parentheses next to the token. For example, NUMBER(123). In this case, this is a NUMBER token with a lexeme of '123'. However, with some tokens, such as operators, I omit the lexeme since it's redundant. For example, I would write SEMICOLON for the semicolon token, not SEMICOLON( ; )).
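The token/lexeme distinction in that sidenote could be modelled roughly like this. The `Token` class and its `show` method are hypothetical names for this sketch and are not ANTLR's own token API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    type: str    # the token name, e.g. NUMBER
    lexeme: str  # the exact text the token was matched from

    def show(self, omit_lexeme=False):
        # Operators and punctuation are usually shown without the lexeme,
        # since the token name already says everything.
        return self.type if omit_lexeme else f"{self.type}({self.lexeme})"

number = Token("NUMBER", "123").show()
semicolon = Token("SEMICOLON", ";").show(omit_lexeme=True)
```

Here `number` is written in the TOKEN(lexeme) notation used above, while `semicolon` drops the redundant lexeme.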

From ANTLR - When to use Parser Rules vs Lexer Rules?
