ANTLR 4.5 parser error at runtime

I'm building a simple grammar for a programming language for learning purposes.

I ran into a strange error that makes no sense to me:

line 1:0 missing {'void', 'int', 'bool', 'string', 'union'} at 'void'

I'm using a prebuilt lexer and parser generated from this grammar:

grammar ProgrammingLanguage;

function_definition
    : type_specifier IDENTIFIER '(' parameter_list_opt ')' compound_statement
    ;

type_specifier
    : VOID
    | INT
    | BOOL
    | STRING
    | UNION
    ;

compound_statement
    : '{' declaration_list statement_list '}'
    ;

statement_list
    : statement
    | statement statement_list
    |
    ;

statement
    : compound_statement
    | selection_statement
    | while_statement
    | jump_statement
    | expression_statement
    | comment_statement
    ;

comment_statement
    : COMMENT_START COMMENT
    ;

selection_statement
    : IF '(' expression ')' compound_statement
    | IF '(' expression ')' compound_statement ELSE compound_statement
    | SWITCH '(' expression ')' compound_statement
    ;

expression_statement
    : ';'
    | expression ';'
    ;

jump_statement
    : BREAK ';'
    | CONTINUE ';'
    ;

while_statement
    : WHILE '(' expression ')' compound_statement
    ;

primary_expression
    : IDENTIFIER
    | CONSTANT
    | '(' expression ')'
    | IDENTIFIER '(' primary_expression_list ')'
    ;

primary_expression_list
    : primary_expression
    | primary_expression primary_expression_list
    |
    ;

expression
    : logical_or_expression
    | additive_expression
    ;

logical_or_expression
    : logical_and_expression
    | logical_or_expression '||' logical_and_expression
    ;

logical_and_expression
    : compare_expression
    | logical_and_expression '&&' compare_expression
    ;

compare_expression
    : primary_expression compare_op primary_expression
    | primary_expression
    ;

compare_op
    : '<'
    | '>'
    | '=='
    | '!='
    | '<='
    | '>='
    ;

additive_expression
    : multiplicative_expression
    | additive_expression '+' multiplicative_expression
    | additive_expression '-' multiplicative_expression
    ;

multiplicative_expression
    : primary_expression
    | multiplicative_expression '*' primary_expression
    | multiplicative_expression '/' primary_expression
    | multiplicative_expression '%' primary_expression
    ;

assignment_expression
    : IDENTIFIER '=' expression
    ;

id_list
    : IDENTIFIER
    | IDENTIFIER ',' id_list
    ;

declaration
    : type_specifier id_list ';'
    ;

parameter_list_opt
    : parameter_list
    |
    ;

parameter_list
    : type_specifier IDENTIFIER
    | type_specifier IDENTIFIER ',' parameter_list
    ;

declaration_list
    : declaration
    | declaration declaration_list
    |
    ;

/**------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------
 */
WHILE   : 'while' ;

BREAK   : 'break' ;
CONTINUE    : 'continue' ;
SWITCH  : 'switch' ;

IF  : 'if' ;
ELSE    : 'else' ;

COMMENT_START   : '//' ;

IDENTIFIER  :   ('a'..'z'|'A'..'Z')('0'..'9'|'a'..'z'|'A'..'Z')*;
CONSTANT    :   FALSE|TRUE|STRING_VALUE|INT_VALUE;
STRING_VALUE : '"'COMMENT'"';
COMMENT : ('0'..'9'|'a'..'z'|'A'..'Z')*;
INT_VALUE : ('0'..'9')+;
FALSE : 'false';
TRUE : 'true';

VOID : 'void';
INT : 'int';
BOOL : 'bool';
STRING : 'string';
UNION : 'union';

WS :    (' '|'\t'|'\n'|'\r')+ -> skip;

And I'm parsing with this Java code:

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

import java.io.IOException;

public class Main {
    public static void main(String[] args) throws IOException {
        ProgrammingLanguageLexer lexer = new ProgrammingLanguageLexer(new ANTLRFileStream("input.txt"));
        ProgrammingLanguageParser parser = new ProgrammingLanguageParser(new CommonTokenStream(lexer));
        ParseTree tree = parser.function_definition();
        ParseTreeWalker.DEFAULT.walk(new ProgrammingLanguageBaseListener(), tree);
    }
}

And finally, the string that I'm trying to parse:

void power () {}

Solution 1:

The error message means that the parser expected a token of one of the listed types (VOID, INT, BOOL, STRING, UNION), but the token the lexer actually produced from the input text 'void' has a different type. Your lexer rules show why: the text 'void' is matched by the IDENTIFIER rule, which is listed before VOID, so the lexer emits a token of type IDENTIFIER, not VOID.

In general, the lexer rule that matches the longest input string wins. When two or more rules match the same length of input, the one listed first in the grammar wins. Move all of your keyword rules (VOID, INT, BOOL, STRING and UNION here) above the IDENTIFIER rule.
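The tie-breaking rule described above can be sketched in plain Java. This is only a toy model built on java.util.regex, not ANTLR's actual implementation: the class LexerOrderDemo and its methods winningRule, identifierFirst and keywordFirst are hypothetical names made up for illustration, and only the IDENTIFIER and VOID rules from the question are modelled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of lexer rule selection: longest match wins, ties go to the
// rule listed first. NOT ANTLR code; all names here are hypothetical.
class LexerOrderDemo {

    /** Returns the name of the rule that wins for the text at the start of input. */
    static String winningRule(Map<String, Pattern> rules, String input) {
        String best = null;
        int bestLength = 0;
        // LinkedHashMap iterates in insertion order, i.e. "grammar order".
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // Only a strictly longer match replaces the current best,
            // so on equal length the earlier rule keeps its win.
            if (m.lookingAt() && m.end() > bestLength) {
                best = rule.getKey();
                bestLength = m.end();
            }
        }
        return best;
    }

    /** Rule order as in the question: IDENTIFIER listed before VOID. */
    static Map<String, Pattern> identifierFirst() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("IDENTIFIER", Pattern.compile("[a-zA-Z][0-9a-zA-Z]*"));
        rules.put("VOID", Pattern.compile("void"));
        return rules;
    }

    /** Suggested order: keyword rule listed before IDENTIFIER. */
    static Map<String, Pattern> keywordFirst() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("VOID", Pattern.compile("void"));
        rules.put("IDENTIFIER", Pattern.compile("[a-zA-Z][0-9a-zA-Z]*"));
        return rules;
    }

    public static void main(String[] args) {
        System.out.println(winningRule(identifierFirst(), "void")); // IDENTIFIER
        System.out.println(winningRule(keywordFirst(), "void"));    // VOID
        System.out.println(winningRule(keywordFirst(), "voids"));   // IDENTIFIER
    }
}
```

The last line shows why moving keywords first is safe: a longer name such as 'voids' still becomes an IDENTIFIER, because the longest match takes priority over rule order.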

A helpful unit test will dump the lexed tokens and show the actual token types matched. Something like:

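CommonTokenStream tokens = ...
tokens.fill(); // force the lexer to tokenize the whole input
StringBuilder sb = new StringBuilder();
for (Token token : tokens.getTokens()) {
    // toString() is virtual, so a custom token subclass's override is used without a cast
    sb.append(token.toString()).append('\n');
}
System.out.print(sb.toString());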

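The default toString() output (from CommonToken) is usually good enough; it prints the token type as a number, which you can look up in the generated .tokens file. Override it in your own token subclass to fit your needs.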