ANTLR4: catching an entire line of arbitrary data
I have a grammar with command lines, which start with a /, and "data lines", which are all lines that do not start with a slash.
I can't get the data lines tokenized correctly. The following rule
FM_DATA: ( ('\r' | '\n' | '\r\n') ~'/') -> mode(DATA_MODE);
does almost what I need, but for a data line of
abcde
the following tokens are generated
[@23,170:171='\na',<4>,4:72]
[@24,172:175='bcde',<103>,5:1]
so the first character is swallowed by the rule.
I also tried
FM_DATA: ( {getCharPositionInLine() == 0}? ~'/') -> mode(DATA_MODE);
but this causes even weirder things.
What's the correct rule for getting this to work as expected?
TIA - Alex
Solution 1:
The ... -> more
command lets the lexer match the first char (or the first part of a lexer rule) without emitting a token yet. The matched text is kept and becomes the start of the next token that is emitted.
A quick demo:
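lexer grammar FmDataLexer;

// Line breaks only separate lines and are not passed on as tokens
NewLine
 : [\r\n]+ -> skip
 ;
// A leading slash starts a command line
CommandStart
 : '/' -> pushMode(CommandMode)
 ;
// Any other first char starts a data line; "more" keeps it in the text of the FmData token
FmDataStart
 : . -> more, pushMode(FmDataMode)
 ;

mode CommandMode;
CommandLine
 : ~[\r\n]+ -> popMode
 ;

mode FmDataMode;
FmData
 : ~[\r\n]+ -> popMode
 ;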
If you run the following code:
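import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Token;

// Tokenize one data line followed by one command line
FmDataLexer lexer = new FmDataLexer(CharStreams.fromString("abcde\n/mu"));
CommonTokenStream stream = new CommonTokenStream(lexer);
stream.fill(); // read all tokens up to and including EOF
for (Token t : stream.getTokens()) {
    System.out.printf("%-20s '%s'\n", FmDataLexer.VOCABULARY.getSymbolicName(t.getType()), t.getText());
}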
you'll get this output:
FmData               'abcde'
CommandStart         '/'
CommandLine          'mu'
EOF                  '<EOF>'
See: https://github.com/antlr/antlr4/blob/master/doc/lexer-rules.md#mode-pushmode-popmode-and-more
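If you want to cross-check the lexer's output in a unit test, the line classification from the demo grammar above can be sketched by hand in plain Java. This is only a rough stand-in under stated assumptions: LineClassifier and its methods are hypothetical names, not part of ANTLR, and the sketch does not reproduce ANTLR's error reporting. Note one deliberate difference: FmData in the grammar requires at least one more character after the first one, whereas this sketch also accepts a data line consisting of a single character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of ANTLR): classifies lines like the demo grammar does.
class LineClassifier {

    // Returns one "TokenName 'text'" string per expected token (EOF is not included).
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (isNewline(c)) {
                // Corresponds to NewLine -> skip
                i++;
                continue;
            }
            if (c == '/') {
                // Corresponds to CommandStart, followed by CommandLine up to the end of the line
                tokens.add("CommandStart '/'");
                int start = ++i;
                i = endOfLine(input, i);
                if (i > start) {
                    tokens.add("CommandLine '" + input.substring(start, i) + "'");
                }
            } else {
                // Mirrors the effect of "more": the first char stays in the FmData text
                int start = i;
                i = endOfLine(input, i);
                tokens.add("FmData '" + input.substring(start, i) + "'");
            }
        }
        return tokens;
    }

    private static boolean isNewline(char c) {
        return c == '\r' || c == '\n';
    }

    // Index of the first line-break char at or after 'from', or the input length if there is none.
    private static int endOfLine(String s, int from) {
        int i = from;
        while (i < s.length() && !isNewline(s.charAt(i))) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        tokenize("abcde\n/mu").forEach(System.out::println);
    }
}
```

Comparing the list returned by tokenize with the token names and texts from the real lexer is a cheap way to notice when a grammar change starts dropping the first character of a data line again.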