Regular expression for a string literal in flex/lex
I'm experimenting to learn flex and would like to match string literals. My code currently looks like:
"\""([^\n\"\\]*(\\[.\n])*)*"\"" {/*matches string-literal*/;}
I've been struggling with variations for an hour or so and can't get it working the way it should. I'm essentially hoping to match a string literal that can't contain a new-line (unless it's escaped) and supports escaped characters.
I am probably just writing a poor regular expression or one incompatible with flex. Please advise!
A string consists of a quote mark
"
followed by zero or more of either an escaped anything
\\.
or a character that is neither a quote nor a backslash
[^"\\]
and finally a terminating quote
"
Put it all together, and you've got
\"(\\.|[^"\\])*\"
The delimiting quotes are escaped because they are Flex meta-characters.
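If you want to check what that pattern accepts without running flex, the same logic can be written by hand. This is only a rough sketch: `match_string_literal` is a made-up helper name, not anything flex provides. It assumes a NUL-terminated input and mirrors the fact that `.` in flex does not match a newline, so a backslash followed by a newline is rejected.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the pattern \"(\\.|[^"\\])*\"
 * Returns the length of the string literal that starts at s (both quotes
 * included), or 0 if s does not begin with a complete literal. */
size_t match_string_literal(const char *s)
{
    size_t i = 0;

    if (s[i] != '"')
        return 0;               /* must open with a quote */
    i++;

    for (;;) {
        char c = s[i];
        if (c == '\0')
            return 0;           /* input ended before the closing quote */
        if (c == '"')
            return i + 1;       /* closing quote: literal is complete */
        if (c == '\\') {
            /* \\. : a backslash plus any character except newline,
             * because '.' in flex does not match '\n' */
            if (s[i + 1] == '\0' || s[i + 1] == '\n')
                return 0;
            i += 2;
        } else {
            i++;                /* [^"\\] : any other character */
        }
    }
}
```

Note that the `else` branch accepts an unescaped newline, just as `[^"\\]` does; add a check for `'\n'` there if strings must stay on one line.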
For a single line, exclude the newline from the character class (in flex a negated class otherwise matches newlines too):
\"([^\\\"\n]|\\.)*\" {/*matches string-literal on a single line*/;}
How about using a start state...
%{
int enter_dblquotes = 0;
%}
%x DBLQUOTES
%%
\"                 { BEGIN(DBLQUOTES); enter_dblquotes++; }
<DBLQUOTES>[^"]*\" {
                     if (enter_dblquotes) {
                         /* yytext holds the string body plus the closing quote */
                         handle_this_dblquotes(yytext);
                         BEGIN(INITIAL); /* revert back to normal */
                         enter_dblquotes--;
                     }
                   }
...more rules follow...
Something to that effect. Flex uses %s (inclusive) or %x (exclusive) to declare start conditions, which say in which state a rule is expected to apply. When the scanner sees a quote, it switches to another state, then continues lexing until it reaches another quote, at which point it reverts to the normal state.
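To make the state switching concrete, here is a small sketch in plain C of the same idea, without flex. The names `scan_state`, `ST_INITIAL`, `ST_DBLQUOTES` and `count_quoted_strings` are made up for illustration; they only stand in for what `BEGIN(...)` does inside a generated scanner. Escapes are deliberately not handled, to keep it as simple as the two-rule version above.

```c
#include <stddef.h>

/* Hypothetical stand-ins for flex start conditions. */
enum scan_state { ST_INITIAL, ST_DBLQUOTES };

/* Counts the double-quoted strings in input by toggling between two
 * states, the way BEGIN(...) does in a flex scanner.
 * If unterminated is non-NULL, it is set to 1 when the input ends
 * inside a string, and to 0 otherwise. */
size_t count_quoted_strings(const char *input, int *unterminated)
{
    enum scan_state state = ST_INITIAL;
    size_t count = 0;

    for (const char *p = input; *p != '\0'; p++) {
        if (*p != '"')
            continue;                 /* ordinary character: state unchanged */
        if (state == ST_INITIAL) {
            state = ST_DBLQUOTES;     /* like BEGIN(DBLQUOTES) */
        } else {
            count++;                  /* closing quote: one string finished */
            state = ST_INITIAL;       /* like BEGIN(INITIAL) */
        }
    }
    if (unterminated != NULL)
        *unterminated = (state == ST_DBLQUOTES);
    return count;
}
```

Ending in `ST_DBLQUOTES` corresponds to hitting end of file while still inside the string state, which a real scanner would want to report as an error.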
Here is my snippet for handling strings in flex; I hope it gives you some ideas.
Using a start condition to handle string literals is more scalable and clearer.
%x SINGLE_STRING
%%
\" BEGIN(SINGLE_STRING);
<SINGLE_STRING>{
\n yyerror("the string misses \" to terminate before newline");
<<EOF>> yyerror("the string misses \" to terminate before EOF");
([^\\\"\n]|\\.)* {/* do your work here, e.g. save the text; \n is excluded so the newline rule above can fire */}
\" BEGIN(INITIAL);
. ;
}
This is what we use in Zolang for single line string literals with embedded templates ${...}
\"(\$\{.*\}|\\.|[^\"\\])*\"