Lex Strings
Quoted strings frequently appear in programming languages. Here is one way to match a string in lex:
    %{
        char *yylval;
        #include <string.h>
    %}
    %%
    \"[^"\n]*["\n] {
            yylval = strdup(yytext+1);
            if (yylval[yyleng-2] != '"')
                warning("improperly terminated string");
            else
                yylval[yyleng-2] = 0;
            printf("found '%s'\n", yylval);
        }
The above example ensures that strings don't cross line boundaries and removes enclosing
quotes. If we wish to add escape sequences, such as "\n", start states simplify matters:
    %{
        char buf[100];
        char *s;
    %}
    %x STRING
    %%
    \"              { BEGIN STRING; s = buf; }
    <STRING>\\n     { *s++ = '\n'; }
    <STRING>\\t     { *s++ = '\t'; }
    <STRING>\\\"    { *s++ = '\"'; }
    <STRING>\"      {
                      *s = 0;
                      BEGIN 0;
                      printf("found '%s'\n", buf);
                    }
    <STRING>\n      { printf("invalid string"); exit(1); }
    <STRING>.       { *s++ = *yytext; }
The exclusive start state STRING is declared with %x in the definitions section. When the
scanner detects a quote, the BEGIN macro shifts lex into the STRING state. Lex stays in the
STRING state and recognizes only patterns that begin with <STRING> until another BEGIN is
executed. Thus we have a mini-environment for scanning strings. When the trailing quote is
recognized, we switch back to initial state 0 (also available under the name INITIAL).
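To make the role of the start state more concrete, the same logic can be written by hand in plain C. This is only a sketch under stated assumptions: the function scan_string, the enum names, and the return codes are hypothetical and not part of lex. It also adds a bounds check on the buffer, which the lex example above does not perform.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for lex's start states (not lex identifiers). */
enum state { ST_INITIAL, ST_STRING };

/*
 * Scan src for the first quoted string and copy its contents into buf,
 * translating the escapes \n, \t and \" as the <STRING> rules do.
 * Returns 0 on success, -1 on a newline inside the string, a missing
 * closing quote, or a full buffer (the bounds check is an addition).
 */
int scan_string(const char *src, char *buf, size_t bufsize)
{
    enum state st = ST_INITIAL;
    size_t n = 0;                       /* characters stored so far */
    const char *p;

    assert(src != NULL && buf != NULL && bufsize > 0);

    for (p = src; *p != '\0'; p++) {
        char c = *p;

        if (st == ST_INITIAL) {
            if (c == '"')
                st = ST_STRING;         /* like: BEGIN STRING; s = buf; */
            continue;                   /* other input is skipped here */
        }

        /* From here on only the "<STRING>" rules apply. */
        if (c == '"') {                 /* closing quote: back to initial */
            buf[n] = '\0';
            return 0;
        }
        if (c == '\n')
            return -1;                  /* like the "invalid string" rule */
        if (c == '\\' && (p[1] == 'n' || p[1] == 't' || p[1] == '"')) {
            p++;                        /* consume the escaped character */
            c = (*p == 'n') ? '\n' : (*p == 't') ? '\t' : '"';
        }
        if (n + 1 >= bufsize)
            return -1;                  /* keep room for the terminator */
        buf[n++] = c;
    }
    return -1;                          /* input ended inside or before a string */
}
```

Calling scan_string on the input x = "a\tb" with a 100-byte buffer stores the three characters a, tab, b and returns 0. The explicit state variable plays the part of the start state: while it holds ST_STRING, none of the initial-state logic runs, which is the isolation that %x gives the lex rules.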