Before 1975 writing a compiler was a very time-consuming process. Then Lesk  and Johnson  published papers on lex and yacc. These utilities greatly simplify compiler writing. Implementation details for lex and yacc may be found in Aho . Flex and bison, clones for lex and yacc, can be obtained for free from GNU and Cygwin.
Cygwin is a 32-bit Windows ports of GNU software. In fact Cygwin is a port of the Unix operating system to Windows and comes with compilers gcc and g++. To install simply download and run the setup executable. Under devel install bison, flex, gcc-g++, gdb, and make. Under editors install vim. Lately I've been using flex and bison under the Cygwin environment.
Figure 1: Compilation Sequence
The patterns in the above diagram is a file you create with a text editor. Lex will read your patterns and generate C code for a lexical analyzer or scanner. The lexical analyzer matches strings in the input, based on your patterns, and converts the strings to tokens. Tokens are numerical representations of strings, and simplify processing.
When the lexical analyzer finds identifiers in the input stream it enters them in a symbol table. The symbol table may also contain other information such as data type (integer or real) and location of each variable in memory. All subsequent references to identifiers refer to the appropriate symbol table index.
The grammar in the above diagram is a text file you create with a text editor. Yacc will read your grammar and generate C code for a syntax analyzer or parser. The syntax analyzer uses grammar rules that allow it to analyze tokens from the lexical analyzer and create a syntax tree. The syntax tree imposes a hierarchical structure on the tokens. For example, operator precedence and associativity are apparent in the syntax tree. The next step, code generation, does a depth-first walk of the syntax tree to generate code. Some compilers produce machine code, while others, as shown above, output assembly language.
Figure 2: Building a Compiler with Lex & Yacc
Figure 2 illustrates the file naming conventions used by lex and yacc. We’ll assume our
goal is to write a BASIC compiler. First, we need to specify all pattern matching rules for lex
bas.l) and grammar rules for yacc (
bas.y). Commands to create our
bas.exe, are listed below:
yacc -d bas.y # create y.tab.h, y.tab.c lex bas.l # create lex.yy.c cc lex.yy.c y.tab.c -obas.exe # compile/link
Yacc reads the grammar descriptions in
bas.y and generates a syntax analyzer
(parser), that includes function
yyparse, in file
y.tab.c. Included in
bas.y are token declarations. The
-d option causes yacc to generate
definitions for tokens and place them in file
y.tab.h. Lex reads the pattern
bas.l, includes file
y.tab.h, and generates a lexical
yylex, in file
Finally, the lexer and parser are compiled and linked together to create executable
main we call
yyparse to run the compiler.
yyparse automatically calls
yylex to obtain each token.