Calculator Description |
This version of the calculator is substantially more complex than previous versions. Major changes include control constructs such as if-else and while. In addition a syntax tree is constructed during parsing. After parsing we walk the syntax tree to produce output. Three versions of the tree walk routine are supplied:
- an interpreter that executes statements during the tree walk
- a compiler that generates code for a hypothetical stack-based machine
- a version that generates a syntax tree of the original program.
To make things more concrete, here is a sample program,
x = 0; while (x < 3) { print x; x = x + 1; }
with output for an interpretive version,
0 1 2
a compiler version,
push 0 pop x L000: push x push 3 compLT jz L001 push x print push x push 1 add pop x jmp L000 L001:
and a version that generates a syntax tree.
Graph 0: [=] | |----| | | id(X) c(0) Graph 1: while | |----------------| | | [<] [;] | | |----| |----------| | | | | id(X) c(3) print [=] | | | |-------| | | | id(X) id(X) [+] | |----| | | id(X) c(1)
The include file contains declarations for the syntax tree and symbol
table. Symbol table (sym
) allows for single-character variable names. A node in
the syntax tree may hold a constant (conNodeType
), an identifier
(idNodeType
), or an internal node with an operator (oprNodeType
). A union
encapsulates all three variants and nodeType.type
is used to determine which
structure we have.
The lex input file contains patterns for VARIABLE
and INTEGER
tokens. In addition, tokens are defined for 2-character operators such as EQ
and NE
. Single-character operators are simply returned as
themselves.
The yacc input file defines YYSTYPE
, the type of yylval
, as
%union { int iValue; /* integer value */ char sIndex; /* symbol table index */ nodeType *nPtr; /* node pointer */ };
This causes the following to be generated in y.tab.h
:
typedef union { int iValue; /* integer value */ char sIndex; /* symbol table index */ nodeType *nPtr; /* node pointer */ } YYSTYPE; extern YYSTYPE yylval;
Constants, variables, and nodes can be represented by yylval
in the parser’s
value stack. A more accurate representation of decimal integers is given below. This is similar to C/C++ where integers that begin with 0 are classified as octal.
0 { yylval.iValue = atoi(yytext); return INTEGER; } [1-9][0-9]* { yylval.iValue = atoi(yytext); return INTEGER; }
Notice the type definitions
%token <iValue> INTEGER %type <nPtr> expr
This binds expr
to nPtr
, and INTEGER
to iValue
in the YYSTYPE
union. This is required so that yacc can generate
the correct code. For example, the rule
expr: INTEGER { $$ = con($1); }
should generate the following code. Note that yyvsp[0]
addresses the top of the
value stack, or the value associated with INTEGER
.
yylval.nPtr = con(yyvsp[0].iValue);
The unary minus operator is given higher priority than binary operators as follows:
%left GE LE EQ NE '>' '<' %left '+' '-' %left '*' '/' %nonassoc UMINUS
The %nonassoc
indicates no associativity is implied. It is frequently used in
conjunction with %prec
to specify precedence of a rule. Thus, we have
expr: '-' expr %prec UMINUS { $$ = node(UMINUS, 1, $2); }
indicating that the precedence of the rule is the same as the precedence of token UMINUS
. And UMINUS
(as defined above) has higher precedence than the
other operators. A similar technique is used to remove ambiguity associated with the if-else
statement (see If-Else Ambiguity).
The syntax tree is constructed bottom-up allocating the leaf nodes when variables and integers are reduced. When operators are encountered a node is allocated and pointers to previously allocated nodes are entered as operands.
After the tree is built function ex
is called to do a depth-first walk of the
syntax tree. This results in operators being applied in the order that they were encountered during
parsing. Three versions of ex
, an interpretive version a compiler version, and a version that generates a
syntax tree, are included.