Theory, Part II |
Let's extend the previous example to include additional operators.
Input stack +
-
*
/
^
M
f
p
c
,
(
)
$
+
r r s s s s s s s r s r r -
r r s s s s s s s r s r r *
r r r r s s s s s r s r r /
r r r r s s s s s r s r r ^
r r r r s s s s s r s r r M
r r r r r s s s s r s r r f
r r r r r r r r r r s r r p
r r r r r r r r r r s r r c
r r r r r r r r r r s r r ,
r r r r r r r r r r r r e4 (
s s s s s s s s s s s s e1 )
r r r r r r e3 e3 e3 r e2 r r $
s s s s s s s s s e4 s e3 a
The above table incorporates the following operators:
- "
M
", for unary minus. - "
^
", for exponentiation.5 ^ 2
yields25
. - "
f
", for factorial.f(x)
returns the factorial ofx
. - "
p
", for permutations.p(n,r)
returns the number of permutations forn
objects takenr
at a time. - "
c
", for combinations.c(n,r)
returns the number of combinations forn
objects takenr
at a time.
The following operator precedence is implied, starting at the highest precedence:
- unary minus
- exponentiation, right-associative
- multiplication/division, left-associative
- addition/subtraction, left-associative
The following errors are detected:
- error
e1
: missing right parenthesis - error
e2
: missing operator - error
e3
: unbalanced right parenthesis - error
e4
: invalid function argument
Parentheses are often used to override precedence. This is easily accomodated in the operator precedence table using the following algorithm:
input action '(' shift ')' while opr[tos] != '(' reduce shift ')' reduce '()'
On encountering a left parenthesis, we shift it to the opr
stack. When we input
a right parenthesis, the stack is popped until the matching left parenthesis is found. Then
we shift the right parenthesis, and reduce by popping both parentheses off the stack.
For function calls, the function reference is shifted to the stack. Each comma-separated argument is also shifted to the stack. On encountering a comma, the operator stack is reduced until the opening parenthesis of the function call is visible, leaving the function parameter on the value stack. Then the comma is shifted to the stack, and popped on the next reduction. When the closing parenthesis of the function call is encountered, it will be shifted to the stack and reduced. This will expose the function call for subsequent reduction.
Operator classes may be used to minimize the size of the precedence table. For example, a single generic token representing a function call may be pushed on the operator stack. An ordinal, defining which function is referenced, is pushed on the value stack. For this purpose, you may wish to implement each element on the value stack as a union to allow for different datatypes.
Error-checking is not bullet-proof. For example, omitting commas between function arguments will be accepted, and work properly. Even more bizarre, reverse-polish notation, where operators follow operands, is also acceptable. This phenomenon occurs because operators are applied to the top elements of the value stack, and the order that operators and values are pushed is not significant. Major errors, such as omitting an operator, will be found by the final reduction, as the stacks will not be empty. More rigorous error-checking may be done by defining a boolean follow matrix. The current and previous tokens are used as indices into the matrix to determine if one can follow the other.
The table is implemented in the next section, using ANSI-C. Note that there are some differences between my solution and Aho's:
- Aho specifies three precedence relations, whereas I've only specified two (shift and reduce). You can use three if you wish; I'm the last one to stifle creativity!
- Aho uses one stack for both values and operators. I've used separate stacks, as the implementation is typically easier.