Theory, Part II

Let's extend the previous example to include additional operators.

Input
stack  +   -   *   /   ^   M   f   p   c   ,   (   )   $ 
+ r r s s s s s s s r s r r
- r r s s s s s s s r s r r
* r r r r s s s s s r s r r
/ r r r r s s s s s r s r r
^ r r r r s s s s s r s r r
M r r r r r s s s s r s r r
f r r r r r r r r r r s r r
p r r r r r r r r r r s r r
c r r r r r r r r r r s r r
, r r r r r r r r r r r r e4
( s s s s s s s s s s s s e1
) r r r r r r e3 e3 e3 r e2 r r
$ s s s s s s s s s e4 s e3 a

The above table incorporates the following operators:

The following operator precedence is implied, starting at the highest precedence:

The following errors are detected:

Parentheses are often used to override precedence. This is easily accomodated in the operator precedence table using the following algorithm:

input     action
 '('      shift
 ')'      while opr[tos] != '('
             reduce
          shift ')'
          reduce '()'

On encountering a left parenthesis, we shift it to the opr stack. When we input a right parenthesis, the stack is popped until the matching left parenthesis is found. Then we shift the right parenthesis, and reduce by popping both parentheses off the stack.

For function calls, the function reference is shifted to the stack. Each comma-separated argument is also shifted to the stack. On encountering a comma, the operator stack is reduced until the opening parenthesis of the function call is visible, leaving the function parameter on the value stack. Then the comma is shifted to the stack, and popped on the next reduction. When the closing parenthesis of the function call is encountered, it will be shifted to the stack and reduced. This will expose the function call for subsequent reduction.

Operator classes may be used to minimize the size of the precedence table. For example, a single generic token representing a function call may be pushed on the operator stack. An ordinal, defining which function is referenced, is pushed on the value stack. For this purpose, you may wish to implement each element on the value stack as a union to allow for different datatypes.

Error-checking is not bullet-proof. For example, omitting commas between function arguments will be accepted, and work properly. Even more bizarre, reverse-polish notation, where operators follow operands, is also acceptable. This phenomenon occurs because operators are applied to the top elements of the value stack, and the order that operators and values are pushed is not significant. Major errors, such as omitting an operator, will be found by the final reduction, as the stacks will not be empty. More rigorous error-checking may be done by defining a boolean follow matrix. The current and previous tokens are used as indices into the matrix to determine if one can follow the other.

The table is implemented in the next section, using ANSI-C. Note that there are some differences between my solution and Aho's: