ITEC 460 - Chapter 4 Notes

Top Down Parsing - Overview

Create parse tree from root down
Recursive Descent Parsing
First and Follow Sets
LL(1) Parse Tables
Non-LL(1) grammars

Ambiguity
Left recursion
Left factoring

LL(K) Parsers
JavaCC

Top Down vs Bottom Up Parsing

Parse tree has the start symbol as the root and terminals for the leaves

Top Down Parsing

Create parse tree from the root to the leaves

Bottom Up Parsing

Create a parse tree from the leaves to the root
Create a forest of trees and merge the trees to form larger trees
Common algorithm: Table driven LR parser (Chapter 5)
Uncommon algorithm: Recursive Ascent

Example: parse 2 + 3 * 5 with E → E + T|T; T → T * num | num

Predictive and Backtracking Algorithms

Backtracking: Grammars with multiple possibilities at each step require backtracking algorithms

Performance:

Predictive: Grammars with single choice at each step allow predictive algorithms (ie no backtracking)

Performance:

LL and LR Grammars and Parsing Algorithms

LL Parser - Leftmost scan of input, follow steps of leftmost parse

Top down
Predictive
LL Grammar: Grammar for which LL parser can be created

LR Parser - Leftmost scan of input, follow steps of rightmost parse (in reverse)

Bottom Up
No backtracking
LR Grammar: Grammar for which LL parser can be created

LL(k) and LR(k) Parsing

LL(k) - use k symbols of lookahead to make next decision
LR(k) - use k symbols of lookahead to make next decision

LL(1) and LR(1) are most common

Example: An LL(2) grammar (could be transformed into LL(1)):

            A → abcD;
            A → aefG;

Common Top Down Algorithms

Recursive descent:

Table driven

Recursive Descent Parsing

Routine for each variable

Routines are mutually recursive
Routine body expresses RHS of rules

terminals in RHS are matched
variables in RHS are called

For predictive (ie LL) parsing, first token must uniquely determine the next routine to call

May need to transform the grammar to make grammar LL

Remove ambiguity
Remove left recursion
Left factor

RD frequently uses grammar in EBNF form

Example: E → T { + T } vs E → T + E | T
Example: S → ifHead [ ifRest ]
How do we express these examples in in LL BNF?

Table Driven Top Down Parsing

Table construction:

Row for each variable
Column for each terminal
Look at current input symbol and current variable
Table entry shows (RHS of) production to expand

Parsing algorithm uses a stack of vars and terminals

Algorithm - Repeat until:

Push start symbol onto stack
Repeat until stack is empty

If TOS is a terminal, then pop and match with input
If TOS is a variable, then:

pop variable V
RHS = Table(V, current input symbol)
Push RHS, in reverse

Example: Handout of expression grammar

Top Down Parsing - Which Rule to Use

Each variable in the parse tree derives a part of the input string

Must choose which rule to use to expand by looking at current variable and current terminal

Current input will be the first terminal in the string derived from the current variable

Example:

Creating LL Parsers

Create tables using sets First(γ) and Follow(A) for each variable A

First(γ) for each rule RHS γ
Follow(A) for each variable A

Tables can be used to create RD parser as well as creating RD parsers

First and Follow Sets: Definition

The LL Parsing Table is created using the sets First and Follow

For any variable A and any string of variables and terminals γ

First(A): {terminals that start strings that can be derived from variable A}

First(γ): {terminals that start strings that can be derived from string γ}

Follow(A): {terminals that follow variable A in any string}

Example: ...

Using First and Follow to create the LL Parse Table

Algorithm:

    Table: array(Vars, Terminals) of Strings;

    For each rule A → γ loop
        for each term t in First(γ) loop
            Table(A, t) :=  γ
        end loop
        if ε in First(γ) then
            for each term t in Follow(A) loop
                Table(A, t) :=  γ
            end loop
        end if
    end loop

    For each rule A → γ loop
        for each term t in First(γ) loop
            if t / = ε then
                Table(A, t) :=  γ
            else -- t = ε
                for each term t in Follow(A) loop
                    Table(A, t) :=  γ
                end loop
            end if
        end loop
    end loop

Example
Before making table, we must calculate First(gamma) for each RHS gamma and Follow(A) for each var A

Definition of First(γ)

Definition:


     gamma     | First(gamma)
    -----------|-------------
    - empty    | empty
    - a gamma1 | a
    - A gamma1 | First(A) if eps not in First(A)
               | First(A) - empty + First(gamma1) if eps in First(A)

Example
Thus First(gamma) depends on First(A), calculated as follows:

Algorithm for calculating of First(A)

Algorithm:

    Algorithm for finding First(A) for all var A:
        Initialize all firsts of vars,
        loop
            for all rules A → gamma loop:
                First(A) = First(A) + First(gamma)   -- Point 1
            end  loop
            exit when no change
        end loop

Comments on Algorithm for calculating of First(A)

This is strange: the def and the alg depend on each other.
How does this work?

At Point 1, First(gamma) calculated as in Definition, using current version of First(A)
As First(A) changes, so will First(gamma)
First(A) alg will eventually stop

Must stop cause all sets are finite
Can be expressed as set of eqns - answer called the fixed point of the equations

When Algorithm exits, use final values of First(A) to calculate final First(gamma)

Follow Set: Algorithm

Algorithm:

Calculating Follow(A) for all vars A:
    1. Calculate First(A) for all A
    2. Initialize Follow(A) = {} for all A
    3. loop:
            For each rule A -> gamma loop:
                For each A1 in gamma = alpha A1 beta loop:
                    Follow(A1) = Follow(A1)      + First(beta), if eps not in First(beta)
                    Follow(A1) = Follow(A1) -eps + First(beta) + Follow(A), if eps in First(beta)
                end loop
           end loop
           exit when above loop does not change any Follow(A)
       end loop

Grammar Transformation: Remove Ambiguity

Grammar Transformation: Remove Left Recursion

Grammar Transformation: Left Factoring

Creating JavaCC Parsers

JavaCC creates a LL(k) parser (typically LL(1))

Add JavaCC rules and actions

Example - while and assignment statements:

                void statement() :
                { 
                    // Add needed declarations here
                }

                {
                     <WHILE>  <LPAREN>  expression()  <RPAREN>  statement()
                  |  variable()  <ASSIGN>  expression()  <SEMICOLON>
                }

JavaCC Example - Prefix Expressions

Parses previx expressions
Rule for expressions:

            void expression() :
            {}
            {
                <PLUS> expression() expression()
              | <MINUS> expression() expression()
              | <TIMES> expression() expression()
              | <DIVIDE> expression() expression()
              | <INTEGER_LITERAL>
            }

Complete .jj file in Figure 4.12 (with common RE factored out)

Call with start symbol: parser.expression() (Fig. 4.15)

Anonymous Tokens

Tokens can be specified directly in parse rules
Example: infix expressions, Fig. 4.15

LOOKAHEAD Directives

Default LOOKAHEAD is 1
Global LOOKAHEAD can be specified in the .jj Options Section

            options
            {
                LOOKAHEAD = 2
            }

Warning: This is inefficient

LOOKAHEAD can also be specified locally in choice points in rules

            void s()
            {}
            {
                LOOKAHEAD(2)
                "a" "b" "c"
                | "a" "d" "c"
            }

Another example:

            void s()
            {}
            {
                "a" 
                (
                    LOOKAHEAD(2)
                    ( "b" "c" | "b" "d" )
                )
            }

Ambiguity and JavaCC

No ambiguous grammar is LL(k), for any k

When given ambiguous rules, JavaCC will produce a parser that resolves the ambiguity by choosing the first rule (removing the ambiguity)

Solution: revise grammar

Example: Ambiguous if

            void statement() :
            {}
            {
                <IF> expression() <THEN> statement()
              | <IF> expression() <THEN> statement() <ELSE> statement()
              | // Other statements
            }

Why is this ambiguous
What happens with the JavaCC parser?

Solution - revised grammar:

            void statement() :
            {}
            {
                <IF> expression() <THEN> statement() optionalElse()
              | // Other statements
            }
            void optionalElse() :
            {}
            {
                <ELSE> statement()
              | { }                  // If no ELSE, do nothing 
            }

What does this do with some statements?

Project

Project 4.5

Last modified on