ITEC 460 - Chapter 4 Notes
Top Down Parsing - Overview
- Create parse tree from root down
- Recursive Descent Parsing
- First and Follow Sets
- LL(1) Parse Tables
- Non-LL(1) grammars
- Ambiguity
- Left recursion
- Left factoring
- LL(K) Parsers
- JavaCC
Top Down vs Bottom Up Parsing
- Parse tree has the start symbol as the root and terminals for the leaves
- Top Down Parsing
- Create parse tree from the root to the leaves
- Bottom Up Parsing
- Create a parse tree from the leaves to the root
- Create a forest of trees and merge the trees to form larger trees
- Common algorithm: Table driven LR parser (Chapter 5)
- Uncommon algorithm: Recursive Ascent
- Example: parse 2 + 3 * 5 with E → E + T|T; T → T * num | num
Predictive and Backtracking Algorithms
- Backtracking: Grammars with multiple possibilities at each step require backtracking algorithms
- Predictive: Grammars with single choice at each step allow predictive algorithms (ie no backtracking)
LL and LR Grammars and Parsing Algorithms
- LL Parser - Leftmost scan of input, follow steps of leftmost parse
- Top down
- Predictive
- LL Grammar: Grammar for which LL parser can be created
- LR Parser - Leftmost scan of input, follow steps of rightmost parse (in reverse)
- Bottom Up
- No backtracking
- LR Grammar: Grammar for which LL parser can be created
LL(k) and LR(k) Parsing
- LL(k) - use k symbols of lookahead to make next decision
- LR(k) - use k symbols of lookahead to make next decision
- LL(1) and LR(1) are most common
- Example: An LL(2) grammar (could be transformed into LL(1)):
A → abcD;
A → aefG;
Common Top Down Algorithms
- Recursive descent:
- Table driven
Recursive Descent Parsing
- Routine for each variable
- Routines are mutually recursive
- Routine body expresses RHS of rules
- terminals in RHS are matched
- variables in RHS are called
- For predictive (ie LL) parsing, first token must uniquely determine the next routine to call
- May need to transform the grammar to make grammar LL
- Remove ambiguity
- Remove left recursion
- Left factor
- RD frequently uses grammar in EBNF form
- Example: E → T { + T } vs E → T + E | T
- Example: S → ifHead [ ifRest ]
- How do we express these examples in in LL BNF?
Table Driven Top Down Parsing
- Table construction:
- Row for each variable
- Column for each terminal
- Look at current input symbol and current variable
- Table entry shows (RHS of) production to expand
- Parsing algorithm uses a stack of vars and terminals
- Algorithm - Repeat until:
- Push start symbol onto stack
- Repeat until stack is empty
- If TOS is a terminal, then pop and match with input
- If TOS is a variable, then:
- pop variable V
- RHS = Table(V, current input symbol)
- Push RHS, in reverse
- Example: Handout of expression grammar
Top Down Parsing - Which Rule to Use
- Each variable in the parse tree derives a part of the input string
- Must choose which rule to use to expand by looking at current variable and current terminal
- Current input will be the first terminal in the string derived from the current variable
- Example:
Creating LL Parsers
- Create tables using sets First(γ) and Follow(A) for each variable A
- First(γ) for each rule RHS γ
- Follow(A) for each variable A
- Tables can be used to create RD parser as well as creating RD parsers
First and Follow Sets: Definition
- The LL Parsing Table is created using the sets First and Follow
- For any variable A and any string of variables and terminals γ
- First(A): {terminals that start strings that can be derived from variable A}
- First(γ): {terminals that start strings that can be derived from string γ}
- Follow(A): {terminals that follow variable A in any string}
- Example: ...
Using First and Follow to create the LL Parse Table
Table: array(Vars, Terminals) of Strings;
For each rule A → γ loop
for each term t in First(γ) loop
Table(A, t) := γ
end loop
if ε in First(γ) then
for each term t in Follow(A) loop
Table(A, t) := γ
end loop
end if
end loop
For each rule A → γ loop
for each term t in First(γ) loop
if t / = ε then
Table(A, t) := γ
else -- t = ε
for each term t in Follow(A) loop
Table(A, t) := γ
end loop
end if
end loop
end loop
Example
Before making table, we
must calculate First(gamma) for each RHS gamma and Follow(A) for each var A
Definition of First(γ)
gamma | First(gamma)
-----------|-------------
- empty | empty
- a gamma1 | a
- A gamma1 | First(A) if eps not in First(A)
| First(A) - empty + First(gamma1) if eps in First(A)
Example
Thus First(gamma) depends on First(A), calculated as follows:
Algorithm for calculating of First(A)
Algorithm for finding First(A) for all var A:
Initialize all firsts of vars,
loop
for all rules A → gamma loop:
First(A) = First(A) + First(gamma) -- Point 1
end loop
exit when no change
end loop
Comments on Algorithm for calculating of First(A)
- This is strange: the def and the alg depend on each other.
- How does this work?
- At Point 1, First(gamma) calculated as in Definition, using current version of First(A)
- As First(A) changes, so will First(gamma)
- First(A) alg will eventually stop
- Must stop cause all sets are finite
- Can be expressed as set of eqns
- answer called the fixed point of the equations
- When Algorithm exits, use final values of First(A) to calculate final First(gamma)
Follow Set: Algorithm
Calculating Follow(A) for all vars A:
1. Calculate First(A) for all A
2. Initialize Follow(A) = {} for all A
3. loop:
For each rule A -> gamma loop:
For each A1 in gamma = alpha A1 beta loop:
Follow(A1) = Follow(A1) + First(beta), if eps not in First(beta)
Follow(A1) = Follow(A1) -eps + First(beta) + Follow(A), if eps in First(beta)
end loop
end loop
exit when above loop does not change any Follow(A)
end loop
Grammar Transformation: Remove Ambiguity
Grammar Transformation: Remove Left Recursion
Grammar Transformation: Left Factoring
Creating JavaCC Parsers
- JavaCC creates a LL(k) parser (typically LL(1))
- Add JavaCC rules and actions
- Example - while and assignment statements:
void statement() :
{
// Add needed declarations here
}
{
<WHILE> <LPAREN> expression() <RPAREN> statement()
| variable() <ASSIGN> expression() <SEMICOLON>
}
JavaCC Example - Prefix Expressions
- Parses previx expressions
- Rule for expressions:
void expression() :
{}
{
<PLUS> expression() expression()
| <MINUS> expression() expression()
| <TIMES> expression() expression()
| <DIVIDE> expression() expression()
| <INTEGER_LITERAL>
}
Complete .jj file in Figure 4.12 (with common RE factored out)
Call with start symbol: parser.expression()
(Fig. 4.15)
Anonymous Tokens
- Tokens can be specified directly in parse rules
- Example: infix expressions, Fig. 4.15
LOOKAHEAD Directives
- Default LOOKAHEAD is 1
- Global LOOKAHEAD can be specified in the .jj Options Section
options
{
LOOKAHEAD = 2
}
Warning: This is inefficient
LOOKAHEAD can also be specified locally in choice points in rules
void s()
{}
{
LOOKAHEAD(2)
"a" "b" "c"
| "a" "d" "c"
}
Another example:
void s()
{}
{
"a"
(
LOOKAHEAD(2)
( "b" "c" | "b" "d" )
)
}
Ambiguity and JavaCC
- No ambiguous grammar is LL(k), for any k
- When given ambiguous rules, JavaCC will produce a parser that resolves the ambiguity by choosing
the first rule (removing the ambiguity)
- Solution: revise grammar
- Example: Ambiguous if
void statement() :
{}
{
<IF> expression() <THEN> statement()
| <IF> expression() <THEN> statement() <ELSE> statement()
| // Other statements
}
- Why is this ambiguous
- What happens with the JavaCC parser?
Solution - revised grammar:
void statement() :
{}
{
<IF> expression() <THEN> statement() optionalElse()
| // Other statements
}
void optionalElse() :
{}
{
<ELSE> statement()
| { } // If no ELSE, do nothing
}
What does this do with some statements?
Project
Last modified on