Compilers
Overview
- Compiler structure and phases
- Recursive descent parser
- Using parsers to generate code or trees or ...
- When does recursive descent work
- Other predictive parsers
- Top down and bottom up parsers
- Bottom up parsers
- LL(k) and LR(k) parsers
- Knuth
- Scanners
- Dealing with white space
Parser
- A parser is a program that determines whether a string can be generated
by a particular grammar.
- A given parser is specific for a given grammar.
- Normally the parser also performs some other action (e.g., it generates a parse tree)
- For certain grammars, a recursive descent parser is particularly
easy to write
- Recursive - the parser is made up of a set of mutually recursive routines
- Descent - Recursive calls correspond to descending a parse tree
Example Grammar
- Example grammar for (very simple) English sentences:
<sentence> --> <nounPhrase> <verb> <nounPhrase>
<nounPhrase> --> <article> <noun>
<noun> --> cat | dog
<article> --> a | the
<verb> --> saw | chased
- Example: Build a parse tree for
the cat saw the dog
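One way to write down the parse tree for this sentence is as nested tuples. The sketch below is only an illustration: the tuple layout and the names `TREE` and `leaves` are assumptions, not part of the notes. Reading the leaves from left to right gives back the sentence.

```python
# Parse tree for "the cat saw the dog" as nested tuples: (label, child, ...).
# A leaf is a plain string, that is, a terminal of the grammar.
# The representation is an assumption made for this illustration.
TREE = (
    "sentence",
    ("nounPhrase", ("article", "the"), ("noun", "cat")),
    ("verb", "saw"),
    ("nounPhrase", ("article", "the"), ("noun", "dog")),
)


def leaves(node):
    """Return the terminals at the leaves of the tree, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:  # node[0] is the nonterminal label
        words.extend(leaves(child))
    return words


if __name__ == "__main__":
    print(" ".join(leaves(TREE)))
```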
Recursive Descent Parsers
- An RD parser determines if the input can be generated by the grammar.
- Each nonterminal has a corresponding procedure
- Calling a procedure corresponds to substituting for (expanding) the corresponding nonterminal
- Parser routines:
  - currentToken: returns the current token found by the scanner
  - advanceToken: advances the scanner to the next token in the input
Recursive Descent Parser for Example Grammar
- This parser checks if a sentence is in the example grammar
procedure test is begin
   initialize currentToken;
   sentence;
   if currentToken = EOF then
      put("The input sentence is in the language");
   elsif currentToken = ERROR then
      put("The input sentence is not in the language");
   else
      put("Something unexpected happened");
   end if;
end;
procedure sentence is begin -- returns after seeing a sentence
   nounPhrase;
   verb;
   nounPhrase;
end;
procedure nounPhrase is begin -- returns after seeing a noun phrase
   article;
   noun;
end;
procedure article is begin -- returns after seeing an article
   if currentToken = "a" then
      advanceToken;
   elsif currentToken = "the" then
      advanceToken;
   else
      currentToken := ERROR; -- no article found
   end if;
end;
procedure noun is begin -- returns after seeing a noun
   ...
procedure verb is begin -- returns after seeing a verb
   ...
- Try:
the cat saw the dog
- Try:
the cat on the dog
Parse Trees and Derivations
- The sequence of procedure calls reflects
  - the structure of the parse tree. Each call is a substitution.
  - the substitutions of a derivation. (Leftmost or rightmost?)
- Example:
the cat saw the dog
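The link between calls and derivation steps can be made concrete with a small sketch. It does not use recursive procedures; instead it replays the parse by always expanding the leftmost nonterminal, which is the order in which the recursive descent procedures are entered. The names `GRAMMAR` and `leftmost_derivation` are assumptions for this illustration, and alternatives are chosen by comparing their first symbol with the current token, which happens to be enough for this grammar.

```python
# Hypothetical sketch: list the sentential forms of the derivation that
# corresponds to the order of the recursive descent procedure calls.
GRAMMAR = {
    "sentence": [["nounPhrase", "verb", "nounPhrase"]],
    "nounPhrase": [["article", "noun"]],
    "noun": [["cat"], ["dog"]],
    "article": [["a"], ["the"]],
    "verb": [["saw"], ["chased"]],
}


def leftmost_derivation(text):
    tokens = text.split()
    form = ["sentence"]  # current sentential form
    steps = [" ".join(form)]
    pos = 0  # index of the next unread token
    i = 0  # index of the leftmost symbol not yet matched
    while i < len(form):
        symbol = form[i]
        lookahead = tokens[pos] if pos < len(tokens) else None
        if symbol not in GRAMMAR:  # terminal: must match the input
            if symbol != lookahead:
                raise SyntaxError(f"expected {symbol!r}, found {lookahead!r}")
            pos += 1
            i += 1
            continue
        alternatives = GRAMMAR[symbol]
        if len(alternatives) == 1:
            rhs = alternatives[0]
        else:
            # Pick the alternative that starts with the current token.
            matching = [alt for alt in alternatives if alt[0] == lookahead]
            if not matching:
                raise SyntaxError(f"no rule for {symbol} starts with {lookahead!r}")
            rhs = matching[0]
        form[i:i + 1] = rhs  # substitute the leftmost nonterminal
        steps.append(" ".join(form))
    if pos != len(tokens):
        raise SyntaxError("unexpected input after the sentence")
    return steps


if __name__ == "__main__":
    for step in leftmost_derivation("the cat saw the dog"):
        print(step)
```

Each printed line differs from the previous one by exactly one substitution, and the nonterminal replaced is always the leftmost one.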
Recursive Descent and the Expression Grammar
- Now let's try Recursive Descent on the Expression Grammar
E -> E + T
E -> T
T -> T * F
T -> F
F -> a
We have two problems:
- Left recursion causes an infinite recursive loop: E calls E again before any input is consumed
- There is no way to determine whether to use production 1 or production 2 until you have
seen the +, at which point it is too late. Why?
  - To recognize the a with a call to advanceToken, you must first call T, and then F.
  However, before calling T, you must know whether to call E again
  (production 1) or to call T directly (production 2).
  - Consider two strings: a and a + a
    - Should you call E, T, F, recognize a?
    - Or should you call E, E, T, F, recognize a, recognize +, T, F, recognize a?
  - The first choice from E must be to call either E (again) or T.
  But you can't tell which to call until you have seen what follows the first a
  (a + or the end of the input), which you won't see until you have made the choice.
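The first problem is easy to reproduce. The sketch below transcribes production 1 (E -> E + T) literally into Python; the function names are assumptions for this illustration, and T and F are collapsed into one routine for brevity. Because `expr` calls itself before reading any token, it never makes progress, and Python eventually stops it with a `RecursionError`.

```python
# Hypothetical sketch of why left recursion breaks recursive descent.
def expr(tokens, pos):
    """Literal transcription of E -> E + T: an E, then '+', then a T."""
    pos = expr(tokens, pos)  # E: recurses before reading any token
    if tokens[pos] != "+":  # never reached
        raise SyntaxError("expected '+'")
    return term(tokens, pos + 1)  # never reached


def term(tokens, pos):
    """T -> F and F -> a, collapsed into one routine for brevity."""
    if tokens[pos] != "a":
        raise SyntaxError("expected 'a'")
    return pos + 1


def try_parse(text):
    try:
        expr(text.split(), 0)
    except RecursionError:
        return "infinite recursion"
    return "parsed"


if __name__ == "__main__":
    print(try_parse("a + a"))
```

The outcome does not depend on the input: even the single token a triggers the same runaway recursion.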
Expression Grammar Partial Solution
- Partial solution to both problems:
use a right recursive grammar that accepts the same language
E -> T + E
E -> T
T -> F * T
T -> F
F -> a
- In this grammar, E always starts by calling T
- Code looks like this:
procedure expr is begin
   term;
   if currentToken = '+' then
      advanceToken;
      expr;
   end if;
end;
procedure term is begin
   factor;
   if currentToken = '*' then
      advanceToken;
      term;
   end if;
end;
procedure factor is begin
   if currentToken = 'a' then
      advanceToken;
   else
      error;
   end if;
end;
- Try:
a + a + a
- Try:
a * a + a * a * a + a
- A new problem: the right recursive grammar makes the operators right associative
- Hint for solving: Note that in production 1, after the text generated by the T
and the + has been recognized, the rest of the text is recognized by the E.
However, the first thing the E does is generate the text from another T.
- Hint: In other words, in a parse tree, when we are done expanding E's, the
tree has all T's (separated by +'s) as leaves.
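The right associativity becomes visible if the right recursive parser returns a tree instead of just recognizing the input. The following is a sketch only: the nested-tuple tree format and the name `parse_right_recursive` are assumptions, and the scanner simply treats every non-blank character as a token.

```python
# Hypothetical sketch: the right recursive grammar, with each routine
# returning a tree of the form (operator, left, right).
def parse_right_recursive(text):
    tokens = [ch for ch in text if not ch.isspace()] + ["<EOF>"]
    pos = 0

    def current():
        return tokens[pos]

    def advance():
        nonlocal pos
        pos += 1

    def expr():  # E -> T + E | T
        left = term()
        if current() == "+":
            advance()
            return ("+", left, expr())  # the rest of the input nests on the right
        return left

    def term():  # T -> F * T | F
        left = factor()
        if current() == "*":
            advance()
            return ("*", left, term())
        return left

    def factor():  # F -> a
        if current() != "a":
            raise SyntaxError(f"expected 'a', found {current()!r}")
        advance()
        return "a"

    tree = expr()
    if current() != "<EOF>":
        raise SyntaxError(f"unexpected {current()!r}")
    return tree


if __name__ == "__main__":
    print(parse_right_recursive("a + a + a"))
```

For a + a + a the second and third operands are grouped first, that is, the input is treated as a + (a + a).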
Revised Expression Grammar Solution
- Solution: use EBNF to generate the T's directly
E -> T { + T }
T -> F { * F }
Code looks like this:
procedure expr is begin
   term;
   while currentToken = '+' loop
      advanceToken;
      term;
      -- perform actions for left associative operations
   end loop;
end;
procedure term is begin
   factor;
   while currentToken = '*' loop
      advanceToken;
      factor;
      -- perform actions for left associative operations
   end loop;
end;
The EBNF grammar does not specify associativity, but the associativity can be added directly in the parser code.
Try a*a*a + a*a + a
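A sketch of the loop-based parser in Python shows where the "actions for left associative operations" go: inside the loop, the tree built so far becomes the left operand of the next operator. The nested-tuple tree format and the name `parse_ebnf` are assumptions made for this illustration, and every non-blank character is treated as one token.

```python
# Hypothetical sketch: E -> T { + T } and T -> F { * F } with loops,
# building a left associative tree of the form (operator, left, right).
def parse_ebnf(text):
    tokens = [ch for ch in text if not ch.isspace()] + ["<EOF>"]
    pos = 0

    def current():
        return tokens[pos]

    def advance():
        nonlocal pos
        pos += 1

    def expr():  # E -> T { + T }
        tree = term()
        while current() == "+":
            advance()
            # Left associative action: the tree so far is the left operand.
            tree = ("+", tree, term())
        return tree

    def term():  # T -> F { * F }
        tree = factor()
        while current() == "*":
            advance()
            tree = ("*", tree, factor())
        return tree

    def factor():  # F -> a
        if current() != "a":
            raise SyntaxError(f"expected 'a', found {current()!r}")
        advance()
        return "a"

    tree = expr()
    if current() != "<EOF>":
        raise SyntaxError(f"unexpected {current()!r}")
    return tree


if __name__ == "__main__":
    print(parse_ebnf("a*a*a + a*a + a"))
```

Swapping the roles in the loop body (building the tree on the right instead) would give right associativity from the same EBNF, which is why the choice has to be made in the parser code.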
Right Associative Operations - The Problem
Right Associative Operations - Solution
Right Associative Operations - If Example