Compilers (Continued)
Overview
- Compiler Actions
- Using parsers to generate code or trees or ...
- Making recursive descent work
Parser Actions
- A parser can do more than recognize correctness
- Examples:
- Print substitutions in a derivation/parse tree
- Build concrete parse tree
- Calculator (for expression grammar)
- Grammar for first 2 examples:
<sentence> --> <nounPhrase> <verb> <nounPhrase>
<nounPhrase> --> <article> <noun>
<noun> --> cat | dog
<article> --> a | the
<verb> --> saw | chased
Parser that Prints Substitutions in a Derivation
- This program prints the substitutions in a derivation
procedure sentence is begin -- returns after seeing a sentence
put("Sentence -> NounPhrase Verb NounPhrase");
nounPhrase;
verb;
nounPhrase;
end;
procedure nounPhrase is begin -- returns after seeing a noun phrase
put("NounPhrase -> Article Noun");
article;
noun;
end;
procedure article is begin -- returns after seeing an article
if currentToken = "a" then
put("Article -> a");
advanceToken;
elsif currentToken = "the" then
put("Article -> the");
advanceToken;
else
error;
end if;
end;
procedure noun is begin -- returns after seeing a noun
if currentToken = "cat" then
put("Noun -> cat");
advanceToken;
elsif currentToken = "dog" then
put("Noun -> dog");
advanceToken;
else
error;
end if;
end;
procedure verb is begin -- returns after seeing a verb
...
- Try the derivation for: the cat saw the dog
- In what order are the procedures called? (Think about a tree and a derivation)
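A minimal Python sketch of the same idea, for checking the exercise by hand. The class and function names are illustrative, the "$" end marker is an assumption, and because the verb procedure is elided above, the verb handling here is only inferred from the <verb> rule of the grammar.

```python
class DerivationParser:
    """Recursive descent parser that records each substitution it makes."""

    def __init__(self, text):
        self.tokens = text.split() + ["$"]  # "$" marks end of input (assumption)
        self.pos = 0
        self.steps = []

    def current(self):
        return self.tokens[self.pos]

    def terminal(self, name, choices):
        # Shared body for article, noun and verb: match one word or fail.
        if self.current() not in choices:
            raise SyntaxError(f"expected {name}, found {self.current()!r}")
        self.steps.append(f"{name} -> {self.current()}")
        self.pos += 1

    def sentence(self):
        self.steps.append("Sentence -> NounPhrase Verb NounPhrase")
        self.noun_phrase()
        self.terminal("Verb", {"saw", "chased"})
        self.noun_phrase()

    def noun_phrase(self):
        self.steps.append("NounPhrase -> Article Noun")
        self.terminal("Article", {"a", "the"})
        self.terminal("Noun", {"cat", "dog"})


def derive(text):
    parser = DerivationParser(text)
    parser.sentence()
    if parser.current() != "$":
        raise SyntaxError("extra input after sentence")
    return parser.steps


for step in derive("the cat saw the dog"):
    print(step)
```

The substitutions come out in the order of a leftmost derivation, which is also a preorder walk of the parse tree: each non-terminal is announced before any of its children.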
Parser that Generates a Concrete Parse Tree
- Each procedure is replaced with a function that returns the tree
it found (i.e., the tree generated by the non-terminal
that the function represents).
- Assume that the Node constructor will create a Node with any number of children
- Example: generate the parse tree for
the cat saw a dog
- In what order are the functions called?
function sentence return Node is begin
left: Node := nounPhrase;
center: Node := verb;
right: Node := nounPhrase;
return new Node("sentence", left, center, right);
-- Assume we can make nodes with 3 children
end;
function nounPhrase return Node is begin
left: Node := article;
right: Node := noun;
return new Node("nounPhrase", left, right);
end;
function article return Node is begin
if currentToken = "a" or currentToken = "the" then
n:Node := new Node(currentToken);
advanceToken;
return n;
else
error;
end if;
end;
function noun return Node is begin
if currentToken = "cat" or currentToken = "dog" then
n:Node := new Node(currentToken);
advanceToken;
return n;
else
error;
end if;
end;
function verb return Node is begin
...
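A runnable Python sketch of the tree-building parser, under a few assumptions: the Node class, its show method, and the "$" end marker are illustrative, and since the verb function is elided above, the verb is handled the same way as article and noun (a leaf holding the matched word).

```python
class Node:
    """Parse tree node with a label and any number of children."""

    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)

    def show(self):
        # Leaves print as their label; inner nodes as (label child child ...).
        if not self.children:
            return self.label
        parts = [self.label] + [child.show() for child in self.children]
        return "(" + " ".join(parts) + ")"


def parse_sentence(text):
    tokens = text.split() + ["$"]  # "$" marks end of input (assumption)
    pos = 0

    def word(name, choices):
        # Match one word from `choices` and return it as a leaf node.
        nonlocal pos
        if tokens[pos] not in choices:
            raise SyntaxError(f"expected {name}, found {tokens[pos]!r}")
        leaf = Node(tokens[pos])
        pos += 1
        return leaf

    def noun_phrase():
        left = word("article", {"a", "the"})
        right = word("noun", {"cat", "dog"})
        return Node("nounPhrase", left, right)

    left = noun_phrase()
    center = word("verb", {"saw", "chased"})
    right = noun_phrase()
    if tokens[pos] != "$":
        raise SyntaxError("extra input after sentence")
    return Node("sentence", left, center, right)


print(parse_sentence("the cat saw a dog").show())
# prints: (sentence (nounPhrase the cat) saw (nounPhrase a dog))
```

The functions are called top-down (sentence first), but each Node is constructed only after its children have been returned, so the tree is assembled bottom-up.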
Building Abstract Trees for Expressions
E -> T { (+|-) T }
T -> F { (*|/) F }
F -> a | b
Recursive Descent Parser:
TreeNode E() is begin -- Generates a parse tree
TreeNode L = T();
while currentToken() is '+' or '-' loop
op = currentToken(); -- remember the operator before advancing
advanceToken();
TreeNode R = T();
L = new TreeNode(L, op, R); -- op is + or -
end loop;
return L;
end E;
TreeNode T() is begin
TreeNode L = F();
while currentToken() is '*' or '/' loop
op = currentToken(); -- remember the operator before advancing
advanceToken();
TreeNode R = F();
L = new TreeNode(L, op, R); -- op is * or /
end loop;
return L;
end T;
TreeNode F() is begin
if currentToken() is 'a' or 'b' then
TreeNode leaf = new TreeNode(nil, currentToken(), nil);
advanceToken();
return leaf;
else
print("error"); exit;
end if;
end F;
Try a * b + a
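A Python sketch of this parser for trying the exercise. Trees are represented as nested tuples (operator, left, right) with plain strings as leaves, which is an assumption made for brevity rather than the TreeNode type used above; the "$" end marker is also an assumption.

```python
def parse_expression(text):
    """Build an abstract syntax tree for E -> T {(+|-) T}, T -> F {(*|/) F}, F -> a | b."""
    tokens = [ch for ch in text if not ch.isspace()] + ["$"]
    pos = 0

    def factor():
        nonlocal pos
        tok = tokens[pos]
        if tok not in ("a", "b"):
            raise SyntaxError(f"expected a or b, found {tok!r}")
        pos += 1
        return tok  # a leaf is just the operand name

    def term():
        nonlocal pos
        left = factor()
        while tokens[pos] in ("*", "/"):
            op = tokens[pos]  # remember the operator before advancing
            pos += 1
            right = factor()
            left = (op, left, right)
        return left

    def expr():
        nonlocal pos
        left = term()
        while tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            right = term()
            left = (op, left, right)
        return left

    tree = expr()
    if tokens[pos] != "$":
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return tree


print(parse_expression("a * b + a"))
# prints: ('+', ('*', 'a', 'b'), 'a')
```

Because each loop folds the new operand into the tree built so far, operators at the same level come out left associative: a - b - a becomes ('-', ('-', 'a', 'b'), 'a').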
Expression Grammar Parser That Outputs Stack Machine Code
-- Parser that generates stack machine code
procedure E() is begin
T();
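while currentToken() is '+' or '-' loop
op = currentToken(); -- remember the operator before advancing
advanceToken();
T();
generate("ADD"/"SUB"); -- "ADD" if op is '+', "SUB" if op is '-'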
end loop;
end E;
procedure T() is begin
F();
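while currentToken() is '*' or '/' loop
op = currentToken(); -- remember the operator before advancing
advanceToken();
F();
generate("MUL"/"DIV"); -- "MUL" if op is '*', "DIV" if op is '/'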
end loop;
end T;
procedure F() is begin
if currentToken() is 'a' or 'b' then
generate("PUSH a/b"); -- push whichever operand currentToken() names
advanceToken();
else
print("error");
exit;
end if;
end F;
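The code generator above can be sketched in Python as follows. The function name generate_code, the list used to collect instructions, and the "$" end marker are assumptions; the instruction names follow the pseudocode.

```python
def generate_code(text):
    """Emit stack machine code for E -> T {(+|-) T}, T -> F {(*|/) F}, F -> a | b."""
    tokens = [ch for ch in text if not ch.isspace()] + ["$"]
    pos = 0
    code = []

    def factor():
        nonlocal pos
        tok = tokens[pos]
        if tok not in ("a", "b"):
            raise SyntaxError(f"expected a or b, found {tok!r}")
        code.append(f"PUSH {tok}")
        pos += 1

    def term():
        nonlocal pos
        factor()
        while tokens[pos] in ("*", "/"):
            op = tokens[pos]  # remember the operator before advancing
            pos += 1
            factor()
            code.append("MUL" if op == "*" else "DIV")

    def expr():
        nonlocal pos
        term()
        while tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            term()
            code.append("ADD" if op == "+" else "SUB")

    expr()
    if tokens[pos] != "$":
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return code


print(generate_code("a * b + a"))
# prints: ['PUSH a', 'PUSH b', 'MUL', 'PUSH a', 'ADD']
```

Each operator is emitted only after both of its operands have been processed, so the output is the postfix form of the expression.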
Stack Machine Example
- Consider this statement: a * b + c
- Equivalent stack machine code:
push a
push b
times
push c
plus
- times pops the top two operands, multiplies them, and pushes the result
- plus is similar, but adds
- Assumes the existence of a system stack
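To see the stack discipline at work, here is a small Python interpreter for the three instructions used in this example. The function name run, the dictionary of variable values, and the sample values are assumptions for illustration; a Python list stands in for the system stack.

```python
def run(program, variables):
    """Execute push/times/plus instructions and return the final result."""
    stack = []
    for line in program:
        parts = line.split()
        if parts[0] == "push":
            stack.append(variables[parts[1]])
        elif parts[0] in ("times", "plus"):
            right = stack.pop()  # top of stack is the right operand
            left = stack.pop()
            stack.append(left * right if parts[0] == "times" else left + right)
        else:
            raise ValueError(f"unknown instruction {line!r}")
    return stack.pop()


program = ["push a", "push b", "times", "push c", "plus"]
print(run(program, {"a": 2, "b": 3, "c": 4}))
# prints: 10
```

With a = 2, b = 3, c = 4 the stack goes [2], [2, 3], [6], [6, 4], [10], which matches a * b + c = 10.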
Can We Always Write a Recursive Descent Parser?
- A recursive descent parser can only be written
if the grammar's productions allow the current token to uniquely
specify the next routine to call.
- Grammars that meet this condition are called LL(1); LL(k) grammars allow k tokens of lookahead (We'll see what this means later)
- One way to get this is to have every alternative of a
non-terminal begin with a different token.
- General solution: Compute the FIRST symbols of each grammar rule's choices
FIRST Sets
- Creating FIRST Sets
- FIRST(s) is the set of tokens that can begin a string derived from s
- Given A → s1 | s2, we can choose only if FIRST(s1) ∩ FIRST(s2) is
empty
- Example: Expressions involving addition of binary numbers (Left
recursive and ambiguous, for simplicity)
E → E+E | B | (E)
B → DB | D
D → 0 | 1
FIRST(E+E) = { (, 0, 1 }
FIRST(B) = { 0, 1 }
FIRST( (E) ) = { ( }
This fails the condition because FIRST(E+E) and FIRST(B) share 0 and 1: on seeing
a number such as 011, we don't know whether to call E again or B
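The FIRST computation and the disjointness check for this grammar can be sketched in Python. This is a simplified version that assumes no production can derive the empty string (true for this grammar); the dictionary encoding of the grammar and the function names are assumptions.

```python
GRAMMAR = {
    "E": [["E", "+", "E"], ["B"], ["(", "E", ")"]],
    "B": [["D", "B"], ["D"]],
    "D": [["0"], ["1"]],
}


def first_sets(grammar):
    """FIRST of every non-terminal, assuming no empty productions."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until no set grows any further
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                head = alt[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def first_of(alt, grammar, first):
    """FIRST of one alternative: decided by its leading symbol."""
    head = alt[0]
    return set(first[head]) if head in grammar else {head}


def conflicts(grammar):
    """List (non-terminal, i, j, overlap) for alternatives whose FIRST sets meet."""
    first = first_sets(grammar)
    found = []
    for nt, alts in grammar.items():
        for i in range(len(alts)):
            for j in range(i + 1, len(alts)):
                overlap = first_of(alts[i], grammar, first) & first_of(alts[j], grammar, first)
                if overlap:
                    found.append((nt, i, j, overlap))
    return found


for nt, i, j, overlap in conflicts(GRAMMAR):
    print(nt, GRAMMAR[nt][i], GRAMMAR[nt][j], sorted(overlap))
```

For this grammar the check reports overlaps for E (E+E against B, and E+E against (E)) and for B (DB against D), so a single token cannot select the alternative.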
The following grammar is not LL(k) for any k:
S -> A | B
A -> aAb | 0
B -> aBbb | 1
- FIRST(A) = {a, 0}
- FIRST(B) = {a, 1}
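- Why: Every time we see an a, we don't know whether to expect 1 or 2 b's
so we don't know whether to call A or B. The 0 or 1 that settles it comes
only after all the a's, and there can be more than k of them for any fixed k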
Sometimes we can transform a grammar to make it LL(1)
Revised Example: Expressions involving addition of binary numbers
E → T {+T}
T → B | (E)
B → D {D}
D → 0 | 1
- FIRST( (E) ) = { ( }
- FIRST(B) = { 0, 1 }
FOLLOW Sets
- What about a right associative operator: E → T [& E]
- What if an & can follow an E?
- Define FOLLOW(s): The set of tokens that can
follow a string s that appears on the RHS
of a production
- Condition: if s is optional (it may be skipped), FIRST(s) ∩ FOLLOW(s) = ∅
- Example (for the revised binary-number grammar):
- FOLLOW(E) = { ), $ }
- FOLLOW(T) = { ), $, + }
- Assume $ represents end of file
- Grammars that meet these conditions can have top-down
(including recursive descent) parsers
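The FOLLOW sets in the example can be reproduced with a short Python sketch. It assumes the revised binary-number grammar is first rewritten without the { } repetition notation; the helper non-terminals Etail and Btail, the empty-list encoding of an empty alternative, and all function names are assumptions made for this illustration.

```python
EPS = ""   # stands for the empty string
END = "$"  # end of file

GRAMMAR = {
    "E":     [["T", "Etail"]],
    "Etail": [["+", "T", "Etail"], []],   # {+T} as a repeating tail
    "T":     [["B"], ["(", "E", ")"]],
    "B":     [["D", "Btail"]],
    "Btail": [["D", "Btail"], []],        # {D} as a repeating tail
    "D":     [["0"], ["1"]],
}


def first_of_sequence(symbols, first):
    """FIRST of a symbol string; contains EPS if the whole string can vanish."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})  # a terminal's FIRST is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of_sequence(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(grammar, start):
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue  # only non-terminals get FOLLOW sets
                    rest = first_of_sequence(alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # everything after sym can vanish
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


follow = follow_sets(GRAMMAR, "E")
print(sorted(follow["E"]), sorted(follow["T"]))
```

For the optional tail Etail, FIRST is { + } and FOLLOW is { ), $ }; the two do not overlap, so seeing a + always means "continue the loop".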
Table Driven Implementation of RD Parser
- Rows: States
- Columns: Input
- Entry: Action and next state
- Processing loop
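One possible reading of these bullets, sketched in Python: the "state" is taken to be the non-terminal on top of an explicit parse stack, and each table entry gives the production to expand for the current input token. That reading, the table contents (built by hand for the revised binary-number grammar, with hypothetical helper non-terminals Etail and Btail standing for the { } repetitions), and the function name accepts are all assumptions.

```python
NONTERMINALS = {"E", "Etail", "T", "B", "Btail", "D"}

# (row, column) -> right-hand side to push; missing entries are errors.
TABLE = {
    ("E", "("): ["T", "Etail"], ("E", "0"): ["T", "Etail"], ("E", "1"): ["T", "Etail"],
    ("Etail", "+"): ["+", "T", "Etail"], ("Etail", ")"): [], ("Etail", "$"): [],
    ("T", "("): ["(", "E", ")"], ("T", "0"): ["B"], ("T", "1"): ["B"],
    ("B", "0"): ["D", "Btail"], ("B", "1"): ["D", "Btail"],
    ("Btail", "0"): ["D", "Btail"], ("Btail", "1"): ["D", "Btail"],
    ("Btail", "+"): [], ("Btail", ")"): [], ("Btail", "$"): [],
    ("D", "0"): ["0"], ("D", "1"): ["1"],
}


def accepts(text):
    """Return True if text is derivable from E, using only table lookups."""
    tokens = [ch for ch in text if not ch.isspace()] + ["$"]
    stack = ["$", "E"]  # end marker at the bottom, start symbol on top
    pos = 0
    while stack:  # the processing loop
        top = stack.pop()
        tok = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, tok))
            if rhs is None:
                return False  # empty table entry: syntax error
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        elif top == tok:
            pos += 1  # matched a terminal; consume it
        else:
            return False
    return pos == len(tokens)


print(accepts("01+(1)"), accepts("01+"))
# prints: True False
```

The loop replaces the recursive procedure calls: the explicit stack plays the role of the call stack, and the table plays the role of the if/else tests on the current token.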