ITEC 420 - Section 2.3 - CFL Pumping Lemma
PDA Pumping Lemma: Bottom Line
- If L is Regular and Infinite, then there exists pumping length
p > 0 such that if string w in L ls length p or longer, then
p has a nonempty substring y within the first p symbols such that
w = xyz and xy^iz is in L.
- Informally we say that an infinite regular language can be
pumped.
- Remember: Some NON-regular languages can be pumped.
- If L is Context Free and Infinite, then there exists pumping length
p > 0 such that if string w in L ls length p or longer, then
p has nonempty substring vxy shorter than p symbols (with vy nonempty)
such that w = uvxyz and uvixyiz is in L.
- Informally we say that an infinite CF language can be
pumped.
PDA Pumping Lemma: Background
- Remember the Pumping Lemma for Regular Languages:
-
if L regular ⇒ then there is a pumping length p such that for
every w ∈ L if |w| ≥ p ⇒ then
- w can be divided into 3 parts: w = xyz,
- for each i ≥ 0, xyiz ∈ L
- |y| > 0
- |xy| ≤ p
Why does it work: if p = number of states in M and
|w| ≥ p, then the computation must have repeated states
within the first p + 1 steps of the computation
CFL Pumping Lemma
-
if L is a CFL then there is a pumping length p such that
for every s ∈ L, if |s| ≥ p then
- s can be divided into 5 parts: s = uvxyz,
- for each i ≥ 0, uvixyiz ∈ L
- |vy| > 0
- |vxy| ≤ p
PDA Pumping Lemma - Example 1
- 2.36: anbncn, n ≥ 0
- We must show that apbpcp
can't be divided into uvxyz such that
uvixyiz is in anbncn
- There are 3 possibilities for choosing v and y, all of which fail:
- v has a mixture of symbols: pumped up strings will have symbols out
of order. Similarly, y cannot be a mixture of symbols
- v is all one symbol and y is all the same symbol: pumped strings will
have too many or too few of that symbol,
since the number of other symbols will remain the same
- v is all one symbol and y is all a different symbol: pumped strings
may have the same number of the symbols in v and y, but they will have more
or less of those two symbols than of the other symbol
since the number of other symbol will remain the same
- There are no other ways to choose v and y, and so the string can't be
pumped and so the language is not context free
PDA Pumping Lemma - Example 2
- 2.37: aibjck,
0 ≤ i ≤ j ≤ k
- There are 3 possibilities for choosing v and y, all of which fail:
- v has a mixture of symbols: pumped up strings will have symbols out
of order. Similarly, y cannot be a mixture of symbols
- v is all one symbol and y is all the same symbol:
- if the symbol is a
or b, pump up and the number of a's or b's is more than the number of c's
- if the symbol is c
pump down and the number of b's is more than the number of c's
- v is all one symbol and y is all a different symbol: consider the
other symbol (ie one not in v or y):
- If a does not appear in v or y, pump down and the number of b's
and c's decreases while the number of a's remains the same, and so there
will be more a's than b's or c's
- If b does not appear in v or y, pump up and there will be
more a's than b's or c's
- If c does not appear in v or y, pump up and there are more
a's than c's
- There are no other ways to choose v and y, and so the string can't be
pumped and so the language is not context free
PDA Pumping Lemma - Example 3
- 2.38: {ww | w in Σ*}
- String: 0p10p1 does not work
- See book
CFG Pumping Lemma - Why Does it Work?
- Apply this to CF languages:
- Must choose w long enough so that its parse tree contains
a repeated internal node
-
First, for a tree T of height h, consider how long a string T
can derive:
- Assume each node of T has ≤ b leaves (ie the
RHS of each production has ≤ b symbols).
- Height(T) ≤ h ⇒ T derives strings with
length ≤ bh (ie T has ≤ bh
leaves)
- Now consider the height of a tree, call it T, that is
needed to generate a string, w, that is longer than
a certain length:
- if |w| ≥ bh + 1
- then height(T) ≥ h+1
- Figure 2.35, p. 124
CFG Pumping Lemma - Why it Works (part 2)
- Given the following:
- L is a CFL
- w ∈L
- T is a parse tree for w
- |V| is the number of variables in a grammar
for L
- b is the length of the longest RHS in the grammar
- If |w| ≥ b|V|+1,
- then height(T) ≥ |V| + 1.
- If height(T) ≥ |V| + 1, then
T has a repetition,
-
If T has a repetition then w can be divided
into pieces
w = uvxyz for which the pumping
conditions hold.
- Pumping Conditions:
- uvixyiz ∈ L,
for i ≥ 0
- |vy| > 0
- |vxy| ≤ p
Some Final Points
- Nondeterministic PDA are MORE powerful than deterministic ones!
- Why: may not know whether to push or pop until have
seen more input symbols
- Example: wwR
- Constrast with: wcwR
- How to implement NPDA: backtracking = stack choices not
taken when they occur and go to stack when a choice fails
- Several algorithms exist that parse any CFG in O(n3) time
- CYK (Cocke, Younger, Kasami) algorithm:
- bottom up
- requires grammar in CNF
- Earley's algorithm
- top down
- Requires elimination of epsilon productions and unit rules
- Parses unambiguous grammars in O(n2) time
- Parses LR(1) grammars in O(n) time
- Slightly faster algorithms O(n lg 7) can parse all CFL
- LR(k) grammars are the largest class that can be recognized without
backtracking (ie recognized by deterministic PDA)