ITEC 420 - Section 2.3 - CFL Pumping Lemma

PDA Pumping Lemma: Bottom Line

• If L is Regular and Infinite, then there exists pumping length p > 0 such that if string w in L ls length p or longer, then p has a nonempty substring y within the first p symbols such that w = xyz and xy^iz is in L.
• Informally we say that an infinite regular language can be pumped.
• Remember: Some NON-regular languages can be pumped.

• If L is Context Free and Infinite, then there exists pumping length p > 0 such that if string w in L ls length p or longer, then p has nonempty substring vxy shorter than p symbols (with vy nonempty) such that w = uvxyz and uvixyiz is in L.
• Informally we say that an infinite CF language can be pumped.

PDA Pumping Lemma: Background

• Remember the Pumping Lemma for Regular Languages:
• if L regular ⇒ then there is a pumping length p such that for every wL if |w| ≥ p ⇒ then
• w can be divided into 3 parts: w = xyz,
1. for each i ≥ 0, xyizL

2. |y| > 0

3. |xy| ≤ p

• Why does it work: if p = number of states in M and |w| ≥ p, then the computation must have repeated states within the first p + 1 steps of the computation

CFL Pumping Lemma

• if L is a CFL then there is a pumping length p such that for every sL, if |s| ≥ p then

• s can be divided into 5 parts: s = uvxyz,

1. for each i ≥ 0, uvixyizL

2. |vy| > 0

3. |vxy| ≤ p

PDA Pumping Lemma - Example 1

• 2.36: anbncn, n ≥ 0
• We must show that apbpcp can't be divided into uvxyz such that uvixyiz is in anbncn

• There are 3 possibilities for choosing v and y, all of which fail:
1. v has a mixture of symbols: pumped up strings will have symbols out of order. Similarly, y cannot be a mixture of symbols

2. v is all one symbol and y is all the same symbol: pumped strings will have too many or too few of that symbol, since the number of other symbols will remain the same

3. v is all one symbol and y is all a different symbol: pumped strings may have the same number of the symbols in v and y, but they will have more or less of those two symbols than of the other symbol since the number of other symbol will remain the same

• There are no other ways to choose v and y, and so the string can't be pumped and so the language is not context free

PDA Pumping Lemma - Example 2

• 2.37: aibjck, 0 ≤ i ≤ j ≤ k

• There are 3 possibilities for choosing v and y, all of which fail:
1. v has a mixture of symbols: pumped up strings will have symbols out of order. Similarly, y cannot be a mixture of symbols

2. v is all one symbol and y is all the same symbol:
• if the symbol is a or b, pump up and the number of a's or b's is more than the number of c's
• if the symbol is c pump down and the number of b's is more than the number of c's

3. v is all one symbol and y is all a different symbol: consider the other symbol (ie one not in v or y):
1. If a does not appear in v or y, pump down and the number of b's and c's decreases while the number of a's remains the same, and so there will be more a's than b's or c's

2. If b does not appear in v or y, pump up and there will be more a's than b's or c's

3. If c does not appear in v or y, pump up and there are more a's than c's

• There are no other ways to choose v and y, and so the string can't be pumped and so the language is not context free

PDA Pumping Lemma - Example 3

• 2.38: {ww | w in Σ*}

• String: 0p10p1 does not work

• See book

CFG Pumping Lemma - Why Does it Work?

• Apply this to CF languages:
• Must choose w long enough so that its parse tree contains a repeated internal node

• First, for a tree T of height h, consider how long a string T can derive:
• Assume each node of T has ≤ b leaves (ie the RHS of each production has ≤ b symbols).

• Height(T) ≤ h ⇒ T derives strings with length ≤ bh (ie T has ≤ bh leaves)

• Now consider the height of a tree, call it T, that is needed to generate a string, w, that is longer than a certain length:
• if |w| ≥ bh + 1
• then height(T) ≥ h+1
• Figure 2.35, p. 124

CFG Pumping Lemma - Why it Works (part 2)

• Given the following:
• L is a CFL
• wL
• T is a parse tree for w
• |V| is the number of variables in a grammar for L
• b is the length of the longest RHS in the grammar

• If |w| ≥ b|V|+1,

• then height(T) ≥ |V| + 1.

• If height(T) ≥ |V| + 1, then T has a repetition,

• If T has a repetition then w can be divided into pieces w = uvxyz for which the pumping conditions hold.

• Pumping Conditions:
• uvixyizL, for i ≥ 0
• |vy| > 0
• |vxy| ≤ p

Some Final Points

• Nondeterministic PDA are MORE powerful than deterministic ones!
• Why: may not know whether to push or pop until have seen more input symbols
• Example: wwR
• Constrast with: wcwR

• How to implement NPDA: backtracking = stack choices not taken when they occur and go to stack when a choice fails

• Several algorithms exist that parse any CFG in O(n3) time
• CYK (Cocke, Younger, Kasami) algorithm:
• bottom up
• requires grammar in CNF
• Earley's algorithm
• top down
• Requires elimination of epsilon productions and unit rules
• Parses unambiguous grammars in O(n2) time
• Parses LR(1) grammars in O(n) time

• Slightly faster algorithms O(n lg 7) can parse all CFL

• LR(k) grammars are the largest class that can be recognized without backtracking (ie recognized by deterministic PDA)