ITEC 420  Section 2.3  CFL Pumping Lemma
PDA Pumping Lemma: Bottom Line
 If L is Regular and Infinite, then there exists pumping length
p > 0 such that if string w in L ls length p or longer, then
p has a nonempty substring y within the first p symbols such that
w = xyz and xy^iz is in L.
 Informally we say that an infinite regular language can be
pumped.
 Remember: Some NONregular languages can be pumped.
 If L is Context Free and Infinite, then there exists pumping length
p > 0 such that if string w in L ls length p or longer, then
p has nonempty substring vxy shorter than p symbols (with vy nonempty)
such that w = uvxyz and uv^{i}xy^{i}z is in L.
 Informally we say that an infinite CF language can be
pumped.
PDA Pumping Lemma: Background
 Remember the Pumping Lemma for Regular Languages:

if L regular ⇒ then there is a pumping length p such that for
every w ∈ L if w ≥ p ⇒ then
 w can be divided into 3 parts: w = xyz,
 for each i ≥ 0, xy^{i}z ∈ L
 y > 0
 xy ≤ p
 Why does it work: if p = number of states in M and
w ≥ p, then the computation must have repeated states
within the first p + 1 steps of the computation
CFL Pumping Lemma

if L is a CFL then there is a pumping length p such that
for every s ∈ L, if s ≥ p then
 s can be divided into 5 parts: s = uvxyz,
 for each i ≥ 0, uv^{i}xy^{i}z ∈ L
 vy > 0
 vxy ≤ p
PDA Pumping Lemma  Example 1
 2.36: a^{n}b^{n}c^{n}, n ≥ 0
 We must show that a^{p}b^{p}c^{p}
can't be divided into uvxyz such that
uv^{i}xy^{i}z is in a^{n}b^{n}c^{n}
 There are 3 possibilities for choosing v and y, all of which fail:
 v has a mixture of symbols: pumped up strings will have symbols out
of order. Similarly, y cannot be a mixture of symbols
 v is all one symbol and y is all the same symbol: pumped strings will
have too many or too few of that symbol,
since the number of other symbols will remain the same
 v is all one symbol and y is all a different symbol: pumped strings
may have the same number of the symbols in v and y, but they will have more
or less of those two symbols than of the other symbol
since the number of other symbol will remain the same
 There are no other ways to choose v and y, and so the string can't be
pumped and so the language is not context free
PDA Pumping Lemma  Example 2
 2.37: a^{i}b^{j}c^{k},
0 ≤ i ≤ j ≤ k
 There are 3 possibilities for choosing v and y, all of which fail:
 v has a mixture of symbols: pumped up strings will have symbols out
of order. Similarly, y cannot be a mixture of symbols
 v is all one symbol and y is all the same symbol:
 if the symbol is a
or b, pump up and the number of a's or b's is more than the number of c's
 if the symbol is c
pump down and the number of b's is more than the number of c's
 v is all one symbol and y is all a different symbol: consider the
other symbol (ie one not in v or y):
 If a does not appear in v or y, pump down and the number of b's
and c's decreases while the number of a's remains the same, and so there
will be more a's than b's or c's
 If b does not appear in v or y, pump up and there will be
more a's than b's or c's
 If c does not appear in v or y, pump up and there are more
a's than c's
 There are no other ways to choose v and y, and so the string can't be
pumped and so the language is not context free
PDA Pumping Lemma  Example 3
 2.38: {ww  w in Σ*}
 String: 0^{p}10^{p}1 does not work
 See book
CFG Pumping Lemma  Why Does it Work?
 Apply this to CF languages:
 Must choose w long enough so that its parse tree contains
a repeated internal node

First, for a tree T of height h, consider how long a string T
can derive:
 Assume each node of T has ≤ b leaves (ie the
RHS of each production has ≤ b symbols).
 Height(T) ≤ h ⇒ T derives strings with
length ≤ b^{h} (ie T has ≤ b^{h}
leaves)
 Now consider the height of a tree, call it T, that is
needed to generate a string, w, that is longer than
a certain length:
 if w ≥ b^{h} + 1
 then height(T) ≥ h+1
 Figure 2.35, p. 124
CFG Pumping Lemma  Why it Works (part 2)
 Given the following:
 L is a CFL
 w ∈L
 T is a parse tree for w
 V is the number of variables in a grammar
for L
 b is the length of the longest RHS in the grammar
 If w ≥ b^{V}+1,
 then height(T) ≥ V + 1.
 If height(T) ≥ V + 1, then
T has a repetition,

If T has a repetition then w can be divided
into pieces
w = uvxyz for which the pumping
conditions hold.
 Pumping Conditions:
 uv^{i}xy^{i}z ∈ L,
for i ≥ 0
 vy > 0
 vxy ≤ p
Some Final Points
 Nondeterministic PDA are MORE powerful than deterministic ones!
 Why: may not know whether to push or pop until have
seen more input symbols
 Example: ww^{R}
 Constrast with: wcw^{R}
 How to implement NPDA: backtracking = stack choices not
taken when they occur and go to stack when a choice fails
 Several algorithms exist that parse any CFG in O(n^{3}) time
 CYK (Cocke, Younger, Kasami) algorithm:
 bottom up
 requires grammar in CNF
 Earley's algorithm
 top down
 Requires elimination of epsilon productions and unit rules
 Parses unambiguous grammars in O(n^{2}) time
 Parses LR(1) grammars in O(n) time
 Slightly faster algorithms O(n ^{ lg 7}) can parse all CFL
 LR(k) grammars are the largest class that can be recognized without
backtracking (ie recognized by deterministic PDA)