ITEC 420 - Section 2.2 - Pushdown Automata

Pushdown Automata (PDA) - The Big Picture

PDA = NFA + Stack!

PDA give another way to define Context Free Languages

L is a CFL iff L = L(CFG) and L = L(PDA) !

Sometimes a PDA is easier to analyze than a CFG

PDA: Operations and Computation

Figures 2.11 and 2.12: State control has pointer to input AND pointer to stack top

Each PDA transition does the following:

Consume a symbol from the input
Pop a symbol off of the stack
Push a symbol onto the stack

Any of the symbols can be ε

Consume ε means don't consume input
Pop ε means don't pop
Push ε means don't push
Usually a transition does either a push or a pop, although it can do both

Computation:

Start with empty stack
Accept if a path leaves PDA in accept state at end of input
Contents of stack at end doesn't matter

PDA: First Examples

0ⁿ1ⁿ, n ≥ 0

Some attempted solutions: 2 state, 3 state, 4 state

0^m1ⁿ, m < n

0^m1ⁿ, m ≤ n

0^m1ⁿ, n = 2m

PDA: First Example Revisited

Figure 2.15

state 1 to 2: ε, push $
state 2 to 2: 0, push 0
state 2 to 3: 1, pop 0
state 3 to 3: 1, pop 0
state 3 to 4: ε, pop $

Transitions are a pair:

(Input Symbol, old top symbol of stack→ new top symbol of stack)
Any of these symbols can be ε

What does a push look like? And a pop

Some transitions change stack without processing any input

Transitions can also process input without changing stack

See formal definition of machine in Example 2.14:

How to push 2 symbols?

Testing for Empty Stack and End of Input

We can't test for empty stack

However, pushing a special symbol and detecting it is equivalent to testing for the empty stack

We can't test for end of input;

However, when we move to accept state we can only accept if at end of input

PDA: Stack and Machine Operations

(q2, c) ∈ δ(q1, a, b)

Input symbol a
Pop b
Push c
Transition from q1 to q2: a, b → c

Stack Operations:

push
pop
replace

PDA: Formal Definition

A Pushdown Automata is a 6-tuple (Q, Σ, Γ, δ, q₀, F), where Q, Σ, Γ, and F are finite sets, and

Q is the set of states
Σ is the input alphabet
Γ is the stack alphabet
δ: Q × Σ_ε × Γ_ε → P(Q × Γ_ε) is the transition function
q₀ is the start state
F ⊆ Q is the set of accept states

Nondeterministic: ε transitions and Range is set of (state, stack symbol) pairs

Computation: A PDA accepts string w iff

start in start state with empty stack
w takes PDA through sequence of states with corresponding sequence of stack content strings
PDA is in a final state at end of input

Stack does not have to be empty at end (but it frequently is)

PDA: More Examples

Example 2.16: aⁱb^jc^k, i=j or i=k

Example 2.18: ww^R

Example: {w | w ∈ (0∪1)* and w has more 0's than 1's}

Theorem 2.20 = CFL = PDA

Theorem 2.20: A language is context free if and only if some pushdown automata recognizes it.

Lemma 2.21: If a language is context free, then some PDA recognizes it

Lemma 2.27: If a PDA recognizes some language, then it is context free.

Lemma 2.21: CFG → PDA - Example

Example 2.25: Construct a PDA for this grammar:

S → aTb | b
T → Ta | ε

Machine: Figure 2.26

Push $
Push Start Symbol
Loop state:

Pop LHS, push RHS, in reverse

Which symbol ends up on top of stack

Scan and pop an input symbol

Pop $
Nondeterministic!

Find parse trees, and derivation, and trace algorithm for

Recognize: b
Recognize: ab
Recognize: aaab

Lemma 2.21: CFG → PDA - Basic Idea

Basic Idea: Simulate steps of leftmost derivation

Input-so-far + current Stack = current step in derivation

To substitute: pop variable from stack and replace with RHS of rule

Advance input (and pop stack) when stack top = input

Non deterministic algorithm: must pick correct substitution

[Top down] Compilers use a deterministic version of this algorithm

Lemma 2.21: CFG → PDA - Application to Compilers

A compiler with a top-down parser deterministically does the required substitutions

Makes choice based on current input and current stack

Grammar must be constructed so that compiler can always figure out what substition to do at each step

Lemma 2.21: CFG → PDA - Informal Algorithm

Place marker $ and the start variable on the stack

Repeat forever:

If the top of stack is variable symbol A, then:

nondeterministically select one of the rules whose LHS is A, and
pop A and push the string that is on the RHS of that rule

If the top of the stack is terminal a, read the next input symbol from the input and compare it to a.

If they match, repeat the loop.
If they don't match, reject this branch of the nondeterminism.

If the top of the stack is $, enter the accept state. Doing so accepts the input, if it has all been read.

The substitutions in the branch of the nondeterminism that succeeds will follow the substitutions of a leftmost derivation.

The stack contents will be the strings of a derivation, minus the initial symbols that are matched.

Lemma 2.21: CFG → PDA - Shorthand Notation

Use notation (r, u) ∈ δ(q,a,s) to transition from q to r, pop s and push the entire string u at once.

Requires set E of extra states

Figure 2.23, page 117

Lemma 2.21: CFG → PDA - Formal Definition

Must create NFA P = (Q, Σ, Γ, δ, q₁, F)

Q = { q_start, q_loop, q_accept} ∪ E

Algorithm steps - Create productions as follows:

δ(q_start, ε, ε) = {(q_loop, S$)}
For all productions and terminals:

δ(q_loop, ε, A) = {(q_loop, w)} where A → w is a rule in R

δ(q_loop, a, a) = {(q_loop, ε)} for terminal a

δ(q_loop, ε, $) = {(q_accept, ε)}

CFG → PDA: Another Example

E → E + T | T

T → a | b

Consider w = a + b + a

CFG → PDA: Showing it Works

Assume machine M is constructed from grammar G

We must show that w ∈ L(M) implies w ∈ in L(G)

Steps of a Leftmost Derivation of string w are the same as [input matched so far] + [current stack] at each step of the computation of p on w

Now the other way: PDA → CFG

Now let's show that every PDA has an equivalent CFG

When showing CFG → PDA, we simulated a grammar derivation on the PDA

Now we must simulate a PDA computation with a CFG!

CFG G must generate all strings that PDA P accepts.

If string x takes P from its start state to its accept state, then G must generate x

PDA → CFG: Main Idea

We design grammar G so that

For every pair of states p and q in P, grammar G has a variable A_pq

A_pq generates all strings that take P from p with an empty stack to q with an empty stack.

It is easier to design G if P is in a standard form

A Standard Form for PDA P

We transform PDA P into a form that makes it easier to create the equivalent grammarG

Conditions for the transformed machine:

It has a single accept state

It empties stack before accepting

Every transition is either a push or a pop of a single element

Transitions that replace the top are split into two transitions: pop and a push (this requires a new state)

Transitions that do not change the stack are split into two transitions: a push and pop of an arbitrary symbol (this requires a new state)

Clearly these changes do not affect the language accepted by the machine

A_pq and the Strings it Generates

A_pq must generate all strings that take P in state p with an empty stack to state q with an empty stack.

Let x be a string that is generated by A_pq. the computation of P on x must begin by pushing some character and end by popping a character.

There are two possibilities for the pop at the last step of the computation on x:

Either the pop at the last step pops the character that was pushed in the first step
Or, the pop at the last step pops a character that is different from the character that was pushed in the first step

In the first case (ie pops the same character) the stack is only empty at the beginning and end of the computation, but not in the middle of the computation

This is simulated with the rule A_pq → aA_rsb, where input symbol a begins x and takes P from p to r, and input symbol b is the last symbol of x and it takes P from s to q.

In the second case, the stack is empty at the beginning and end of the computation as well as at least one point in the middle.

This is simulated with the rule A_pq → A_prA_rq where takes P becomes empty in state r

PDA → CFG: Building the Grammar

All rules in G are of one of 3 forms:

A_pq → aA_rsb
A_pq → A_prA_rq
A_pp → ε

Rules in the first form are created for all pairs of transitions of P that push and pop the same symbol

that is, for all sets of states and transitions of this form:

1--(a, ε→t)-> 2 -- ... -> 3 --(b, t→ε)-> 4

Create a production of this form:

V₁₄ → aV₂₃b

Rules of the second form are created for ALL states p, q, and r

Rules of the third form are created for ALL states p

The start variable is V_{start accept}

Grammar Rules and Computations

Rules of form 2 correspond to computations that have an empty stack at state r

Rules of form 1 correspond to computations that start by consuming input and putting something (t) onto the stack and end by consuming input and removing that symbol

See Figures 2.28 and 2.29

For a given computation, the corresponding derivation could

start by replacing variables with pairs of variables to get a string of variables that represents all the states in the computation at which the stack is empty

then replacing these variables by (terminal, variable, terminal) strings that represent the strings accepted as the machine pushes and pops symbols

PDA → CFG: Examples 1 and 2

aabb, with push, pop, push, pop
aabb, with push, push, pop, pop

PDA → CFG: Example 3

Machine for {a^mbⁿ | m,n odd, 0<n≤m}

Transitions and stack for aaabbb

V₁₃ ⇒ aV₂₄b
⇒ aaV₁₃bb
⇒ aaaV₂₂bbb
⇒ aaabbb

Grammar (partial):

V₁₃ → aV₂₄b
V₁₃ → aV₂₂b
V₁₃ → ...

V₁₄ → aV₂₃b
V₁₄ → ...

V₂₃ → aV₁₂b
V₂₃ → ...

V₂₄ → aV₁₃b
V₂₄ → ...

V₁₁ → V₁₂V₂₁
V₁₁ → V₁₃V₃₁
V₁₁ → ...
V₁₂ → ...
V₂₂ → ...
...
V₁₁ → ε
V₂₂ → ε
...

Proof

By induction on number of steps in derivation and computation
See Claims 2.30 and 2.31

Regular Languages are Context Free Languages

Why???