
### Longest Common Subsequence (LCS)

• Problem:
• Sequence: $X = [ x_1, x_2, ..., x_m ]$
• Sequence: $Y = [ y_1, y_2, ..., y_n ]$
• Find longest subsequence that is common to both
• Subsequence elements do not need to be adjacent

• Examples:
• breath / conservative = eat

• springtime / pioneer = pine

• horseback / snowflake = oak

• maelstrom / becalm = elm

• heroically / scholarly = holly

### Brute Force LCS Algorithm

• Brute force algorithm:

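For every subsequence of $X$, check whether it is also a subsequence of $Y$, and keep the longest one that is

• How many subsequences of $X$?

$2^m$, where $m$ is the length of $X$. Why? Each of the $m$ elements is either kept or dropped.

• How long to check each subsequence of $X$?

$\Theta(n)$, where $n$ is the length of $Y$. Why?

• Scan $Y$ for the first letter of the subsequence, then continue from there for the second, ...

• Total time?

$\Theta(n 2^m)$.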
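
The brute force idea above can be sketched in Python. This is only an illustration, not code from the course: the names `is_subsequence` and `brute_force_lcs` are made up here, and trying the longest subsequences first is one convenient way to organize the search.

```python
from itertools import combinations

def is_subsequence(s, y):
    """Scan y once, matching the letters of s in order (linear in len(y))."""
    it = iter(y)
    # 'ch in it' advances the iterator until ch is found (or y runs out).
    return all(ch in it for ch in s)

def brute_force_lcs(x, y):
    """Try subsequences of x from longest to shortest; return the first one found in y."""
    for k in range(len(x), -1, -1):
        # Each choice of k index positions gives one subsequence of x.
        for idx in combinations(range(len(x)), k):
            s = "".join(x[i] for i in idx)
            if is_subsequence(s, y):
                return s
    return ""  # not reached: the empty subsequence always matches

print(brute_force_lcs("springtime", "pioneer"))  # pine
```

In the worst case every one of the $2^m$ index choices is generated and scanned against $Y$, which is where the exponential cost comes from.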


### LCS - DP Table(s)

• Example table(s) for BREATHER and CONSERVATIVES:

• Stare at the table a while - what do you notice?
• Make up your mind: Is it "table" or "tables"?

This one table shows two arrays

• Inconceivable!

No, easily conceived! One array stores the optimum values, the other stores information needed to reconstruct a solution that gives that optimum value. Both are shown below in one grid.

• That's beginning to make sense - tell me more ...

The numbers are the length of the best subsequence at that point, and the arrows show which neighbor was best

• Hmmm, what do you mean "at that point" ...?

Ponder the next question ...

• What did you say was the most important thing to know about a table?

To be crystal clear on what the table elements represent

• Is that all?

To know how to find a cell's value from other cells

• Is that really all?

To know the order in which to build the table

• Okay, I'll bite; what are those answers for this example?

Cell(i,j) is the length of the LCS of two prefixes: letters 1..i of the word on the left and letters 1..j of the word on the top.

• By row, by column, either? Diagonal?

Either by row or by column.

• Okay I believe you, but how was I ever supposed to know that?

What cells are needed? West, North, Northwest. You can find those three whether you do by row or by column.

• But not by diagonal?

Well, one of the diagonals would work, but by row or by column is much simpler.

• Should I think about what it means to fill in by row?

You bet! Great idea.

• Is this too many questions?

You bet!

• $\begin{array}{cc|cccccccccccccc} & & & C & O & N & S & E & \color{red}{R} & V & \color{green}{A} & \color{blue}{T} & I & V & \color{orange}{E} & S\\ \hline & i, j & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13\\ \hline &0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ B &1 & 0 & \uparrow 0 & \uparrow 0 & \uparrow 0 & \uparrow 0 & \color{red}{\uparrow 0} & \uparrow 0 & \uparrow 0 & \uparrow 0 & \uparrow 0 & \uparrow 0 & \uparrow 0 & \uparrow 0 & \uparrow 0 \\ \color{red}{ R}&2 & 0 & \uparrow 0 & \uparrow 0 & \uparrow 0 & \uparrow 0 & \uparrow 0 & \color{red}{\nwarrow 1} & \color{red}{\leftarrow 1} & \leftarrow 1 & \leftarrow 1 & \leftarrow 1 & \leftarrow 1 & \leftarrow 1 & \leftarrow 1 \\ E &3 & 0 & \uparrow 0 & \uparrow 0 & \uparrow 0 & \uparrow 0 & \nwarrow 1 & \uparrow 1 & \color{red}{\uparrow 1} & \uparrow 1 & \uparrow 1 & \uparrow 1 & \uparrow 1 & \nwarrow 2 & \leftarrow 2 \\ \color{green}{ A}&4 & 0 & \uparrow 0 & \uparrow 0 & \uparrow 0 & \uparrow 0 & \uparrow 1 & \uparrow 1 & \uparrow 1 & \color{red}{\nwarrow 2} & \leftarrow 2 & \leftarrow 2 & \leftarrow 2 & \uparrow 2 & \uparrow 2 \\ \color{blue}{ T}&5 & 0 & \uparrow 0 & \uparrow 0 & \uparrow 0 & \uparrow 0 & \uparrow 1 & \uparrow 1 & \uparrow 1 & \uparrow 2 & \color{red}{\nwarrow 3} & \color{red}{\leftarrow 3} & \color{red}{\leftarrow 3} & \leftarrow 3 & \leftarrow 3\\ H &6 & 0 & \uparrow 0 & \uparrow 0 & \uparrow 0 & \uparrow 0 & \uparrow 1 & \uparrow 1 & \uparrow 1 & \uparrow 2 & \uparrow 3 & \uparrow 3 & \color{red}{\uparrow 3} & \uparrow 3 & \uparrow 3\\ \color{orange}{ E}&7 & 0 & \uparrow 0 & \uparrow 0 & \uparrow 0 & \uparrow 0 & \nwarrow 1 & \uparrow 1 & \uparrow 1 & \uparrow 2 & \uparrow 3 & \uparrow 3 & \uparrow 3 & \color{red}{\nwarrow 4} & \color{red}{\leftarrow 4} \\ R &8 & 0 & \uparrow 0 & \uparrow 0 & \uparrow 0 & \uparrow 0 & \uparrow 1 & \nwarrow 2 & \leftarrow 2 & \uparrow 2 & \uparrow 3 & \uparrow 3 & \uparrow 3 & \uparrow 4 & \color{red}{\uparrow 4}\\ 
\end{array}$

• Example: c(5, 3) := 0, from the North:
• c(5, 3) represents the length of the LCS of X(1..5) and Y(1..3)
• X(1..5) = BREAT
• Y(1..3) = CON
• Since X(5) = T ≠ Y(3) = N and c(4,3) = c(5,2) = 0 (a tie):
• Then c(5, 3) := c(4,3) = 0
• This is the value from the North
• In other words: LCS(BREAT, CON) := LCS(BREA, CON) = 0

• Example: c(5, 9) := 3, from the NorthWest:
• c(5, 9) represents the length of the LCS of X(1..5) and Y(1..9)
• X(1..5) = BREAT
• Y(1..9) = CONSERVAT
• Since X(5) = T = Y(9):
• Then c(5, 9) := c(4,8) + 1 = 2 + 1 = 3
• This is the value from the NorthWest
• In other words: LCS(BREAT, CONSERVAT) := LCS(BREA, CONSERVA) + 1

• Example: c(5, 12) := 3, from the West:
• c(5, 12) represents the length of the LCS of X(1..5) and Y(1..12)
• X(1..5) = BREAT
• Y(1..12) = CONSERVATIVE
• Since X(5) = T ≠ Y(12) = E and c(4,12) = 2 and c(5,11) = 3:
• Then c(5, 12) := c(5,11) = 3
• This is the value from the West
• In other words: LCS(BREAT, CONSERVATIVE) := LCS(BREAT, CONSERVATIV)

• What happens to ties?

In the table, go North

• Could they be handled differently?

Sure, Go West, young CS Student

• Try it!
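
One way to try it is the Python sketch below. It is an illustration under assumptions, not the course's code: the function name `lcs` and its `tie` parameter are invented here, and the direction is chosen while walking back through the length table rather than being stored in a second table.

```python
def lcs(x, y, tie="N"):
    """Return one LCS of x and y; tie="N" goes North on ties, tie="W" goes West."""
    m, n = len(x), len(y)
    # c[i][j] = length of the LCS of x[:i] and y[:j]; row 0 and column 0 stay 0.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])

    # Walk back from the bottom-right corner, collecting matched letters.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])      # NorthWest: this letter is in the LCS
            i, j = i - 1, j - 1
        else:
            north, west = c[i - 1][j], c[i][j - 1]
            if north > west or (north == west and tie == "N"):
                i -= 1                # North
            else:
                j -= 1                # West
    return "".join(reversed(out))

print(lcs("BREATHER", "CONSERVATIVES", tie="N"))  # RATE
print(lcs("BREATHER", "CONSERVATIVES", tie="W"))  # EATE
```

Both answers have length 4; the tie rule changes which longest common subsequence is found, not how long it is.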

### LCS - DP Algorithm

• This solution fills two tables:
• c(i, j) = length of longest common subsequence of X(1..i) and Y(1..j)
• b(i, j) = direction (either N, W, or NW) from which value of c(i,j) was obtained

• Length of LCS for X(1..m) and Y(1..n) is in c(m, n)

• LCS-Length(X, Y)

    m, n := X.length, Y.length
    b(1..m, 1..n)
    c(0..m, 0..n) := (others => (others => 0))

    for i in 1 .. m loop
        for j in 1 .. n loop
            if xi = yj then
                c(i, j) := c(i-1, j-1) + 1
                b(i, j) := "NW"
            else
                if c(i-1, j) ≥ c(i, j-1) then
                    c(i, j) := c(i-1, j)
                    b(i, j) := "N"
                else
                    c(i, j) := c(i, j-1)
                    b(i, j) := "W"
                end if
            end if
        end loop
    end loop

• What's the time complexity?

• You're right! I do! It's ...

It's $\Theta(mn)$! The two nested loops do constant work for each of the $mn$ cells.
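
A direct Python rendering of LCS-Length might look like the sketch below. The name `lcs_length` and the use of nested lists are choices made for this illustration; Python strings are 0-indexed, so $x_i$ becomes `x[i - 1]`, and row 0 and column 0 of `b` are simply left as empty strings.

```python
def lcs_length(x, y):
    """Fill c (LCS lengths) and b (directions "N", "W", "NW") for strings x and y."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "NW"
            elif c[i - 1][j] >= c[i][j - 1]:   # ties go North
                c[i][j] = c[i - 1][j]
                b[i][j] = "N"
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = "W"
    return c, b

c, b = lcs_length("BREATHER", "CONSERVATIVES")
print(c[5][3], b[5][3])    # 0 N
print(c[5][9], b[5][9])    # 3 NW
print(c[5][12], b[5][12])  # 3 W
print(c[8][13])            # 4
```

The three printed cells are the worked examples c(5, 3), c(5, 9) and c(5, 12) for BREATHER and CONSERVATIVES.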

### LCS - Printing DP Solution

• Print-LCS(b, X, i, j)

    if i > 0 and j > 0 then
        if b(i, j) = "NW" then
            print-LCS(b, X, i-1, j-1)
            print xi
        elsif b(i, j) = "N" then
            print-LCS(b, X, i-1, j)
        else
            print-LCS(b, X, i, j-1)
        end if
    end if

• Initial call is Print-LCS(b, X, m, n)

• Notice that recursive calls are made until the base case is reached, and then values are printed after returning from the recursion

• b(i, j) points to the table entry whose subproblem was used in solving LCS of Xi and Yj

• When b(i,j) = "NW", we have extended the LCS by one character, so the LCS is made up of the characters at the "NW" entries on the path traced back from b(m, n)

• What's the time complexity?

• You're right! I do! It's ...

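It's not the same: printing takes only $O(m + n)$, since each recursive call decreases $i$, $j$, or both. Building the tables first costs $\Theta(mn)$.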
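
Print-LCS can be sketched in Python as follows. Two things are assumptions of this illustration: the letters are appended to a list `out` instead of being printed one by one (so the result is easy to check), and a small helper `build_b`, a name invented here, rebuilds the direction table so the example runs on its own.

```python
def build_b(x, y):
    """Helper for this example: compute the direction table b (ties go North)."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j], b[i][j] = c[i - 1][j - 1] + 1, "NW"
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j], b[i][j] = c[i - 1][j], "N"
            else:
                c[i][j], b[i][j] = c[i][j - 1], "W"
    return b

def print_lcs(b, x, i, j, out):
    """Follow the arrows back from (i, j); emit a letter after returning from each NW step."""
    if i > 0 and j > 0:
        if b[i][j] == "NW":
            print_lcs(b, x, i - 1, j - 1, out)
            out.append(x[i - 1])   # x_i in the 1-indexed notation of the slides
        elif b[i][j] == "N":
            print_lcs(b, x, i - 1, j, out)
        else:
            print_lcs(b, x, i, j - 1, out)

x, y = "BREATHER", "CONSERVATIVES"
out = []
print_lcs(build_b(x, y), x, len(x), len(y), out)
print("".join(out))  # RATE
```

Because each letter is emitted only after the recursive call returns, the letters come out in left-to-right order even though the table is walked from the bottom-right corner.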

### LCS - Recursive Solution

• Notation:
• $X_i = [ x_1, x_2, \dots, x_i ]$
• $Y_j = [ y_1, y_2, \dots, y_j ]$

• Need to find a subproblem whose solution can be used to find solution to given problem

• Define: $c(i,j) = \textrm{length of LCS of } X_i \text{ and } Y_j$

• Recursive definition: $c(i,j) = \begin{cases} 0, & \text{ if } i=0 \text{ or } j=0 \\ c(i-1, j-1) + 1, & \text{ if } i, j > 0 \text{ and } x_i = y_j \\ \max(c(i-1, j), c(i, j-1)), & \text{ if } i, j > 0 \text{ and } x_i \neq y_j \end{cases}$

• We must find $c(m, n)$
• Could we write a routine to find it?

• What's the time complexity?

Do you know the answer ...?

• Hmmm ... ?!?

Close enough - whatever it is, it's slow! (Exponential in the worst case, because the same subproblems are solved over and over.)

• What does a recursion tree look like for m=4, n=3?
• Notice the repeated subproblems
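
To see the repeated subproblems in numbers, the recursive definition can be coded directly and its calls counted. The Python sketch below is an illustration only: the names `lcs_naive` and `lcs_memo` and the one-element `counter` list are assumptions made here, and memoization via `functools.lru_cache` is shown as one possible remedy.

```python
from functools import lru_cache

def lcs_naive(x, y, i, j, counter):
    """Direct transcription of the recursive definition of c(i, j); counts every call."""
    counter[0] += 1
    if i == 0 or j == 0:
        return 0
    if x[i - 1] == y[j - 1]:
        return lcs_naive(x, y, i - 1, j - 1, counter) + 1
    return max(lcs_naive(x, y, i - 1, j, counter),
               lcs_naive(x, y, i, j - 1, counter))

def lcs_memo(x, y):
    """Same recursion, but each (i, j) is solved only once."""
    @lru_cache(maxsize=None)
    def c(i, j):
        if i == 0 or j == 0:
            return 0
        if x[i - 1] == y[j - 1]:
            return c(i - 1, j - 1) + 1
        return max(c(i - 1, j), c(i, j - 1))
    return c(len(x), len(y))

# m = 4, n = 3 with no letters in common: the worst case for the naive version.
calls = [0]
print(lcs_naive("abcd", "xyz", 4, 3, calls), calls[0])  # 0 69
print(lcs_memo("abcd", "xyz"))                          # 0
```

There are only $(4+1)(3+1) = 20$ distinct pairs $(i, j)$, yet the naive version makes 69 calls; the gap grows exponentially with longer inputs.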
