# Dynamic Programming - Optimal Binary Search Trees

### Optimal Binary Search Trees - Problem

• Problem:
• Sorted set of keys $k_1, k_2, ..., k_n$

• Key probabilities: $p_1, p_2, ..., p_n$

• What tree structure has lowest expected cost?

• Cost of searching for node $i$: $\text{cost}(k_i) = \text{depth}(k_i) + 1$

• $\displaystyle \begin{aligned} \text{Expected cost of tree } & = \sum_{i=1}^n \text{cost}(k_i)\, p_i \\ & = \sum_{i=1}^n (\text{depth}(k_i) + 1)\, p_i \\ & = \sum_{i=1}^n \text{depth}(k_i)\, p_i + \sum_{i=1}^n p_i \\ & = \left(\sum_{i=1}^n \text{depth}(k_i)\, p_i\right) + 1 \end{aligned}$
• The last step uses $\sum_{i=1}^n p_i = 1$ (every search is for one of the keys)

### Optimal BST - Example

• Example:
• Probability table ($p_i$ is the probability of key $k_i$):

| $i$ | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- |
| $k_i$ | $k_1$ | $k_2$ | $k_3$ | $k_4$ | $k_5$ |
| $p_i$ | 0.25 | 0.20 | 0.05 | 0.20 | 0.30 |

Two BSTs
• Given: $k_1 < k_2 < k_3 < k_4 < k_5$

• Tree 1:
• $k_2 / [k_1, k_4] / [nil, nil], [k_3, k_5]$
• cost = 0(0.20) + 1(0.25+0.20) +2(0.05+0.30) + 1 = 1.15 + 1

• Tree 2:
• $k_2 / [k_1, k_5] / [nil, nil], [k_4, nil] / [nil,nil],[nil,nil], [k_3, nil], [nil,nil]$
• cost = 0(0.20) + 1(0.25+0.30) +2(0.20) + 3(0.05) + 1 = 1.10 + 1

• Notice that the deeper tree (Tree 2) has the lower expected cost: 2.10 versus 2.15
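The two costs above can be checked mechanically. The following is a minimal sketch, assuming each tree is written as a nested `(key_index, left, right)` tuple with `None` for a missing child; the names `P`, `TREE_1`, `TREE_2`, and `expected_cost` are illustrative and not part of the original slides.

```python
# Probabilities from the example table, keyed by key index i (1-based).
P = {1: 0.25, 2: 0.20, 3: 0.05, 4: 0.20, 5: 0.30}

# Each tree is (key_index, left_subtree, right_subtree); None means no child.
TREE_1 = (2, (1, None, None), (4, (3, None, None), (5, None, None)))
TREE_2 = (2, (1, None, None), (5, (4, (3, None, None), None), None))


def expected_cost(tree, p, depth=0):
    """Return the sum of (depth(k_i) + 1) * p_i over all nodes in the tree."""
    if tree is None:
        return 0.0
    key, left, right = tree
    return ((depth + 1) * p[key]
            + expected_cost(left, p, depth + 1)
            + expected_cost(right, p, depth + 1))


print(round(expected_cost(TREE_1, P), 2))  # 2.15
print(round(expected_cost(TREE_2, P), 2))  # 2.1
```

Tree 2 places the low-probability key $k_3$ at depth 3 so that the high-probability key $k_5$ can sit at depth 1, which is why it comes out cheaper despite being taller.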

### Optimal BST - DP Approach

• Optimal substructure: if an optimal BST $T$ has a subtree $T'$ containing keys $k_i \dots k_j$, then $T'$ must be optimal for those keys
• Cut-and-paste proof: if $T'$ were not optimal, replacing it with a cheaper subtree on the same keys would lower the cost of $T$, a contradiction

• Algorithm for finding optimal tree for sorted, distinct keys $k_i \dots k_j$:
• For each possible root $k_r$ for $i ≤ r ≤ j$
• Make optimal subtree for $k_i, \dots, k_{r-1}$
• Make optimal subtree for $k_{r+1}, \dots, k_j$
• Select root that gives best total tree

• Formula: $e(i, j)$ = expected number of comparisons for optimal tree for keys $k_i \dots k_j$

• $\displaystyle e(i, j) = \begin{cases} 0, \text{ if } i = j + 1 \\ \min_{i ≤ r ≤ j} \{e(i, r-1) + e(r+1, j) + w(i,j)\}, \text{ if } i ≤ j \end{cases}$

• where $\displaystyle w(i, j) = \sum_{l=i}^j p_l$ is the increase in cost when $k_i\dots k_j$ becomes a subtree of a node: every key in the range moves one level deeper, adding its probability once

• Work bottom up and remember solution
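Since $w(i, j)$ is needed for every pair $(i, j)$, it helps to make it cheap to evaluate. One possible approach, sketched below, uses prefix sums so that each $w(i, j)$ is a single subtraction. The helper name `make_w` is an assumption for illustration; the slides do not say how $w$ is computed.

```python
from itertools import accumulate


def make_w(p):
    """Build w(i, j) = p_i + ... + p_j for 1-based indices.

    p is a 0-based list holding p_1 .. p_n. For an empty range
    (j == i - 1) the returned function gives 0.
    """
    prefix = [0.0] + list(accumulate(p))  # prefix[t] = p_1 + ... + p_t

    def w(i, j):
        return prefix[j] - prefix[i - 1]

    return w


w = make_w([0.25, 0.20, 0.05, 0.20, 0.30])
print(round(w(2, 4), 2))  # 0.45
print(round(w(1, 5), 2))  # 1.0
```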

### Optimal BST - Algorithm and Performance

• Brute Force: try all tree configurations
• $\Omega(4^n / n^{3/2})$ different BSTs with $n$ nodes
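To get a feel for this growth: the number of distinct BST shapes on $n$ nodes is the $n$-th Catalan number, $\binom{2n}{n}/(n+1)$, which grows like $4^n / n^{3/2}$ up to a constant factor. A small sketch follows; the function name `num_bsts` is illustrative.

```python
import math


def num_bsts(n):
    """Number of distinct BST shapes on n keys (the n-th Catalan number)."""
    return math.comb(2 * n, n) // (n + 1)


for n in (5, 10, 20):
    print(n, num_bsts(n))
# 5 42
# 10 16796
# 20 6564120420
```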

• DP: bottom up with table: for all possible contiguous sequences of keys and all possible roots, compute optimal subtrees
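•     for i in 1 .. n+1 loop                  -- Base case: an empty sequence costs 0
          e(i, i-1) := 0
      end loop
      for size in 1 .. n loop                 -- All sizes of sequences
          for i in 1 .. n-size+1 loop         -- All starting points of sequences
              j := i + size - 1
              e(i, j) := Float'Last           -- Sentinel larger than any real cost
              for r in i .. j loop            -- All roots of sequence k_i .. k_j
                  t := e(i, r-1) + e(r+1, j) + w(i, j)
                  if t < e(i, j) then
                      e(i, j) := t
                      root(i, j) := r         -- Remember the best root found so far
                  end if
              end loop
          end loop
      end loop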

• $\Theta(n^3)$ time: $\Theta(n^2)$ table entries, each trying up to $n$ roots
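The bottom-up loop can be translated fairly directly into runnable code. The version below is a sketch under a few assumptions not stated in the slides: tables are 1-based with an extra row and column so that the empty ranges $e(i, i-1) = 0$ exist, and $w(i, j)$ is filled in incrementally rather than recomputed. The function name `optimal_bst` is illustrative.

```python
import math


def optimal_bst(p):
    """Bottom-up DP for the optimal BST cost.

    p is a 0-based list holding p_1 .. p_n. Returns (e, root), both indexed
    1-based: e[i][j] is the optimal expected cost for keys k_i .. k_j and
    root[i][j] is the index of a root achieving it.
    """
    n = len(p)
    # Rows 0 .. n+1 and columns 0 .. n, so e[i][i-1] = 0 exists for i = 1 .. n+1.
    e = [[0.0] * (n + 1) for _ in range(n + 2)]
    w = [[0.0] * (n + 1) for _ in range(n + 2)]
    root = [[0] * (n + 1) for _ in range(n + 2)]

    for size in range(1, n + 1):              # all sizes of sequences
        for i in range(1, n - size + 2):      # all starting points
            j = i + size - 1
            w[i][j] = w[i][j - 1] + p[j - 1]  # w(i, j) = w(i, j-1) + p_j
            e[i][j] = math.inf
            for r in range(i, j + 1):         # all roots of k_i .. k_j
                t = e[i][r - 1] + e[r + 1][j] + w[i][j]
                if t < e[i][j]:
                    e[i][j] = t
                    root[i][j] = r
    return e, root


e, root = optimal_bst([0.25, 0.20, 0.05, 0.20, 0.30])
print(round(e[1][5], 2))  # 2.1
```

On the example table this gives an optimal cost of 2.10, the same as Tree 2 from the example. Roots $k_2$ and $k_4$ both achieve that cost, so which one lands in `root[1][5]` depends on floating-point tie-breaking.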

• Can, of course, also use (memoized) recursion
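A top-down alternative evaluates the recurrence for $e(i, j)$ directly and caches each result. The following is a minimal sketch using `functools.lru_cache` from the Python standard library; the function name `optimal_cost` and the prefix-sum computation of $w$ are choices made here for illustration.

```python
from functools import lru_cache
from itertools import accumulate


def optimal_cost(p):
    """Memoized recursion for e(1, n); p is a 0-based list of p_1 .. p_n."""
    n = len(p)
    prefix = [0.0] + list(accumulate(p))  # prefix[t] = p_1 + ... + p_t

    @lru_cache(maxsize=None)
    def e(i, j):
        if i > j:                         # empty range: e(i, i-1) = 0
            return 0.0
        w = prefix[j] - prefix[i - 1]     # w(i, j)
        return min(e(i, r - 1) + e(r + 1, j) for r in range(i, j + 1)) + w

    return e(1, n)


print(round(optimal_cost([0.25, 0.20, 0.05, 0.20, 0.30]), 2))  # 2.1
```

Each pair $(i, j)$ is computed once and takes up to $n$ steps, so the running time stays $\Theta(n^3)$, matching the bottom-up table.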