Quick Sort
Quicksort Overview
- Worst case running time: Θ(n^2)
- Expected running time: Θ(n lg n)
- Small constants in Θ(n lg n)
- Sorts in place
Quicksort Algorithm
Quicksort(A, p, r)
    if p < r
        q = Partition(A, p, r)
        Quicksort(A, p, q-1)
        Quicksort(A, q+1, r)

Partition(A, p, r)
    x = A[r]
    i = p - 1                  -- i marks the rightmost element ≤ pivot
    for j in p .. r - 1 loop
        if A[j] ≤ x
            i = i + 1
            swap A[i] and A[j]
    swap A[i + 1] and A[r]
    return i + 1
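The pseudocode above translates directly into Python; this is a sketch using 0-based indices instead of the textbook's 1-based ones:

```python
def partition(A, p, r):
    """Lomuto partition: A[r] is the pivot; returns its final index."""
    x = A[r]                         # pivot
    i = p - 1                        # rightmost index of the "<= pivot" region
    for j in range(p, r):            # j scans the unexamined region
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]  # put the pivot between the two regions
    return i + 1

def quicksort(A, p, r):
    if p < r:
        q = partition(A, p, r)
        quicksort(A, p, q - 1)
        quicksort(A, q + 1, r)

data = [8, 1, 6, 4, 0, 3, 9, 5]
quicksort(data, 0, len(data) - 1)
# data is now [0, 1, 3, 4, 5, 6, 8, 9]
```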
QuickSort Example
- Partition(A, 1, 8) on A = 8 1 6 4 0 3 9 5; pivot x = A[8] = 5
- Notation: "/" separates the ≤-pivot region from the >-pivot region,
"\" separates the >-pivot region from the unchecked region,
and "|" sets off the pivot
1   /8 1 6 4 0 3 9|5     start: both regions empty
2   /8\1 6 4 0 3 9|5     8 > 5: j advances
3   1/8\6 4 0 3 9|5      1 ≤ 5: swap 1 and 8
4   1/8 6\4 0 3 9|5      6 > 5: j advances
5   1 4/6 8\0 3 9|5      4 ≤ 5: swap 4 and 8
6   1 4 0/8 6\3 9|5      0 ≤ 5: swap 0 and 6
7   1 4 0 3/6 8\9|5      3 ≤ 5: swap 3 and 8
8   1 4 0 3/6 8 9|5      9 > 5: loop ends
9   1 4 0 3/5 8 9 6|     final swap of pivot 5 with 6; return q = 5
- Notice how few swaps are needed
- Notice how each element ends up close to its final location
Partition Regions/Loop Invariant
- Pivot is A[r]
- Four Regions:
- A[p .. i] are ≤ pivot
- A[i+1 .. j-1] are > pivot
- A[r] is the pivot
- A[j .. r-1] not yet examined
- Properties of Regions 1..3 are the Loop Invariant
Partition Correctness
- Initialization: Before the loop starts, A[r] is the pivot, and A[p..i] and
A[i+1..j-1] are empty
- Maintenance: For each pass, if A[j] ≤ pivot, then A[j] and
A[i+1] are swapped, and i and j are incremented. If A[j] >
pivot, then only j is incremented. In both cases, invariant conditions
are maintained
- Termination: When the loop terminates, j = r, so all elements have been
tested against the pivot: A[p..i] are less than or equal to the pivot,
A[i+1..r-1] are greater than the pivot, and A[r] is the pivot.
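The three invariant conditions can be checked mechanically. Below is a sketch that asserts the invariant on every loop iteration (the function name is mine, not the textbook's):

```python
def partition_checked(A, p, r):
    """Lomuto partition that asserts the loop invariant on each pass."""
    x = A[r]
    i = p - 1
    for j in range(p, r):
        # Loop invariant (regions 1-3):
        assert all(A[k] <= x for k in range(p, i + 1))      # A[p..i] <= pivot
        assert all(A[k] > x for k in range(i + 1, j))       # A[i+1..j-1] > pivot
        assert A[r] == x                                    # A[r] is the pivot
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1
```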
Partition Time
- Time for partition is Θ(n)
Quicksort Performance
- Performance depends on partitioning
- If subarrays are balanced, quicksort is as fast as mergesort and heapsort
(i.e., O(n lg n))
- If subarrays are unbalanced, quicksort is as slow as insertion sort's worst
case (i.e., Ω(n^2))
- Expected case is good: Θ(n lg n)
- Algorithm can be tuned to avoid worst case
- Low constants = good behavior
Quicksort Worst Case
- Intuitively, occurs when subarrays are completely unbalanced
- Unbalanced means 0 elements in one subarray, and n-1 elements in the other
- Recurrence: T(n)
- = T(n-1) + T(0) + Θ(n)
- = T(n-1) + Θ(n)
- = Θ(n^2) [by substitution]
- This is insertion sort's worst and expected case
- What is the worst case input for quicksort?
- What does insertion sort do in this case?
- Careful: Have we proved that this is the worst case?
- What have we proved??
- What can we say about the worst case?
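An already-sorted array is one input that triggers this behavior: every partition then splits 0 / n-1, giving exactly n(n-1)/2 comparisons (and it is insertion sort's best case). A sketch that counts comparisons, assuming the Lomuto partition from the pseudocode above:

```python
def partition_counting(A, p, r, counter):
    x = A[r]
    i = p - 1
    for j in range(p, r):
        counter[0] += 1              # one comparison against the pivot
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def quicksort_counting(A, p, r, counter):
    if p < r:
        q = partition_counting(A, p, r, counter)
        quicksort_counting(A, p, q - 1, counter)
        quicksort_counting(A, q + 1, r, counter)

n = 100
counter = [0]
quicksort_counting(list(range(n)), 0, n - 1, counter)
# sorted input: every split is 0 / n-1, so n(n-1)/2 = 4950 comparisons
```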
Quicksort Best Case
- Intuitively, occurs when subarrays are balanced at each partition
- Each subarray has ≤ n/2 elements
- Recurrence: T(n)
- = 2T(n/2) + Θ(n)
- = Θ(n lg n) [how do we know?]
- Have we proved that best case is Θ(n lg n)? We will return to this.
Average Case? Effects of Partitions
- Let's gain some intuition on the effects of partition sizes
- Assume a 9-to-1 split at every level [See recursion tree on p. 176]
- T(n) ≤ T(9n/10) + T(n/10) + Θ(n) = O(n lg n)
- Analysis: similar to T(n) = T(n/3) + T(2n/3) + O(n) in Chapter 4
- log_10 n full levels and log_{10/9} n non-full levels
- Base of the log does not matter
- Any split of constant proportionality will give a recursion tree of depth
Θ(lg n)
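The "base does not matter" point is just the change-of-base formula: log_{10/9} n = lg n / lg(10/9), a constant multiple of lg n. A quick numeric check:

```python
import math

# log_{10/9} n is always the same constant multiple of lg n:
# log_{10/9} n = lg n / lg(10/9)
for n in (10, 1000, 10**6, 10**9):
    ratio = math.log(n, 10 / 9) / math.log2(n)
    assert abs(ratio - 1 / math.log2(10 / 9)) < 1e-9
```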
Thinking about Average Case
- Average depends on distribution of input data sets
- What really matters: relative ordering
- Assume all orderings are equally likely
- Result: good and bad splits are spread through the tree
- What happens if they alternate? Figure 7.5, p. 177.
- Θ(n) to partition and 2T((n-1)/2) remaining
- Average case for randomly distributed elements is Θ(n lg n)
Random Partitioning
- But, input may not be randomly distributed
- One technique for making it appear to be random: randomly select pivot
- Exchange random element with element A[r], and proceed as before
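The randomized variant is a one-line change before the ordinary partition; a sketch (with a plain Lomuto partition included so it stands alone):

```python
import random

def partition(A, p, r):
    """Ordinary Lomuto partition with pivot A[r]."""
    x, i = A[r], p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def randomized_partition(A, p, r):
    k = random.randint(p, r)       # choose the pivot position uniformly
    A[k], A[r] = A[r], A[k]        # exchange it with A[r]
    return partition(A, p, r)      # then proceed exactly as before
```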
Formal Worst Case Analysis
- Earlier we proved time for completely unbalanced split is
Θ(n^2)
- Now we prove that this is indeed the worst case of the algorithm
- We use substitution to show that T(n) for quicksort is O(n^2)
- T(n) = max[q in 0 .. n-1] (T(q) + T(n-q-1)) + Θ(n)
- ≤ max[q in 0 .. n-1] (cq^2 + c(n-q-1)^2) + Θ(n)
- = c * max[q in 0 .. n-1] (q^2 + (n-q-1)^2) + Θ(n)
- ≤ c * (n^2 - 2n + 1) + Θ(n) [see below]
- = cn^2 - c(2n - 1) + Θ(n)
- ≤ cn^2, for large enough c
- In the above we use
max[q in 0 .. n-1] (q^2 + (n-q-1)^2) ≤ (n-1)^2 = n^2 - 2n + 1,
since the maximum of the expression occurs at the endpoints of the range
- Thus, for the worst case, T(n) = O(n^2)
- Earlier we found a case for which T(n) = Θ(n^2), and so for this case
T(n) = Ω(n^2)
- Thus, for the worst case, T(n) = O(n^2) and T(n) = Ω(n^2), and so we
conclude for the worst case, T(n) = Θ(n^2)
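The endpoint claim used above can be sanity-checked numerically: q^2 + (n-q-1)^2 is convex in q, so its maximum over 0 ≤ q ≤ n-1 sits at q = 0 or q = n-1, where it equals (n-1)^2:

```python
# maximum of q^2 + (n-q-1)^2 over q in 0 .. n-1 occurs at the endpoints
for n in (2, 5, 10, 100):
    values = [q * q + (n - q - 1) ** 2 for q in range(n)]
    assert max(values) == (n - 1) ** 2            # attained at q = 0 and q = n-1
    assert values[0] == values[-1] == (n - 1) ** 2
```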
Formal Best Case
- All even partitions already shown to be Θ(n lg n)
- Thus, best case is _(n lg n) [Ω or O?]
- Can we do better?
- No, it can be proved that ANY comparison sort requires Ω(n lg n)
comparisons in the worst case [See Section 8.1]
- We should be able to prove the best case performance using an
analysis like the one above for worst case.
Behavior of Randomized Quicksort
- Our goal is to find an upper bound for the runtime of quicksort
- Quicksort's runtime: dominated by total cost of all calls to partition
- What about the recursive calls? How many are there?
- Number of calls to partition = O(??)
- Now, how much work in each call? Hard to answer ...
- How much work in all calls? Also hard to answer
- But, we can count the number of comparisons
- What about the number of swaps? How is it related to the number of comparisons?
- Let X = total number of comparisons in all calls to partition
- Total work in all calls to partition = O(n + X)
- How do we count the total number of comparisons? Probabilistic analysis
Number of Comparisons in Random Quicksort
- Let X = total number of comparisons in all calls to partition
- Rename the array elements z1, z2, ..., zn, where zi is the i-th smallest element
- Let Zij = {zi, zi+1, ..., zj}, the set of elements between zi and zj inclusive
- Each pair of elements is compared 0 or 1 times - Why?
- Let Xij = 1 if zi and zj are compared, 0 otherwise
- Thus, X = double sum over all pairs of Xij [with i < j]
- Now consider the expected value of X:
- E(X) = E(double sum of Xij)
- = double sum of E(Xij)
- = double sum of Pr(zi and zj are compared)
- = double sum of Pr(zi is the first pivot chosen from Zij
or zj is the first pivot chosen from Zij)
- = double sum of 2 * Pr(zi is the first pivot chosen from Zij)
- = double sum of 2/(j-i+1)
- < double sum of 2/k [change of variables in summation]
- = single sum of O(lg n)
- = O(n lg n)
- So the expected running time of randomized quicksort is O(n lg n)
- Combined with the Ω(n lg n) lower bound for comparison sorts [Section 8.1],
the expected running time of randomized quicksort is Θ(n lg n)
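The expectation can be checked empirically by counting comparisons over many random permutations (for uniformly random input this matches randomized quicksort's expectation); the leading term of the double sum works out to about 2n ln n. A rough sketch, iterative to avoid deep recursion, with trial counts that are my choices:

```python
import math
import random

def quicksort_count(A):
    """Quicksort A in place (Lomuto partition); return total comparisons."""
    comparisons = 0
    stack = [(0, len(A) - 1)]         # explicit stack instead of recursion
    while stack:
        p, r = stack.pop()
        if p >= r:
            continue
        x, i = A[r], p - 1            # pivot and boundary of the <= region
        for j in range(p, r):
            comparisons += 1          # one comparison with the pivot
            if A[j] <= x:
                i += 1
                A[i], A[j] = A[j], A[i]
        A[i + 1], A[r] = A[r], A[i + 1]
        q = i + 1
        stack.append((p, q - 1))
        stack.append((q + 1, r))
    return comparisons

n, trials = 500, 20
total = 0
for _ in range(trials):
    A = list(range(n))
    random.shuffle(A)
    total += quicksort_count(A)
avg = total / trials
# the expectation grows as Θ(n lg n); its leading term is 2 n ln n
```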
Sorting Comparison
- Time to sort 1000 elements (Knuth)
- 16: Heap
- 13: Shell
- 8: Quick
Avoiding the worst case
- Count the number of times a subarray is partitioned, and if > c, use
heapsort
Tuning Quicksort
- Here are some ways to make quicksort faster:
- Use a fast swap
- Small subarray: use insertion sort
- Move elements equal to the pivot to ends
- 4 regions: equal left, less, greater, equal right
- Move all equal elements to the middle
- An array of all-equal elements is a problem case for quicksort
- Select partition method based on subarray size:
- > 40: median of medians of 3 (9 elements, 12 comparisons)
- ≤ 7: middle element
- otherwise: median of 3
- From Bentley and McIlroy, 1993: "Engineering a Sort Function"
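Two of these tunings, a small-subarray insertion sort and median-of-3 pivot selection, can be sketched as follows (the cutoff of 16 and the helper names are my choices, not from the paper):

```python
CUTOFF = 16   # my choice; Bentley & McIlroy pick thresholds empirically

def insertion_sort(A, p, r):
    """Sort A[p..r] in place; fast for small or nearly-sorted ranges."""
    for j in range(p + 1, r + 1):
        key, i = A[j], j - 1
        while i >= p and A[i] > key:
            A[i + 1] = A[i]
            i -= 1
        A[i + 1] = key

def median_of_3_index(A, p, m, r):
    """Index of the median of A[p], A[m], A[r]."""
    return sorted(((A[p], p), (A[m], m), (A[r], r)))[1][1]

def tuned_quicksort(A, p, r):
    if r - p + 1 <= CUTOFF:            # small subarray: use insertion sort
        insertion_sort(A, p, r)
        return
    k = median_of_3_index(A, p, (p + r) // 2, r)
    A[k], A[r] = A[r], A[k]            # median of 3 becomes the pivot
    x, i = A[r], p - 1                 # ordinary Lomuto partition
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    q = i + 1
    tuned_quicksort(A, p, q - 1)
    tuned_quicksort(A, q + 1, r)
```

Median-of-3 also defuses the sorted-input worst case, since the middle element of a sorted range is a perfectly balanced pivot.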