QuickSelect

Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

The QuickSelect algorithm quickly finds the k-th smallest element of an unsorted array of n elements. It is a RandomizedAlgorithm, so we compute the worst-case expected running time.

Here is the algorithm.

QuickSelect(A, k)
  let r be chosen uniformly at random in the range 1 to length(A)
  let pivot = A[r]
  let A1, A2 be new arrays
  # split into a pile A1 of small elements and A2 of big elements
  for i = 1 to n
    if A[i] < pivot then
      append A[i] to A1
    else if A[i] > pivot then
      append A[i] to A2
    else
      # do nothing
  end for
  if k <= length(A1):
    # it's in the pile of small elements
    return QuickSelect(A1, k)
  else if k > length(A) - length(A2)
    # it's in the pile of big elements
    return QuickSelect(A2, k - (length(A) - length(A2))
  else
    # it's equal to the pivot
    return pivot

What is the running time of this algorithm? If the adversary flips coins for us, we may find that the pivot is always the largest element and k is always 1, giving a running time of

T(n) = Theta(n) + T(n-1) = Theta(n²).

But if the choices are indeed random, the expected running time is given by

T(n) <= Theta(n) + (1/n) ∑_{i=1 to n}T(max(i, n-i-1)),

where we are making the not entirely reasonable assumption that the recursion always lands in the larger of A1 or A2.

Let's guess that T(n) <= an for some a. Then we get

T(n)
<= cn + (1/n) ∑_{i=1 to n}T(max(i-1, n-i))
= cn + (1/n) ∑_{i=1 to floor(n/2)} T(n-i) + (1/n) ∑_{i=floor(n/2)+1 to n} T(i)
<= cn + 2 (1/n) ∑_{i=floor(n/2) to n} T(i)
<= cn + 2 (1/n) ∑_{i=floor(n/2) to n} ai,

and now somehow we have to get the horrendous sum on the right of the plus sign to absorb the cn on the left. If we just bound it as 2(1/n) ∑_{i=n/2 to n} an, we get roughly 2(1/n)(n/2)an = an. But this is too big---there's no room to squeeze in an extra cn. So let's expand the sum using the arithmetic series formula:

∑_{i=floor(n/2) to n} i = ∑_{i=1 to n} i - ∑_{i=1 to floor(n/2)} i = n(n+1)/2 - floor(n/2)(floor(n/2)+1)/2 <= n²/2 - (n/4)²/2 = (15/32)n²,

where we take advantage of n being "sufficiently large" to replace the ugly floor(n/2) factors with the much cleaner (and smaller) n/4. Now we can continue with

cn + 2 (1/n) ∑_{i=floor(n/2) to n} ai,
<= cn + (2a/n) (15/32) n² = n (c + (15/16)a)
<= an,

provided a > 16c.

This gives T(n) = O(n). It's clearly Omega(n), so we get T(n) = Theta(n).

CategoryAlgorithmNotes