[FrontPage] [TitleIndex] [WordIndex

Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

A median of a list of n values, where n is odd, is the element that would be at position (n+1)/2 if the list were sorted. When n is even, both the elements at positions n/2 and n/2+1 are medians.

Finding a median can be done in Theta(n log n) time by sorting. It can be done in expected Theta(n) time using the randomized algorithm QuickSelect. And it can be done in Theta(n) time deterministically using a rather complicated divide-and-conquer algorithm due to Blum et al (JCSS 7(4):448-461, 1973).

Here's the idea: suppose we want to find the k-th smallest of n elements. Suppose further that we can split the n elements into two piles of size m and n-m, where the largest element in the first pile is less than the smallest element in the second pile. Then if k <= m, the desired element is just the k-th smallest element in the first pile; but if k > m, then it's the (k-m)-th smallest element in the second pile. To split the piles, we pick some pivot element and throw everything less than the pivot into the small pile and everything bigger than the pivot in the big pile; this takes Theta(n) time.

But how do we pick the pivot? If we just take the first element (say), we'll probably find out that our good friend the adversary has put the smallest element there, giving us a pile with one element and a pile with n-1, where he will also be hiding the target. T(n) = T(n-1) + Theta(n) = Theta(n2) is not a very good running time for something we can do easily in O(n log n) time. Can we get a better pivot?

Here's where the Blum et al algorithm comes in. Their algorithm partitions the elements into groups of five (with up to four left over, which we ignore). It then computes the median of each group (in Theta(n) time, since it takes Theta(1) time to compute the median of a constant-size group and there are floor(n/5) = Theta(n) groups). Finally, it takes these floor(n/5) group medians and computes the median of medians by applying the algorithm recursively, in time T(n/5).

How many elements can be smaller than this median of medians? It is greater than the medians of roughly n/10 groups, each of which is greater than or equal to 3 other elements. So 3n/10 elements at most (give or take a small additive constant) will be less than the median of medians. By symmetry roughly 3n/10 will be greater, meaning that the size of the larger pile after splitting around the median-of-medians is at most 7n/10.

We now have a recurrence T(n) <= Theta(n) + T(n/5) + T(7n/10). We can reasonably guess that this is no greater than Theta(n) (it can't be less). So let's verify this guess; under the assumption that T(n') <= an' for n' < n,

provided c < a/10, which we can easily arrange. So the median-of-median algorithm (MOMSelect?) finds k-th smallest element in time O(n).

An easier solution is to pick a pivot randomly. This gives the QuickSelect algorithm, discussed on its own page.


CategoryAlgorithmNotes


2014-06-17 11:58