[FrontPage] [TitleIndex] [WordIndex

Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

A heap is a binary tree data structure (see BinaryTrees) in which each element has a key (or sometimes priority) that is less than the keys of its children. Heaps are used to implement the priority queue abstract data type (see AbstractDataTypes), which we'll talk about first.

1. Priority queues

In a standard queue, elements leave the queue in the same order as they arrive. In a priority queue, elements leave the queue in order of decreasing priority: the DEQUEUE operation becomes a DELETE-MIN operation (or DELETE-MAX, if higher numbers mean higher priority), which removes and returns the highest-priority element of the priority queue, regardless of when it was inserted. Priority queues are often used in operating system schedulers to determine which job to run next: a high-priority job (e.g., turn on the fire suppression system) runs before a low-priority job (floss the cat) even if the low-priority job has been waiting longer.

2. Expensive implementations of priority queues

Implementing a priority queue using an array or linked list is likely to be expensive. If the array or list is unsorted, it takes O(n) time to find the minimum element; if it is sorted, it takes O(n) time (in the worst case) to add a new element. So such implementations are only useful when the numbers of INSERT and DELETE-MIN operations are very different. For example, if DELETE-MIN is called only rarely but INSERT is called often, it may actually be cheapest to implement a priority queue as an unsorted linked list with O(1) INSERTs and O(n) DELETE-MINs. But if we expect that every element that is inserted is eventually removed, we want something for which both INSERT and DELETE-MIN are cheap operations.

3. Heaps

A heap is a binary tree in which each node has a smaller key than its children; this property is called the heap property or heap invariant.

To insert a node in the heap, we add it as a new leaf, which may violate the heap property if the new node has a lower key than its parent. But we can restore the heap property (at least between this node and its parent) by swapping either the new node or its sibling with the parent, where in either case we move up the node with the smaller key. This may still leave a violation of the heap property one level up in the tree, but by continuing to swap small nodes with their parents we eventually reach the top and have a heap again. The time to complete this operation is proportional to the depth of the heap, which will typically be O(log n) (we will see how to enforce this in a moment).

To implement DELETE-MIN, we can easily find the value to return at the top of the heap. Unfortunately, removing it leaves a vacuum that must be filled in by some other element. The easiest way to do this is to grab a leaf (which probably has a very high key), and then float it down to where it belongs by swapping it with its smaller child at each iteration. After time proportional to the depth (again O(log n) if we are doing things right), the heap invariant is restored.

Similar local swapping can be used to restore the heap invariant if the priority of some element in the middle changes; we will not discuss this in detail.

4. Packed heaps

It is possible to build a heap using structs and pointers, where each element points to its parent and children. In practice, heaps are instead stored in arrays, with an implicit pointer structure determined by array indices. For zero-based arrays as in C, the rule is that a node at position i has children at positions 2*i+1 and 2*i+2; in the other direction, a node at position i has a parent at position (i-1)/2 (which rounds down in C). This is equivalent to storing a heap in an array by reading through the tree in BreadthFirstSearch order:

   0
  / \
 1   2
/ \ / \
3 4 5 6

becomes

0 1 2 3 4 5 6

This approach works best if there are no gaps in the array. So to maximize efficiency we make this "no gaps" policy part of the invariant. We can do so because we don't care which leaf gets added when we do an INSERT, so we can make it be the position at the end of the array. Similarly, in a DELETE-MIN operation we can promote the last element to the root before floating it down. Of course, the usual implementation considerations with variable-length arrays apply (see C/DynamicStorageAllocation).

5. Bottom-up heapification

If we are presented with an unsorted array, we can turn it into a heap more quickly than the O(n log n) time required to do n INSERTs. The trick is to build the heap from the bottom up (i.e. starting with position n-1 and working back to position 0), so that when it comes time to fix the heap invariant at position i we have already fixed it at all later positions (this is a form of DynamicProgramming). Unfortunately, it is not quite enough simply to swap a[i] with its smaller child when we get there, because we could find that a[0] (say) was the largest element in the heap. But the cost of floating a[i] down to its proper place will be proportional to its own height rather than the height of the entire heap. Since most of the elements of the heap are close to the bottom, the total cost will turn out to be O(n).

The detailed analysis for a heap with exactly 2k-1 elements is that we pay 0 for the 2k-1 bottom elements, 1 for the 2k-2 elements one level up, 2 for the 2k-3 elements one level above this, and so forth, giving a total cost proportional to

\begin{displaymath}
\sum_{i=1}^{k} i 2^{k-i} = 2^k \sum_{i=1}^{k} \frac{i}{2^i} \le 2^k \sum_{i=1}^{\infty} \frac{i}{2^i} = 2^{k+1}.
\end{displaymath}

The last step depends on recognizing that the sum equals 2, which can be proved using GeneratingFunctions.

6. Heapsort

Bottom-up heapification is used in the Heapsort algorithm, which first does bottom-up heapification in O(n) time and then repeatedly calls DELETE-MIN to extract the next element. This is no faster than the O(n log n) cost of MergeSort or QuickSort in typical use, but it can be faster if we only want to get the first few elements of the array.

Here is a simple implementation of heapsort, that demonstrates how both bottom-up heapification and the DELETE-MIN procedure work by floating elements down to their proper places:

   1 #include <stdio.h>
   2 #include <stdlib.h>
   3 #include <assert.h>
   4 
   5 /* max heap implementation */
   6 
   7 /* compute child 0 or 1 */
   8 #define Child(x, dir) (2*(x)+1+(dir))
   9 
  10 /* float value at position pos down */
  11 static void
  12 floatDown(int n, int *a, int pos)
  13 {
  14     int x;
  15 
  16     /* save original value once */
  17     x = a[pos];
  18 
  19     for(;;) {
  20         if(Child(pos, 1) < n && a[Child(pos, 1)] > a[Child(pos, 0)]) {
  21             /* maybe swap with Child(pos, 1) */
  22             if(a[Child(pos, 1)] > x) {
  23                 a[pos] = a[Child(pos, 1)];
  24                 pos = Child(pos, 1);
  25             } else {
  26                 /* x is bigger than both kids */
  27                 break;
  28             }
  29         } else if(Child(pos, 0) < n && a[Child(pos, 0)] > x) {
  30             /* swap with Child(pos, 0) */
  31             a[pos] = a[Child(pos, 0)];
  32             pos = Child(pos, 0);
  33         } else {
  34             /* done */
  35             break;
  36         }
  37     }
  38 
  39     a[pos] = x;
  40 }
  41 
  42 /* construct a heap bottom-up */
  43 static void
  44 heapify(int n, int *a)
  45 {
  46     int i;
  47 
  48     for(i = n-1; i >= 0; i--) {
  49         floatDown(n, a, i);
  50     }
  51 }
  52 
  53 /* sort an array */
  54 void
  55 heapSort(int n, int *a)
  56 {
  57     int i;
  58     int tmp;
  59 
  60     heapify(n, a);
  61 
  62     for(i = n-1; i > 0; i--) {
  63         /* swap max to a[i] */
  64         tmp = a[0];
  65         a[0] = a[i];
  66         a[i] = tmp;
  67 
  68         /* float new a[0] down */
  69         floatDown(i, a, 0);
  70     }
  71 }
  72 
  73 #define N (100)
  74 #define MULTIPLIER (17)
  75 
  76 int
  77 main(int argc, char **argv)
  78 {
  79     int a[N];
  80     int i;
  81 
  82     if(argc != 1) {
  83         fprintf(stderr, "Usage: %s\n", argv[0]);
  84         return 1;
  85     }
  86 
  87     for(i = 0; i < N; i++) { a[i] = (i*MULTIPLIER) % N; }
  88 
  89     for(i = 0; i < N; i++) { printf("%d ", a[i]); }
  90     putchar('\n');
  91 
  92     heapSort(N, a);
  93 
  94     for(i = 0; i < N; i++) { printf("%d ", a[i]); }
  95     putchar('\n');
  96 
  97     return 0;
  98 }
heapsort.c

7. More information


CategoryProgrammingNotes CategoryAlgorithmNotes


2014-06-17 11:58