AugmentedDataStructures

Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

An augmented data structure is a data structure that has been annotated with additional information that permits new operations to be performed quickly. After picking the base data structure, there are two steps involved in augmenting it in this way:

Determine what additional data needs to be stored to solve the problem.
Determine how to keep this data current as the data structure is updated.

This second step often involved defining an invariant that guarantees that the correct information is always present.

To take a very simple example, suppose that we want to simulate an array with indices 1..N, where N is a very large number, under the assumption that all but n of the locations in the array will have value 0. The operations we want to perform on the array are

Set(A, i, x): set A[i] = x,
Get(A, i): return A[i], and
Count(A): return the number of nonzero entries in A.

We would like to use O(n) space to store the array and have each operation take (expected, amortized) time O(1).

The Set and Get operations are easily handled using a HashTable. For the Count operation, we attach to this hash table somewhere a count field that keeps track of how many entries are nonzero. It is our responsibility to update this count field as necessary. Pseudocode for the Set, Get, and Count operations might look something like this:

Get(A, i):
  x = HashGet(A.hashtable, i)
  if x is null:
    return 0
  else:
    return x

Set(A, i, x):
  oldvalue = HashGet(A.hashtable, i)
  if oldvalue is not null and oldvalue != 0:
    A.count = A.count - 1
  HashSet(A.hashtable, i, x)
  if x != 0:
    A.count = A.count + 1

Count(A):
  return A.count

The invariant for this data structure is that A.count always contains the number of nonzero values in A.hashtable. The extra code in Set is used to preserve this invariant, and Count is correct precisely because the invariant holds.

More sophisticated augementations typically scatter statistical information throughout the data structure. This is most commonly done with tree data structures, where each node stores summary information about it subtrees; the invariant in these data structures is that this summary information is correct, and the main cost of maintaining the invariant is updating the summary information in response to insertions, deletions, and rebalancing. For examples of this approach, see AggregateQueries and OrderStatisticsTree.

CategoryAlgorithmNotes