An augmented data structure is a data structure that has been annotated with additional information that permits new operations to be performed quickly. After picking the base data structure, there are two steps involved in augmenting it in this way:
- Determine what additional data needs to be stored to solve the problem.
- Determine how to keep this data current as the data structure is updated.
This second step often involves defining an invariant that guarantees that the correct information is always present.
To take a very simple example, suppose that we want to simulate an array with indices 1..N, where N is a very large number, under the assumption that all but n of the locations in the array will have value 0. The operations we want to perform on the array are
- Set(A, i, x): set A[i] = x,
- Get(A, i): return A[i], and
- Count(A): return the number of nonzero entries in A.
We would like to use O(n) space to store the array and have each operation take (expected, amortized) time O(1).
The Set and Get operations are easily handled using a HashTable. For the Count operation, we store a count field alongside the hash table that tracks how many entries are nonzero. It is our responsibility to update this count field whenever an entry changes. Pseudocode for the Set, Get, and Count operations might look something like this:
    Get(A, i):
        x = HashGet(A.hashtable, i)
        if x is null:
            return 0
        else:
            return x

    Set(A, i, x):
        oldvalue = HashGet(A.hashtable, i)
        if oldvalue is not null and oldvalue != 0:
            A.count = A.count - 1
        HashSet(A.hashtable, i, x)
        if x != 0:
            A.count = A.count + 1

    Count(A):
        return A.count
The invariant for this data structure is that A.count always contains the number of nonzero values in A.hashtable. The extra code in Set is used to preserve this invariant, and Count is correct precisely because the invariant holds.
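As a rough illustration, the same structure might be sketched in Python as follows. The class name SparseArray and its method names are hypothetical, and a built-in dict stands in for the hash table. One assumption goes beyond the pseudocode: writing a 0 removes the entry from the dict, so that space stays proportional to the number of nonzero entries.

```python
class SparseArray:
    """Simulates a huge array that is 0 everywhere except where set."""

    def __init__(self):
        self._table = {}  # index -> value; plays the role of A.hashtable
        self._count = 0   # invariant: number of nonzero values in _table

    def get(self, i):
        # Missing keys read as 0, matching the "null means 0" rule in Get.
        return self._table.get(i, 0)

    def set(self, i, x):
        old = self._table.get(i, 0)
        if old != 0:
            # A nonzero entry is about to be overwritten or removed.
            self._count -= 1
        if x != 0:
            self._table[i] = x
            self._count += 1
        else:
            # Assumption (not in the pseudocode): drop zeros instead of
            # storing them, so the dict holds only nonzero entries.
            self._table.pop(i, None)

    def count(self):
        # Correct because set() preserves the invariant on _count.
        return self._count


a = SparseArray()
a.set(5, 3)
a.set(10**12, 7)   # indices can be enormous; only touched cells use space
a.set(5, 0)        # overwriting with 0 lowers the count again
```

After these three calls, exactly one entry is nonzero, and count() reports that without scanning the table.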
More sophisticated augmentations typically scatter statistical information throughout the data structure. This is most commonly done with tree data structures, where each node stores summary information about its subtrees. The invariant in these data structures is that this summary information is correct, and the main cost of maintaining the invariant is updating the summary information in response to insertions, deletions, and rebalancing. For examples of this approach, see AggregateQueries and OrderStatisticsTree.
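To make the tree case concrete, here is a minimal sketch, assuming a plain unbalanced binary search tree in which each node stores the size of its subtree. The names Node, size, insert, and select are hypothetical and are not taken from the pages referenced above. Deletion and rebalancing are omitted; a balanced tree would also have to recompute size after each rotation.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1  # invariant: number of nodes in the subtree rooted here


def size(node):
    """Subtree size, treating an empty subtree as size 0."""
    return node.size if node is not None else 0


def insert(node, key):
    """Insert key below node and return the subtree root; duplicates are ignored."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    # Restore the invariant on the way back up the search path.
    node.size = 1 + size(node.left) + size(node.right)
    return node


def select(node, k):
    """Return the k-th smallest key (0-indexed), using the stored sizes."""
    while node is not None:
        left = size(node.left)
        if k < left:
            node = node.left
        elif k == left:
            return node.key
        else:
            # Skip the left subtree and the current node.
            k -= left + 1
            node = node.right
    raise IndexError("rank out of range")


root = None
for key in [50, 20, 70, 10, 30, 60]:
    root = insert(root, key)
```

Because every node on the search path recomputes its size from its children, select can find a key by rank in time proportional to the height of the tree, without visiting the nodes it skips.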