DivideAndConquer yields algorithms whose execution has a tree structure. Sometimes we build data structures that are also trees. It is probably not surprising that DivideAndConquer is the natural way to build algorithms that use such trees as inputs.
1. Tree basics
Here is a typical complete binary tree. It is binary because every node has at most two children. It is complete because the nodes consist only of internal nodes with exactly two children and leaves with no children.
0 / \ 1 2 / \ 3 4 / \ 5 6 / \ 7 8
Structurally, a complete binary tree consists of either a single node (a leaf) or a root node with a left and right subtree, each of which is itself either a leaf or a root node with two subtrees. The set of all nodes underneath a particular node x is called the subtree rooted at x.
The size of a tree is the number of nodes; a leaf by itself has size 1. The height of a tree is the length of the longest path; 0 for a leaf, at least one in any larger tree. The depth of a node is the length of the path from the root to that node. The height of a node is the height of the subtree of which it is the root, i.e. the length of the longest path from that node to some leaf below it. A node u is an ancestor of a node v if v is contained in the subtree rooted at u; we may write equivalently that v is a descendant of u. Note that every node is both and ancestor and descendant of itself; if we wish to exclude the node itself, we refer to a proper ancestor or proper descendant.
2. Binary tree implementations
In a low-level programming language like C, a binary tree typically looks a lot like a linked list with an extra outgoing pointer from each element, e.g.
Missing children (and the empty tree) are represented by null pointers. Typically, individual tree nodes are allocated separately using malloc; however, for high-performance use it is not unusual for tree libraries to do their own storage allocation out of large blocks obtained from malloc.
Optionally, the struct may be extended to include additional information such as a pointer to the node's parent, hints for balancing (see BalancedTrees), or aggregate information about the subtree rooted at the node such as its size or the sum/max/average of the keys of its nodes.
When it is not important to be able to move large subtrees around simply by adjusting pointers, a tree may be represented implicitly by packing it into an array. For an example of how this works see Heaps.
3. The canonical binary tree algorithm
Pretty much every DivideAndConquer algorithm for binary trees looks like this:
The function processes all nodes in what is called a preorder traversal, where the "preorder" part means that the root of any tree is processed first. Moving the call to doSomethingTo in between or after the two recursive calls yields an inorder or postorder traversal, respectively.
In practice we usually want to extract some information from the tree. For example, this function computes the size of a tree:
and this function computes the height:
Since both of these algorithms have the same structure, they both have the same asymptotic running time. We can compute this running time using the recurrence
- T(n) = Theta(1) + T(k) + T(n-k-1)
where k is the size of the left subtree.
Now, there's a problem with this recurrence: for an arbitrary tree of size k, we don't know what k is! So how can we solve a recurrence that contains an unbound variable?
The trick is that in this case we get the same answer no matter what k is. First let's show that T(n) <= an for some a:
T(n) <= c + T(k) + T(n-k-1) <= c + ak + a(n-k-1) = c + a(n-1) <= an [provided c <= a].
Showing that it is greater than an (presumably for a different a), is essentially the same argument, now with c >= a:
T(n) >= c + T(k) + T(n-k-1) >= c + ak + a(n-k-1) = c + a(n-1) >= an [provided c >= a].
So these are all Theta(n) algorithms.
4. Nodes vs leaves
For some binary trees we don't store anything interesting in the internal nodes, using them only to provide a route to the leaves. We might reasonably ask if an algorithm that runs in O(n) time where n is the total number of nodes still runs in O(m) time, where m counts only the leaves. For complete binary trees, we can show that we get the same asymptotic performance whether we count leaves only, internal nodes only, or both leaves and internal nodes.
Let T(n) be the number of internal nodes in a complete binary tree with n leaves. It is easy to see that T(1) = 0 and T(2) = 1, but for larger trees there are multiple structures and so it makes sense to write a recurrence:
- T(n) = 1 + T(k) + T(n-k).
We will show by induction that the solution to this recurrence is exactly T(n) = n-1. We already have the base case T(1) = 0. For larger n, we have
- T(n) = 1 + T(k) + T(n-k) = 1 + (k-1) + (n-k-1) = n-1.
So a tree with Theta(n) nodes has Theta(n) internal nodes and Theta(n) leaves; if we don't care about constant factors, we won't care which number we use.
5. Special classes of binary trees
So far we haven't specified where particular nodes are placed in the binary tree. Most applicaitons of binary trees put some constraints on how nodes relate to one another. Some possibilities:
BinarySearchTrees: Each node has a key, and a node's key must be greater than all keys in the subtree of its left-hand child and less than all keys in the subtree of its right-hand child.
Heaps: Each node has a key that is less than the keys of both of its children.