The greedy method is an algorithmic technique that builds up a solution to a large problem incrementally by solving increasingly large subproblems. In this respect it is similar to DynamicProgramming, but unlike DynamicProgramming it commits to partial solutions immediately rather than keeping track of multiple partial solutions. When it works, the greedy method yields algorithms that are much simpler than dynamic programming algorithms. Most of the time it doesn't work, but we will describe below some situations where it does.
1. Greedy weighted subsets
Suppose that I have a set of objects S, each of which has a known weight, and I want to find some subset A of S from a permitted class of subsets P with minimum (or maximum) total weight. Examples might be:
- I am allowed to grab exactly k objects and my goal is to maximize total weight. The permitted subsets P in this case are exactly the subsets of S of size k.
Each object i costs ci dollars, and I want to grab as much weight as I can without spending more than b dollars. Now the permitted subsets are those that cost b or less.
The objects are edges of a connected undirected graph, and I want to find a subset which includes exactly one path between any two nodes in the graph that has minimum total weight. Now the permitted subsets are spanning trees---connected acyclic subgraphs that include all nodes.
In each case, there is a natural greedy algorithm: start with the empty subset, and at each step choose the best unused element (heaviest or lightest depending on whether you are maximizing or minimizing) that still leaves you with a subset of some permitted set. Stop when you have a permitted set that cannot be improved further by adding more elements.
Does this algorithm work? It works when we want to find the k objects with maximum total weight, since it just grabs the heaviest k objects. It fails when we want to find heavy objects that fit within our budget constraint (see description of KnapsackProblem in DynamicProgramming). And it works for finding minimum-weight spanning trees, as we will show below.
1.1. Kruskal's algorithm for minimum-weight spanning trees
Here's an algorithm for minimum-weight spanning trees. Start with an empty set of edges. At each step, add a lightest unused edge that doesn't create a cycle. Stop when you have a maximal acyclic graph (which will be a spanning tree if the graph is connected).
Before we show that this works, let's look at how to check if adding an edge to our partial tree T would create a cycle or not. Suppose we track for each node u which connected component of T u is in. Then adding uv creates a cycle if and only if there is already another path between u and v, i.e. if u and v are in the same connected component. We can track membership in connected components using a UnionFind data structure where identifying the component of a node takes O(1) time and the total time for all unions of the n vertices is O(n log n). So cycle testing adds O(V log V) time to the algorithm.
The rest of the algorithm just consists of sorting the edges (in O(E log E) time) and running through them one at a time (O(E) time, not counting the costs of union operations). The total time is thus O(E log E). This is not as fast as Prim's Algorithm (O(E log V); see LevitinBook), but it may be quicker if the sorting step can be sped up somehow.
All of this is irrelevant if the algorithm doesn't work. So let's prove that it does. At each step, let T be the partial tree constructed so far. We claim that is always some minimum spanning tree T* that contains all the edges in T, which is enough to get correctness because when T is a completed tree it must equal this optimal T*. Proof of claim: It's true initially. Suppose that it is true for the tree Ti obtained after i steps, and let Ti+1 = Ti + e, where e = uv is a lightest edge that doesn't create a cycle when added to Ti. If Ti+1 is a subset of T*, we are happy. Otherwise, consider the u-v path in T*. This contains some edge e' that is not in Ti and that is at least as heavy as e (otherwise we would have picked e' instead of e). Now consider the set of edges T* - e' + e. It's no heavier than T*. It's a tree, because removing e' separates T* into two components and adding e just rejoins them. It's a spanning tree, because any path that used to go through e' in T* can be rerouted through e instead. And---this is the most important part---it contains Ti+1. It follows that Ti+1 is contained in the minimum spanning tree T* - e' + e, and the induction goes through.
Let's try to pluck out the key property of trees that we needed in Kruskal's algorithm. We needed two things:
There was a common, recognizeable property (in this case being acyclic) of all potential solutions T*. Alternatively, the potential solutions T* were just the maximal elements of a family of sets (in this case, acyclic subgraphs of the original graph) that was closed under taking subsets.
There was always some object e' that we could have added to Ti to still get a subset of T*, even if we picked a different object e.
These are implied by two of the defining properties of matroids. A matroid is a nonempty family of subsets (called independent sets) of a finite set S with the properties:
- Any subset A of an independent set B is also independent.
For any maximal independent sets B and B' and any x in B, there is an x' in B' such that B - x + x' is a maximal independent set. This is known as the exchange axiom. Another version is that if A is a subset of B and both are independent, there is some x in B - A such that A + x is independent.
Examples of matroids (defined by their independent sets):
Acyclic subgraphs of a given graph G (graphical matroids). This is what Kruskal's algorithm uses.
- All subsets of a set S with size less than or equal to k.
Independent sets of rows of a matrix (matric matroids). These are the source of the name.
- Many others too horrible to describe.
2. Other greedy algorithms
Have an alphabet of characters a, b, c, ... with known weights. Want to organize them as the leaves of a tree to minimize the sum over all leaves of the leaf's weight times its depth (i.e. ∑u in tree wt(u) d(u)), which has applications in data compression (in particular, the resulting Huffman code, where each character x is replaced by a sequence of one or more bits describing the path through the tree to get to it, gives an optimal code of this type, which also uses on average at most one more bit per character than the optimum compression algorithm for sequences of characters generated independently at random using the weights as probabilities).
Huffman's algorithm similar to the minimum spanning tree algorithm. Start with n separate trees containing one character each. At each step, create a new tree by adding a new root node with the two lightest existing trees as its subtrees. Return the single tree that is left at the end.
LevitinBook doesn't actually prove that this works, but the proof is not terribly hard. There are two main steps: the first is to show that in any tree in which the two lightest characters are not siblings, you can swap them with two siblings lower down in the tree somewhere and improve the objective function. The second is to show that once you join two nodes together into a tree, you can treat the resulting tree as a single node and still get an optimal tree. This is a little bit tricky because at first it doesn't seem like you can concentrate the entire weight of the subtree on its root without changing the objective function. But by rewriting ∑u in subtree wt(u) d(u) as ∑u in subtree wt(u) [d(root) + (d(root) - d(u))] = d(root) ∑ wt(u) + ∑ wt(u) (d(root) - d(u), the first sum is equal to the effect of putting a single node at the position of the root, and the second sum doesn't change no matter where you put the subtree. So the optimal solution is not affected by the second sum and we can just concentrate on finding the best tree with our remaining n-1 nodes, where one of the nodes is really a little 3-node tree.
2.2. Dijkstra's algorithm