LevitinBook uses the term *brute force* to describe any algorithm design where computational power is used as a substitute for programmer cleverness. Typically this involves unpacking the specification into an algorithm, either by implementing the specification directly (as in problems like matrix multiplication), or by searching through the space of all possible outputs to find one that meets the specification. The intent of brute force algorithm design is not necessarily to get the best possible algorithm, but to get some algorithm for solving a problem that can be implemented quickly.

Many such algorithms work by *exhaustive search*, also known as *generate and test*. The idea is that if we can enumerate some set that contains the correct output, we can use the specification to decide when we've found it. The basic pattern looks like this:

for x in possible outputs do: if specification(input, x) = OK then: return x

It's trivial to see that anything that this procedure returns will satisfy the specification. What is sometimes harder is to see how to generate the set of possible outputs, or what the running time of the algorithm will be if the size of this set is difficult to describe. We will see some examples of this below.

Exhaustive-search algorithms tend to be slow, particularly if the set of possible outputs is large. Sometimes we can use a naive exhaustive-search solution as a starting point for a better algorithm by thinking about how to constrain the set of solutions to include fewer duds.

# 1. Primality testing

Suppose we want to know if an n-bit number x is prime. A number is defined to be prime if it is greater than one and has no non-trivial factors, which is a succinct way of saying that there is no number f with 1 < f < x such that f evenly divides x. So a first stab at a brute-force solution would simply be to implement this definition directly. This gives us:

## 1.1. Primality testing by trial division

IsPrime(x) for f = 2 to x-1: if x mod f = 0: return 'not prime' return 'prime'

What is the running time of this procedure? As a function of x, and under the assumption that we can do division in constant time, it's Theta(x) when x is prime, and Theta(smallest factor of x) when x is composite (i.e., not prime). So it runs fast for numbers with small factors (e.g. even numbers). But if we look at the running time as a function of the number of bits n in the input, its worst-case behavior is not so good; a typical n-bit number is of order 2^{n}, and there are enough primes that we get a Theta(2^{n}) worst-case cost. This is not very good; if we assume (generously, in 2004) that we can do 2^{30} trial divisions per second, and are willing to wait at most 2^{20} seconds (a little more than twelve days) to find out if our number is prime, we can do up to 2^{50} trial divisions and thus test numbers with up to 50 bits. These include some pretty big numbers, but it would be nice to extend our range.

It's not likely that we can improve this by improving the cost of the (constant-time) test for divisibility for each factor. Instead, we have to somehow reduce the number of factors we consider. One approach would be to test only prime factors, since if a number is divisible by anything it is divisible by some prime. This raises the issue of how we identify such prime factors without an expensive recursive call to the same procedure, but if we really want to do this, we can use a classic prime-listing algorithm due to Eratosthenes.

## 1.2. The Sieve of Eratosthenes

The *Sieve of Eratosthenes* can be thought of as a bottom-up exhaustive search algorithm for primes; here we start with a list of all numbers 1..x, and repeatedly cross out the ones that aren't primes. Pseudocode for the algorithm is given below:

IsPrime(x): A = array with indices 1..x, initialized to 'prime' A[1] = 'not prime' for i = 2 to x if A[i] = 'prime' { mark all multiples of i } j = 2*i while j <= x A[j] = 'not prime' j = j+i return A[x]

To compute the running time of this algorithm, observe that the inner j loop takes O(x/i) iterations, so the total time is bounded by Sigma_{i=2 to x} x/i = x Sigma,,i=2 to x 1/i = O(x log x). This is actually slightly worse than the O(x) running time of simple trial division, although by bringing in some number theory it is possible to show that the j loop is executed infrequently enough that the actual cost is Theta(x). While this cost is the same as for trial division, the selling point of the Sieve is that (a) it generates a list of all primes less than or equal to x, in addition to finding if x is prime or not, and (b) it doesn't use division, which may in practice be a more expensive operation than addition. But if we want to improve the O(2^{n}) upper bound---or if we can't muster up Omega(x) bits of space to hold the table---we will need some other idea.

## 1.3. Reducing the number of factors tried

Suppose that f is the smallest factor of x. How big can f be? We know that x = fg for some other factor g. By assumption g is at least as large as f, so x = f*g >= f*f = f^{2}. In particular, this means that if we have not found a factor by the time we get to sqrt(x), we aren't going to find any factors.

Here's a modified version of our original trial division algorithm that uses this idea:

IsPrime(x) f = 2 while f*f <= x: if x mod f = 0: return 'not prime' f = f+1 return 'prime'

Since this only tests factors up to sqrt(x), the running time is O(sqrt(x)) = O(2^{n/2}). So now we might be able to test numbers up to 100 bits if we are willing to wait long enough.

If divisions are much more expensive than other operations, we can improve the algorithm further by using the Sieve of Eratosthenes to generate only prime factors:

IsPrime(x): for all primes f in the range 2..sqrt(x): if x mod f = 0 return 'not prime' return 'prime'

Here the cost is still O(2^{n/2}) additions, but the number of divisions can be shown (using the fact that there are roughly x/ln x primes less than or equal to x) to be only O(2^n/2-n ln 2).

In passing, we can mention that the randomized Miller-Rabin test detects primality with high probability in O(n) multiplications, and the more recent Agrawal-Kayal-Saxena test detects primality deterministically (i.e., with no possibility of error) in O(n^{12}) bit operations. Understanding these algorithms requires knowing a fair bit of number theory, but the basic structure is still exhaustive search: both algorithms work by looking for "witnesses" to some property that is satisfied by non-primes but not by primes, and the difference from trial division is that these properties are chosen so that there are fewer potential witnesses than when a witness is one of sqrt(x) possible factors.

# 2. Monkeysort

The "MonkeySort" or "BogoSort" algorithm sorts an array by generating all possible permutations and looking for the one that is sorted. The only tricky part is figuring out how to generate all permutations. This can be done by a DecreaseAndConquer approach that reduces the problem of generating all permutations of n elements to generating all permutations of n-1 elements, by choosing the first element and then permuting the rest. Pseudocode follows:

AllPermutations(prefix, unused): if length(unused) = 0: yield prefix else for i = 1 to length(unused): AllPermutations(prefix + unused[i], unused - unused[i])

The number of recursive calls to this procedure is given by T(n) = 1 + n*T(n-1). The solution to this is Sigma_{i=1 to n} i!, which is Theta(n!). The cost of each call depends mostly on how long it takes to delete unused[i] from unused, but this is probably dominated in BogoSort by the Theta(n) cost of testing if a permutation is sorted. As we saw before, the final cost is O(n*n!) (maybe O(n!) if we are really clever).

Let's see if we can improve this bound by reducing the search space. AllPermutations takes a long time because it tries all n elements in the first position. But we know that in a sorted list there is at most one element that can be in the first position. So if we find that element, we can reduce our cost by a factor of n at the top level, and still further factors in the recursive calls. The result is

BestPermutation(prefix, unused): if length(unused) = 0: yield prefix else let i be the index of a minimum element of unused AllPermutations(prefix + unused[i], unused - unused[i])

Now the recurrence for the running time is T(n) = T(n-1) + Theta(n), with T(0) being the cost of whatever test we do at the end. This has solution Sigma_{i=1 to n} Theta(i) + T(0) = Theta(n^{2}) + T(0). For our optimized version of BogoSort, T(0) is Theta(n), so the total cost is just Theta(n^{2}).

With some additional tinkering (e.g., replacing the tail recursion with a loop), we end up with the standard SelectionSort algorithm.

# 3. Vertex cover

In the *vertex cover* problem, we are given a graph and must mark a set of k or fewer vertices so that every edge in the graph gets a mark on at least one of its endpoints. This is believed to be a very difficult problem to solve efficiently, which is good news for exhaustive search---we don't have to be embarrassed if our solution is not very efficient.

So how can we find the winning set of k vertices? Let's suppose we already have a vertex cover tester that tells us if a particular set of k vertices is good; this can be done in O(n^{2}) time if n is the number of vertices. So we just need to generate all sets of k vertices. We can do this by a DecreaseAndConquer approach similar to that used to get all permutations.

AllSubsets(prefix, unused, k) if length(prefix) = k yield prefix else for i = 1 to length(unused) - k + length(prefix) + 1 AllSubsets(prefix + unused[i], unused[i+1..], k)

This will run in O(n^{k}/k!) time, although the analysis is not trivial. But if we accept this bound we get a running time of O(n^{k+2}/k!) for finding a vertex cover of size k in a graph of n vertices.

Can we do better? It turns out that for fixed k, a vertex cover can be found in time linear in the number of edges in the graph (i.e. in O(n^{2}) time where n is the number of vertices). The trick is that for each edge in the graph, one of its endpoints must be marked. So if we concentrate on choosing endpoints for unmarked edges instead of choosing unmarked vertices, we can reduce the number of choices at each step from n to 2.

VertexCover(edges, k): if edges = {} # we win return true else if k = 0: # we lose, keep trying return false else: for each endpoint x of edges[1]: let edges' = { e in edges : x is not an endpoint of e } if VertexCover(edges', k-1) = true # we found one, stop looking return true # else we didn't find one, keep trying return false

This version omits the additional code needed to find a vertex cover, but that doesn't change things much. The number of recursive calls is bounded by T(k) = 1 + 2T(k-1) = O(2^{k}); the cost of each call is dominated by the cost of computing edges', which can be done (by BruteForce!) in O(n^{2}) time; and the total cost is thus O(2^{k}n^{2}), which can be rewritten as O(n^{2}) if k is a constant.

# 4. Levin's universal algorithm

*Levin search*, named for its inventor Leonid Levin, solves any problem for which an O(f(n)) time specification is known and an O(g(n)) algorithm exists in time O(f(n)+g(n)).

Formally, what we need is:

A

*known*algorithm S which returns Yes or No when presented with an input pair (input, output), computing its result in time O(f(n)), where n is the length of the input. (It is actually enough for S to return Yes when it should and run forever instead of returning No, as long as it never incorrectly returns Yes.)The existence of an algorithm P

_{i}that computes a correct output for each input in time O(g(n)), where correctness means that S(input, P_{i}(input)) is always Yes. We do*not*need to know what P_{i}is, what g(n) is, or even that P_{i}or g exist in order to apply Levin search.

The search procedure is as follows: given some enumeration of all algorithms P_{1}, P_{2}, P_{3}, ..., we will run all of them interleaved according to a schedule that looks like this:^{1}

- 1213121412131215121312141213121612131214121312151213121412131217...

i.e. P_{1} runs every other step, P_{2} runs every other step of the remaining steps, P_{3} runs every other step of what's left, etc. This interleaving assigns one step out of every 2^{i} to P_{i} for any i.

When a particular P_{i} produces an output, we then run a copy S_{i} of S on the output, again with the same interleaving schedule. If S_{i} returns Yes, we halt and return the output computed by P_{i}.

Suppose there is some P_{i} that produces the correct solution for all inputs in time O(f(n)). Ignoring all the other interleaved executions, it takes O(f(n)+g(n)) steps of P_{i} and S_{i} to find the correct output and verify it. These steps occur every 2^{i} steps of the full algorithm, for a total of O(2^{i}(f+g)) steps to find the solution. But i does not depend on n: it is a constant. So we can absorb it into the big O and simply get O(f+g) steps.

What this means: for any problem for which it is no harder to check the solution than generate it, we know (in the weak sense that we can run Levin search) an optimal algorithm for the problem, even if we don't know (in the strong sense that we have a particular algorithm that we understand) any algorithm at all for the problem. The only difficulty is that the constant factor is likely to be quite large, so large that perhaps we should use an extra-big O to warn the reader just how large it is.

This particular interleaving was proposed by Li and Vitanyi; Levin's original interleaving was more complicated. (1)