Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

1. The probabilistic method

Suppose we want to show that some object exists, but we don't know how to construct it explicitly. One way to do this is to devise some random process for generating objects, and show that the probability that it generates the object we want is greater than zero. This implies that something we want exists, because otherwise it would be impossible to generate; and it works even if the nonzero probability is very, very small. The systematic development of the method is generally attributed to the notoriously productive mathematician Paul Erdős and his frequent collaborator Alfréd Rényi.

We give a couple of example of the probabilistic method in action below. In each case, the probability that we get a good outcome is actually pretty high, so we could in principle generate a good outcome by retrying our random process until it works. There are some more complicated examples of the method for which this doesn't work, either because the probability of success is vanishingly small, or because we can't efficiently test whether what we did succeeded (the last example below may fall into this category). This means that we often end up with objects whose existence we can demonstrate even though we can't actually point to any examples of them. For a Computer Science example, it is known that there exist sorting networks (a special class of circuits for sorting numbers in parallel) that sort in time proportional to log n, where n is the number of values being sorted. But the best explicit constructions of such networks take time proportional to log² n, and the question of how to find an explicit network that achieves log n time has been open for over 25 years now despite many efforts to solve it.

2. Example: Unique hats

A collection of n robots each wishes to own a unique hat. Unfortunately, the State Ministry for Hat Production only supplies one kind of hat. A robot will only be happy if (a) it has a hat, and (b) no robot it sees has a hat. Fortunately each robot can only see k other robots. How many robots can be happy?

We could try to be clever about answering this question, or we could apply the probabilistic method. Suppose we give each robot a hat with independent probability p. Then the probability that any particular robot r is happy is pq^k, where q = 1-p is the probability that a robot doesn't have a hat and q^k gives the probability that none of the robots that r sees has a hat. If we let I_r be the indicator for the event that r is happy, we have E[I_r] = pq^k and the expected number of happy robots is E[∑I_r] = ∑E[I_r] = npq^k. Since we can achieve this value on average, there must be some specific assignment that achieves it as well.

To choose a good p, we apply the usual calculus trick of looking for a maximum by looking for the place where the derivative is zero. We have d/dp npq^k = nq^k - nkpq^k-1 (since dq/dp = -1), so we have nq^k - nkpq^k-1 = 0. Factoring out n and q^k-1 gives q - pk = 0 or 1-p-pk = 0 or p = 1/(k+1). For this value of p, the expected number of happy robots is exactly $n\left(\frac{1}{k+1}\right)\left(1-\frac{1}{k+1}\right)^k$ . For large k the last factor approaches 1/e, giving us approximately $(1/e)\frac{n}{k+1}$ happy robots.

Up to constant factors this is about as good an outcome as we can hope for: we can set things up so that our robots are arranged in groups of k+1, where each robot sees all the other robots in its group, and here we can clearly only have 1 happy robot per group, for a maximum of n/(k+1) happy robots.

Note that we can improve the constant 1/e slightly by being smarter than just handing out hats at random. Starting with our original n robots, look for a robot that is observed by the smallest number of other robots. Since there are only nk pairs of robots (r1, r2) where r1 sees r2, one of the n robots is only seen by at most k other robots. Now give this robot a hat, and give no hat to any robot that sees it or that it sees. This produces 1 happy robot while knocking out at most 2k+1 robots total. Now remove these robots from consideration and repeat the analysis on the remaining robots, to get another happy robot while again knocking out at most 2k+1 robots. Iterating this procedure until no robots are left gives at least n/(2k+1) happy robots; this is roughly (1/2)n/(k+1) for large k, which is a bit better than (1/e)n/(k+1). (It may be possible to improve the constant further with more analysis.) This shows that the probabilistic method doesn't necessarily produce better results than we can get by being smart. But the hope is that with the probabilistic method we don't have to be smart, and for some problems it seems that solving them without using the probabilistic method requires being exceptionally smart.

3. Example: Ramsey numbers

Consider a collection of n schoolchildren, and imagine that each pair of schoolchildren either like each other or hate each other. We assume that these preferences are symmetric: if x likes y, then y likes x, and similarly if x hates y, y hates x. Let R(k,h) be the smallest value for n that ensures that among any group of n schoolchildren, there is either a subset of k children that all like each other or a subset of h children that all hate each other.

It is not hard to show that R(k,h) is finite for all k and h (see Ramsey's Theorem). The exact value of R(k,h) is known only for small values of k and h. But we can use the probabilistic method to show that for k=h, it is fairly large.

Theorem: If k ≥ 3, then R(k,k) > 2^k/2.
Proof: Suppose each pair of schoolchildren flip a fair coin to decide whether they like each other or not. Then the probability that any particular set of k schoolchildren all like each other is $2^{-{k \choose 2}}$ and the probability that they all hate each other is the same. Summing over both possibilities and all subsets gives a bound of ${n \choose k} 2^{1-{k \choose 2}}$ on the probability that there is at least one subset in which everybody likes or everybody hates everybody. For n = 2^k/2, we have

$\begin{eqnarray*} {n \choose k} 2^{1-{k \choose 2}} &\le & \frac{n^k}{k!} 2^{1-{k \choose 2}} \\ &=& \frac{2^{k^2/2 + 1 - k(k-1)/2}}{k!} \\ &=& \frac{2^{k^2/2 + 1 - k^2/2 + k/2}}{k!} \\ &=& \frac{2^{1 + k/2}}{k!} \\ &<& 1. \end{eqnarray*}$

Because the probability that there is an all-liking or all-hating subset is less than 1, there must be some chance that we get a collection that doesn't have one. So such a collection exists. It follows that R(k,k) > 2^k/2, because we have shown that not all collections at n = 2^k/2 have the Ramsey property.

The last step in the proof uses the fact that 2^1+k/2 < k! for k≥3, which can be tested explicitly for k=3 and proved by induction for larger k. The resulting bound is a little bit weaker than just saying that n must be large enough that ${n \choose k} 2^{1-{k \choose 2}} \ge 1$ , but it's easier to use.

4. Example: Cycles in tournaments

In the previous example we used the probabilistic method to show that there existed a structure that didn't contain some particular substructure. For this example we will do the reverse: show that there exists a structure that contains many instances of a particular substructure.

Imagine that n wrestlers wrestler a round-robin tournament, so that ever wrestler wrestles every other wrestler exactly once, and so that for each pair of wrestlers one wrestler beats the other. Consider a cycle of wrestlers x₁, x₂, ..., x_n such that each wrestler x_i beats x_i+1 and x_n beats x₁.

Theorem: There is a possible outcome to this tournament with at least n! 2^-n such cycles.
Proof: Flip a coin to decide the outcome of each pairing. For any particular sequence of wrestlers x₁..x_n, the probability that it is a cycle is then 2^-n (since there are n bouts in the cycle and each has independent probability 2^-1 of going the right way). Now let π range over all permutations of the wrestlers and define I_π as the indicator variable that permutation π gives a cycle. We've just shown that E[I_π] = 2^-n; so the expected total number of cycles is E[∑_π I_π] = ∑_π E[I_π] = n! 2^-n. But if the average value is at least n! 2^-n, there must be some specific outcome that gives a value at least this high.

CategoryMathNotes