
Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

(See http://www.cs.yale.edu/homes/aspnes/classes/469/notes.pdf for a more up-to-date version of these notes.)

Let us consider probabilistic recurrences of the form T(n) = 1 + T(n-X_n), where X_n is a random variable with 0 < X_n ≤ n and T(0) = 0. We assume we can compute a lower bound on E[X_n] for each n, and we want to translate this lower bound into an upper bound on E[T(n)].
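Such a recurrence can be simulated directly: repeatedly draw X_n and count steps until we reach 0. The sketch below is illustrative only; `sample_X` is a hypothetical stand-in for whatever distribution a particular application supplies.

```python
import random

def simulate_T(n, sample_X):
    """Count the steps of T(n) = 1 + T(n - X_n) with T(0) = 0.

    sample_X(n) should return one draw of X_n, with 0 < X_n <= n."""
    steps = 0
    while n > 0:
        n -= sample_X(n)
        steps += 1
    return steps

# Sanity check: if X_n is always 1, then T(n) = n exactly.
print(simulate_T(10, lambda n: 1))  # 10

# A random instance: X_n uniform in 1..n.
random.seed(1)
print(simulate_T(10**6, lambda n: random.randint(1, n)))
```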

1. Examples

2. The Karp-Upfal-Wigderson bound

Let a be a constant, let T(n) = 1 + T(n-X), where X is an integer-valued random variable satisfying 0 ≤ X ≤ n-a, and let T(a) = 0. Suppose E[X] ≥ μ(n) for all n > a, where μ is a positive non-decreasing function of n. Then

\E[T(n)] \le \int_a^n \frac{1}{\mu(t)} \,dt.

Proof: This is essentially the same proof as in MotwaniRaghavan, but we add some extra detail to allow for the possibility that X=0. Let p = Pr[X=0], q = 1-p = Pr[X≠0]. Note we have q>0 because otherwise E[X] = 0 < μ(n). Then we have

\E[T(n)] &= 1 + \E[T(n-X)] \\
&= 1 + p \E[T(n-X)|X=0] + q \E[T(n-X)|X\ne 0] \\
&= 1 + p \E[T(n)] + q \E[T(n-X)|X\ne 0].

Now we have E[T(n)] on both sides, which we don't like very much. So we collect it on the left-hand side:

(1-p) \E[T(n)] &= 1 + q \E[T(n-X)|X \ne 0],

divide both sides by q = 1-p, and apply the induction hypothesis:

\newcommand{\ugly}[1]{\int_a^{#1} \frac{1}{\mu(t)}\,dt}
\E[T(n)] &= 1/q + \E[T(n-X)|X \ne 0] \\
&\le 1/q + \E\left[\ugly{n-X}|X \ne 0\right] \\
&= 1/q + \E\left[\ugly{n} - \int_{n-X}^{n} \frac{1}{\mu(t)}\,dt| X \ne 0\right] \\
&\le 1/q + \ugly{n} - \E\left[\frac{X}{\mu(n)}|X \ne 0\right] \\
&\le 1/q + \ugly{n} - \frac{\E[X|X \ne 0]}{\mu(n)}.

The second-to-last step uses the fact that μ(t)≤μ(n) for t≤n.

It may seem like we don't know what E[X|X≠0] is. But since E[X|X=0] = 0, we have E[X] = p E[X|X=0] + q E[X|X≠0] = q E[X|X≠0], and we can solve for E[X|X≠0] = E[X]/q. Plugging this in:

\newcommand{\ugly}[1]{\int_a^{#1} \frac{1}{\mu(t)}\,dt}
\E[T(n)] &\le 1/q + \ugly{n} - \frac{\E[X]/q}{\mu(n)} \\
&\le 1/q + \ugly{n} - 1/q \\
&= \ugly{n}.

This concludes the proof. Now we just need to find some applications.
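As a quick sanity check on the bound (my addition, not from the notes): when X is uniform on {1, ..., n}, E[T(n)] can be computed exactly from E[T(n)] = 1 + (1/n) ∑_{k<n} E[T(k)] (it works out to H_n), while taking μ(t) = ⌈t⌉/2 (valid since E[X] = (n+1)/2 ≥ n/2) gives the KUW bound 2H_n:

```python
from fractions import Fraction

def exact_ET(N):
    """Exact E[T(n)] for T(n) = 1 + T(n - X), X uniform on {1, ..., n}:
    E[T(n)] = 1 + (1/n) * sum_{k=0}^{n-1} E[T(k)]."""
    ET = [Fraction(0)]
    for n in range(1, N + 1):
        ET.append(1 + sum(ET) / n)
    return ET

N = 30
ET = exact_ET(N)
bound = [Fraction(0)]          # KUW bound with mu(t) = ceil(t)/2: 2*H_n
for k in range(1, N + 1):
    bound.append(bound[-1] + Fraction(2, k))
for n in range(1, N + 1):
    assert ET[n] <= bound[n]   # E[T(n)] = H_n <= 2*H_n
print(float(ET[N]), float(bound[N]))
```

Here the bound is loose by exactly a factor of 2, the slack coming from replacing E[X] = (n+1)/2 by n/2.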

2.1. Quickselect

In Quickselect, we pick a random pivot and split the original array of size n into three piles: one of size m (elements less than the pivot), one of size 1 (the pivot itself), and one of size n-m-1 (elements greater than the pivot). We then figure out which of the three piles contains the k-th smallest element (depending on whether k <, =, or > m+1) and recurse, stopping when we hit a pile with 1 element. It's easiest to analyze this by assuming that we always recurse in the largest of the three piles, i.e., that our recurrence is T(n) = 1 + max(T(m), T(n-m-1)), where m is uniform in 0..n-1. The exact value of E[max(m, n-m-1)] is a little messy to compute (among other things, it depends on whether n is odd or even), but it's not hard to see that it's always less than (3/4)n, so the expected drop in problem size is at least n/4. So letting μ(n) = n/4, we get

\E[T(n)] \le \int_1^{n} \frac{1}{t/4} \,dt = 4 \ln n.
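A rough Monte Carlo check of this bound (my sketch): simulate the pessimistic recurse-into-the-larger-pile recurrence directly and compare the observed mean number of steps to 4 ln n.

```python
import math
import random

def worst_pile_steps(n):
    """One run of T(n) = 1 + max(T(m), T(n-m-1)) with m uniform in 0..n-1:
    pessimistically recurse into the larger of the two side piles."""
    steps = 0
    while n > 1:
        m = random.randrange(n)    # size of the less-than-pivot pile
        n = max(m, n - m - 1)      # take the larger remaining pile
        steps += 1
    return steps

random.seed(2)
n, trials = 10**4, 2000
mean = sum(worst_pile_steps(n) for _ in range(trials)) / trials
print(mean, 4 * math.log(n))   # the observed mean sits below 4 ln n ≈ 36.8
```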

2.2. Tossing coins

Here we have E[X_n] = (1-p)n. If we let μ(n) = (1-p)n and plug into the formula without thinking about it too much, we get

\E[T(n)] \le \int_0^n \frac{1}{(1-p)t} \,dt = \frac{1}{1-p}(\ln n - \ln 0).

That ln 0 is trouble. We can fix it by instead taking μ(t) = (1-p)⌈t⌉, to get

\E[T(n)] &\le \int_{0+}^n \frac{1}{(1-p)\lceil t \rceil} \,dt \\
&= \frac{1}{1-p} \sum_{k=1}^{n} \frac{1}{k} \\
&= \frac{H_n}{1-p}.
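The notes don't spell the process out here, but one reading consistent with E[X_n] = (1-p)n is that each round every remaining coin independently survives with probability p. Under that assumption, a seeded simulation can be compared against the H_n/(1-p) bound:

```python
import random

def rounds_until_no_coins(n, p):
    """Each round every remaining coin independently survives with
    probability p, so X_n ~ Binomial(n, 1-p) coins are eliminated and
    E[X_n] = (1-p)n. Count rounds until no coins remain."""
    rounds = 0
    while n > 0:
        n = sum(1 for _ in range(n) if random.random() < p)
        rounds += 1
    return rounds

random.seed(3)
n, p, trials = 100, 0.5, 2000
mean = sum(rounds_until_no_coins(n, p) for _ in range(trials)) / trials
H_n = sum(1 / k for k in range(1, n + 1))
print(mean, H_n / (1 - p))   # mean comes in under H_100/(1-p) ≈ 10.37
```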

2.3. Coupon collector

Now that we know how to avoid dividing by zero, this is easy and fun. When k coupons are still missing, each new draw is a missing coupon with probability k/n, so E[X_k] = k/n. Let μ(x) = ⌈x⌉/n; then we have

\E[T(n)] & \le \int_{0+}^{n} \frac{n}{\lceil t \rceil}\,dt \\
&= \sum_{k=1}^{n} \frac{n}{k} \\
&= n H_n.

As it happens, this is the exact answer for this case. This will happen whenever X is a 0-1 variable and we define μ(x) = E[X_{⌈x⌉}], which can be seen by spending far too much time thinking about the precise sources of error in the inequalities in the proof.
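A quick simulation (my sketch, not from the notes) makes the n·H_n figure easy to eyeball:

```python
import random

def coupon_collector_draws(n):
    """Draw uniformly from n coupon types until every type has been seen."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        draws += 1
    return draws

random.seed(4)
n, trials = 50, 4000
mean = sum(coupon_collector_draws(n) for _ in range(trials)) / trials
nHn = n * sum(1 / k for k in range(1, n + 1))
print(mean, nHn)   # the empirical mean hovers around n*H_n ≈ 224.96
```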

2.4. Chutes and ladders

Let μ(n) be the expected drop from position n, where a roll that would take us below 0 is wasted. We have to be a little bit careful about small n, but in general μ(n) = $\frac{1}{6} \sum_{i=0}^{\min(n,6)} i$. For fractional values x we will set μ(x) = μ(⌈x⌉) as before. Then we have

\E[T(n)] & \le \int_{0+}^{n} \frac{1}{\mu(t)}\,dt \\
&= \sum_{k=1}^{n} \frac{1}{\mu(k)}.

We can summarize the values in the following table:

  n     μ(n)    1/μ(n)   ∑ 1/μ(k)
  1     1/6     6        6
  2     1/2     2        8
  3     1       1        9
  4     5/3     3/5      48/5
  5     5/2     2/5      10
  ≥6    7/2     2/7      10 + (2/7)(n-5) = (2/7)n + 60/7

This is a slight overestimate; for example, we can calculate by hand that the expected waiting time for n=2 is 6 and for n=3 is 20/3 = 6 2/3.

We can also consider a generalized version of the game where we start at n and the proposed drop is uniform in 1..n each turn, taking effect only when it wouldn't take us below 0. Now the expected drop from position k is k(k+1)/2n, and so we can apply the formula to get

\E[T(n)] &\le \sum_{k=1}^{n} \frac{2n}{k(k+1)}.

The sum of 1/(k(k+1)) where k=1..n happens to have a very nice value; it's exactly n/(n+1) (see footnote 1). So in fact we can rewrite the bound as 2n(n/(n+1)) = 2n^2/(n+1).
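For this generalized game we can check the bound exactly (my sketch, assuming the rule that an overshooting drop simply doesn't happen): conditioning on whether the proposed drop is accepted gives the exact recurrence T(k) = n/k + (1/k) ∑_{j<k} T(j), which we can evaluate with rational arithmetic and compare to 2n^2/(n+1).

```python
from fractions import Fraction

def exact_ET_drop(n):
    """Exact E[T(n)] for the generalized game: from position k, a proposed
    uniform drop in 1..n takes effect only if it doesn't go below 0.
    Conditioning on whether the drop is accepted gives
    T(k) = n/k + (1/k) * sum_{j<k} T(j)."""
    ET = [Fraction(0)]
    for k in range(1, n + 1):
        ET.append(Fraction(n, k) + sum(ET) / k)
    return ET[n]

for n in range(1, 20):
    assert exact_ET_drop(n) <= Fraction(2 * n * n, n + 1)  # the 2n^2/(n+1) bound
print(float(exact_ET_drop(10)), 2 * 100 / 11)
```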

3. High-probability bounds

So far we have only considered bounds on the expected value of T(n). Suppose we want to show that T(n) is in fact small with high probability, i.e. a statement of the form Pr[T(n) ≥ t] ≤ ε. There are two natural ways to do this: we can repeatedly apply Markov's inequality to the expectation bound, or we can attempt to analyze the recurrence in more detail. The first method tends to give weaker bounds but it's easier.

3.1. Converting expectation bounds to high-probability bounds

Given E[T(n)] ≤ m, Markov's inequality gives Pr[T(n) ≥ αm] ≤ α^{-1}. This does not give a very good bound on the probability: if we want Pr[T(n) ≥ t] ≤ n^{-c} for some constant c (a typical high-probability bound), we need t ≥ n^c m. But we can get a better bound if m bounds the expected time starting from any reachable state, as is the case for the class of problems we have been considering.

The idea is that if T(n) exceeds αm, we restart the analysis and argue that Pr[T(n) ≥ 2αm | T(n) ≥ αm] ≤ α^{-1}, from which it follows that Pr[T(n) ≥ 2αm] ≤ α^{-2}. In general, for any non-negative integer k, we have Pr[T(n) ≥ kαm] ≤ α^{-k}. Now we just need to figure out how to pick α to minimize this quantity for fixed t.

Let t = kαm. Then k = t/(αm), and we seek to minimize α^{-t/(αm)}. Taking the logarithm gives -(t/m)(ln α)/α. The t/m factor is irrelevant to the minimization problem, so we are left with minimizing -(ln α)/α. Taking the derivative with respect to α gives -α^{-2} + α^{-2} ln α; this is zero when ln α = 1, i.e., when α = e. (This popular constant shows up often in problems like this.) So we get Pr[T(n) ≥ kem] ≤ e^{-k}, or, letting k = ln(1/ε), Pr[T(n) ≥ em ln(1/ε)] ≤ ε.

So, for example, we can get an n^{-c} bound on the probability of running too long by setting our time bound to em ln(n^c) = cem ln n = O(m log n). We can't hope to do better than O(m), so this bound is tight up to a log factor.
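The calculus step above (the minimum of -(ln α)/α at α = e) is easy to double-check numerically with a grid search:

```python
import math

# Grid search confirming that (ln a)/a is maximized at a = e, i.e. that
# -(ln a)/a is minimized there, with maximum value 1/e ≈ 0.3679.
best_val, best_a = max((math.log(a) / a, a)
                       for a in (1 + i / 1000 for i in range(1, 9001)))
print(best_a, best_val)
```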

3.2. Detailed analysis of the recurrence

As Lance Fortnow explains, getting rid of log factors is what theoretical computer science is all about. So we'd like to do better than an O(m log n) bound if we can. In some cases this is not too hard.

Suppose that for each n, T(n) = 1 + T(X), where E[X] ≤ αn for a fixed constant α < 1. Let X_0 = n, and let X_1, X_2, etc., be the sequence of sizes of the remaining problem at times 1, 2, etc. Then we have E[X_1] ≤ αn from our assumption. But we also have E[X_2] = E[E[X_2|X_1]] ≤ E[αX_1] = α E[X_1] ≤ α^2 n, and by induction we can show that E[X_k] ≤ α^k n for all k. Since X_k is integer-valued, E[X_k] is an upper bound on Pr[X_k > 0]; we thus get Pr[T(n) ≥ k] = Pr[X_k > 0] ≤ α^k n. We can solve for the value of k that makes this less than ε: k = -log(n/ε)/log α = log_{1/α} n + log_{1/α}(1/ε).
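Here is a concrete instance of this bound (my example): if each unit of the remaining problem survives a step independently with probability α, then composing binomials gives X_k ~ Binomial(n, α^k), so Pr[X_k > 0] = 1 - (1-α^k)^n can be compared with α^k n exactly:

```python
# Concrete instance: each unit of the problem survives a step independently
# with probability a, so X_k ~ Binomial(n, a^k) and
# Pr[X_k > 0] = 1 - (1 - a^k)^n, to be compared with the bound n * a^k.
n, a = 1000, 0.5
for k in range(1, 30):
    exact = 1 - (1 - a**k) ** n
    bound = n * a**k
    assert exact <= bound
print(1 - (1 - a**15) ** n, n * a**15)
```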

For comparison, the bound on the expectation of T(n) from the KUW formula is H_n/(1-α). This is actually pretty close to log_{1/α} n when α is close to 1, and is not too bad even for smaller α. But the difference is that the dependence on log(1/ε) is additive with the tighter analysis, so for fixed c we get Pr[T(n) ≥ t] ≤ n^{-c} at t = O(log n + log n^c) = O(log n) instead of O(log n · log n^c) = O(log^2 n).


  1. Proof: Trivially true for n=0; for larger n, compute ∑_{k=1}^{n} 1/(k(k+1)) = (n-1)/n + 1/(n(n+1)) = ((n-1)(n+1) + 1)/(n(n+1)) = n^2/(n(n+1)) = n/(n+1).

2014-06-17 11:58