1. Randomized merging
Suppose we start with n piles of 1 object each, and at each step we pick two remaining piles uniformly at random, and merge them together at a cost equal to the size of the first of the two piles chosen. (For example, we may need to change a pointer in every element of the first pile to point to the second pile.)
What is the exact expected cost of all merges if we stop when there is only one pile left? Hint: compute E[S] = ∑ E[Xm] where Xm is the cost of the merge starting with m piles.
Use moment generating functions to get the best reasonable-looking bound you can on the upper tail of the total cost S. Ideally, your bound should look like Pr[S ≥ (1+δ)(something)] ≤ exp(something).
Note that you can't necessarily use Chernoff bounds directly, since the Xm aren't likely to be independent. Instead, you may find it helpful to try to compute a worst-case upper bound on E[exp(αXm)|Xn...Xm+1]. A useful fact here is that if ∑ xi = c, and you want to maximize ∑ f(xi) subject to xi ≥ a for all i, then when f is convex the maximum value is obtained by making all but one of the xi equal to a, and when f is concave the maximum value is obtained by making all of the xi equal to c/a.
1.1. Solution
Let Xm be the cost of the merge when there are m piles. Then E[Xm] = E[N] where N is the size of the first randomly chosen pile. Calculate E[N] = (1/m) ∑ ni = n/m, giving E[Xm] = n/m exactly. Now sum from m=2..n to get n(Hn-1).
Let S = ∑ Xm. We want to compute an upper bound on E[eαS]. As observed above, we do not expect the Xm to be independent, as the creation of a big pile early may allow the creation of still larger piles later. But if we assume the worst possible arrangement of the piles at each stage, we should be able to get an upper bound on E[Xm|Xn...Xm-1].
To do so, we need to maximize E[eαN] given that ∑ ni = n. Using the fact that eαx is a convex function for α>0, the maximum is obtained when one of the ni (n1, say) is as larger as possible and the rest are as small as possible. This gives E[exp(αXm)] ≤ (1/m)eα(n-m+1) + (1-1/m)eα = eα(1+((1/m)eα(n-m)-1)) ≤ eα(exp((1/m)eα(n-m)-1)) < exp(α + (1/m)eαn). Now take the product for m=2..n to get E[eαS] < exp((n-1)α + (Hn-1)eαn) < exp(αn + eαn ln n).
This gives Pr[S ≥ x] ≤ exp(αn + eαn ln n - αx) = exp(α(n-x) + eαn ln n). The right-hand side is minimized by minimizing the exponent α(n-x) + eαn ln n; differentiating with respect to α gives (n-x) + eαn n ln n, and after setting this to zero we get α = (1/n) (ln((x-n)/(n ln n)). We can plug this value of α back into the original bound to get
This is still a pretty horrifying expression. We can simplify it somewhat by writing x as (1+δ) n ln n, which turns (x-n)/n into δ ln n and (x-n)/(n ln n) into δ. We then get Pr[S ≥ (1+δ) n ln n] ≤ exp(-δ ln n (δ - 1/ln n)) = exp(-δ2 ln n + δ).
2. A Collatz-like process
The Collatz conjecture says that if you start with any positive integer x0, and let xt+1 = (3xt+1)/2 if xt is odd and xt/2 if xt is even, then you will eventually get to 1. Since the Collatz conjecture seems to be hard to prove, let's simplify the game by sending x either to 3x/2 (without the extra 1) or to x/2, with the choice of which to do made using a fair coin-flip. Note that in this simplified game, xt does not have to be an integer.
What is the expected value of xt (in the simplified game) as a function of t and x0?
Show that there is a constant c such that ln xt - ct is a martingale.
Assuming that x0 = 1, use the Azuma-Hoeffding inequality get a bound on the probability that xt ≥ N for any fixed N > 0.
2.1. Solution
Compute E[xt+1|xt] = (1/2)((3/2)xt+(1/2)xt) = xt. So {xt} is a martingale and E[xt] = x0.
We'll prove this in general for any process of the form Zt+1 = ZtMt+1, where Mt+1 is a random multiplier. Observe that ln Zt+1 = ln Zt + ln Mt+1; so E[ln Zt+1|ℱt] = ln Zt + E[ln Mt+1|ℱt]. We've just shown that {ln Zt} is a submartingale. To make it a martingale, subtract off the predictable process ∑ E[ln Mt+1|ℱt] = t E[ln Mt]. This gives c = (ln (3/2) + ln (1/2))/2 = ln(3/4)/2 = ln(√3/2)) ≅ -0.1438; so ln xt - t ln (√3/2) is a martingale.
Again it's probably easiest to do this first in general and then solve for the specific values for our process. Let Zt, Mt be as above, and consider the martingale Xt = ln Zt - t E[ln Mt]. If Mt always lies in the range [a,b], this can go up by at most ln b - E[ln Mt], and never drops to less than ln a - E[ln Mt]. For the specific case of a = 1/2, b = 3/2, the first quantity is ln (3/2) - ln (√3/2) = ln √3, and the second is ln 1/2 - ln (√3/2) = ln 1/√3 = - ln √3. Both steps are bounded in absolute value by ln √3. So we have Pr[(ln xt - t ln (√3/2)) - ln x0 ≥ α] ≤ exp(-α2/2 t ln2 √3). Under our assumption that x0 = 1, we can simplify the LHS to Pr[ln xt - t ln (√3/2) ≥ α], or Pr[xt ≥ N] where ln N = α + t ln √3/2, which gives α = ln N - t ln (√3/2). Plugging this value into the RHS gives Pr[xt ≥ N] ≤ exp(-(ln N - t ln (√3/2))2/2 t ln2 √3).
This is not a very nice-looking bound, but it does have the very interesting property that for large t the right-hand side is exp(-Θ(t)) (since ln (√3/2) < 0). This means that even though the expected value of xt is still 1, the probability that it exceeds any fixed bound N > 0 (including values much less than 1) goes to zero exponentially quickly. Generalization to small-stakes gambling games is left as an exercise for the reader.
3. Random back-off
A collection of n processes wish to transmit information across a shared broadcast channel, where a message can be transmitted successfully in a given time step only if exactly one process attempts to broadcast. If more than one process attempts to transmit, we get a collision, which is detected by the process's hardware. Otherwise all processes either see the message (if one process transmits) or the absence of a message (if no process transmits).
Suppose that at time 0, all n processes are active (and thus attempt to transmit). After any step where there is a collision, each active process remains active with some probability p, where 0 < p < 1. After a successful transmission (one active process) or silence (no active processes), all processes become active again.
- Show that some process eventually successfully transmits a message, with probability 1, and that the expected time E[T] until this happens is finite. (You do not need to calculate E[T].)
- Show that there is a constant c (which may depend on n and p) such that Pr[T ≥ α E[T]] ≤ exp(-cα).
Let At be the event that a message is transmitted at time t. Show that for fixed n and p, limt→∞ Pr[At] converges to a positive constant.
3.1. Solution
The basic idea: The process described is an ergodic Markov chain (with state space 2[n]), so we can use general properties of Markov chains to prove results about it. Irreducibility is immediate from the fact that every state is reachable with nonzero probability from the all-in state, while the all-in state is reachable in at most two steps from every state via the all-empty state. Aperiodicity is shown by observing that every state with at least two active processes satisfies pii > 0.
We now proceed to the specific claims:
Immediate from the fact that all states in an irreducible finite chain are non-null recurrent. More specifically, let y be the state in which only process 1 is active, and let p = minx≠∅ pxy. Let T be the worst-case expected time to reach y starting from an arbitrary initial state. Let S be the set of processes. If the initial state is ∅, we have T ≤ 2 + (1-p)T. If the initial state is not ∅, we have T ≤ 1 + (1-p)T. In either case we get T ≤ 2/p.
- This is true for all Markov chains. Since Pr[T ≥ eE[T]] ≤ exp(-1), and if we fail we get to try again, we get Pr[T ≥ αE[T]] ≤ exp(-α/e).
Immediate consequence of the ergodic theorem: Let A be the set of states with one active process; then limt→∞ Pr[At] = limt→∞ Pr[Xt∈A] = π(A).