# 1. Randomized merging

Suppose we start with n piles of 1 object each, and at each step we pick an ordered pair of two distinct remaining piles uniformly at random, and merge them together at a cost equal to the size of the first of the two piles chosen. (For example, we may need to change a pointer in every element of the first pile to point to the second pile.)

- What is the exact expected cost of all merges if we stop when there is only one pile left? Hint: compute E[S] = ∑ E[X_{m}], where X_{m} is the cost of the merge starting with m piles.
- Use moment generating functions to get the best reasonable-looking bound you can on the upper tail of the total cost S. Ideally, your bound should look like Pr[S ≥ (1+δ)(*something*)] ≤ exp(*something*).

Note that you can't necessarily use Chernoff bounds directly, since the X_{m} aren't likely to be independent. Instead, you may find it helpful to try to compute a worst-case upper bound on E[exp(αX_{m}) | X_{n}...X_{m+1}]. A useful fact here is that if ∑ x_{i} = c, and you want to maximize ∑ f(x_{i}) subject to x_{i} ≥ a for all i, then when f is convex the maximum value is obtained by making all but one of the x_{i} equal to a, and when f is concave the maximum value is obtained by making all of the x_{i} equal (each to c/k, when there are k of them).
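The extremal fact in the hint is easy to sanity-check numerically. Below is a minimal sketch (purely illustrative; the choice f(x) = e^{αx} and the constants k, a, c are my own) that compares the corner configuration, with all but one variable at the lower bound, against a batch of random feasible points for the convex case:

```python
import math
import random

# Illustrative check of the hint's extremal fact: maximize sum of
# exp(alpha * x_i) subject to sum x_i = c and x_i >= a for all i.
# Since exp is convex, the maximum sits at a "corner" of the feasible
# region: all but one x_i pushed down to a.
def objective(xs, alpha=0.1):
    return sum(math.exp(alpha * x) for x in xs)

k, a, c = 10, 1.0, 100.0                     # arbitrary illustrative parameters
corner = [a] * (k - 1) + [c - a * (k - 1)]   # all but one variable at the minimum

rng = random.Random(1)
best_random = 0.0
for _ in range(10_000):
    # Random feasible point: split the slack c - k*a uniformly over k variables.
    cuts = sorted(rng.uniform(0, c - k * a) for _ in range(k - 1))
    gaps = [hi - lo for lo, hi in zip([0.0] + cuts, cuts + [c - k * a])]
    xs = [a + g for g in gaps]
    best_random = max(best_random, objective(xs))

assert abs(sum(corner) - c) < 1e-9           # the corner point is feasible
assert objective(corner) >= best_random      # convexity: the corner wins
```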

## 1.1. Solution

- Let X_{m} be the cost of the merge made when there are m piles. Then E[X_{m}] = E[N], where N is the size of the first randomly chosen pile. Since the m pile sizes n_{i} sum to n, we calculate E[N] = (1/m) ∑ n_{i} = n/m, giving E[X_{m}] = n/m exactly. Now sum from m = 2..n to get E[S] = n(H_{n} - 1).
- Let S = ∑ X_{m}. We want to compute an upper bound on E[e^{αS}]. As observed above, we do not expect the X_{m} to be independent, as the creation of a big pile early may allow the creation of still larger piles later. But if we assume the worst possible arrangement of the piles at each stage, we can get an upper bound on E[exp(αX_{m}) | X_{n}...X_{m+1}]. To do so, we need to maximize E[e^{αN}] given that ∑ n_{i} = n. Using the fact that e^{αx} is a convex function for α > 0, the maximum is obtained when one of the n_{i} (n_{1}, say) is as large as possible and the rest are as small as possible, i.e. n_{1} = n - m + 1 and n_{i} = 1 otherwise. This gives E[exp(αX_{m})] ≤ (1/m)e^{α(n-m+1)} + (1 - 1/m)e^{α} = e^{α}(1 + (1/m)(e^{α(n-m)} - 1)) ≤ e^{α} exp((1/m)(e^{α(n-m)} - 1)) < exp(α + (1/m)e^{αn}). Now take the product for m = 2..n to get E[e^{αS}] < exp((n-1)α + (H_{n}-1)e^{αn}) < exp(αn + e^{αn} ln n). This gives Pr[S ≥ x] ≤ exp(αn + e^{αn} ln n - αx) = exp(α(n-x) + e^{αn} ln n). The right-hand side is minimized by minimizing the exponent α(n-x) + e^{αn} ln n; differentiating with respect to α gives (n-x) + n e^{αn} ln n, and setting this to zero gives α = (1/n) ln((x-n)/(n ln n)). We can plug this value of α back into the original bound, but the result is still a pretty horrifying expression. We can simplify it somewhat by writing x as (1+δ) n ln n, which turns (x-n)/n into δ ln n and (x-n)/(n ln n) into δ. We then get Pr[S ≥ (1+δ) n ln n] ≤ exp(-δ ln n (δ - 1/ln n)) = exp(-δ^{2} ln n + δ).
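The exact expectation derived in the first part can be checked by simulation. The following sketch (illustrative only; n, the seed, and the trial count are arbitrary choices) runs the merging process many times and compares the average total cost with n(H_{n} - 1):

```python
import random

# Monte Carlo check of E[S] = n(H_n - 1) for the random merging process.
def total_merge_cost(n, rng):
    piles = [1] * n
    cost = 0
    while len(piles) > 1:
        i, j = rng.sample(range(len(piles)), 2)  # uniform ordered pair of distinct piles
        cost += piles[i]                         # pay the size of the first pile
        piles[j] += piles[i]                     # merge the first pile into the second
        piles.pop(i)
    return cost

n, trials = 64, 20_000
rng = random.Random(42)
avg = sum(total_merge_cost(n, rng) for _ in range(trials)) / trials
exact = n * (sum(1 / m for m in range(1, n + 1)) - 1)   # n(H_n - 1)
assert abs(avg - exact) / exact < 0.05                  # empirical mean matches the formula
```

Note that `rng.sample` returns its two elements in random order, which matches the "ordered pair of distinct piles" in the problem statement.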

# 2. A Collatz-like process

The Collatz conjecture says that if you start with any positive integer x_{0}, and let x_{t+1} = (3x_{t}+1)/2 if x_{t} is odd and x_{t}/2 if x_{t} is even, then you will eventually get to 1. Since the Collatz conjecture seems to be hard to prove, let's simplify the game by sending x either to 3x/2 (without the extra 1) or to x/2, with the choice of which to do made using a fair coin-flip. Note that in this simplified game, x_{t} does not have to be an integer.

- What is the expected value of x_{t} (in the simplified game) as a function of t and x_{0}?
- Show that there is a constant c such that ln x_{t} - ct is a martingale.
- Assuming that x_{0} = 1, use the Azuma-Hoeffding inequality to get a bound on the probability that x_{t} ≥ N for any fixed N > 0.

## 2.1. Solution

- Compute E[x_{t+1} | x_{t}] = (1/2)((3/2)x_{t} + (1/2)x_{t}) = x_{t}. So {x_{t}} is a martingale, and E[x_{t}] = x_{0}.
- We'll prove this in general for any process of the form Z_{t+1} = Z_{t} M_{t+1}, where M_{t+1} is a random multiplier. Observe that ln Z_{t+1} = ln Z_{t} + ln M_{t+1}; so E[ln Z_{t+1} | ℱ_{t}] = ln Z_{t} + E[ln M_{t+1} | ℱ_{t}], and {ln Z_{t}} fails to be a martingale exactly to the extent that E[ln M_{t+1} | ℱ_{t}] is nonzero. To make it a martingale, subtract off the predictable process ∑ E[ln M_{t+1} | ℱ_{t}], which (since the multipliers here are i.i.d.) is just t E[ln M_{t}]. This gives c = E[ln M_{t}] = (ln(3/2) + ln(1/2))/2 = ln(3/4)/2 = ln(√3/2) ≈ -0.1438; so ln x_{t} - t ln(√3/2) is a martingale.
- Again it's probably easiest to do this first in general and then plug in the specific values for our process. Let Z_{t}, M_{t} be as above, and consider the martingale X_{t} = ln Z_{t} - t E[ln M_{t}]. If M_{t} always lies in the range [a, b], each step can go up by at most ln b - E[ln M_{t}], and never drops by more than E[ln M_{t}] - ln a. For the specific case a = 1/2, b = 3/2, the first quantity is ln(3/2) - ln(√3/2) = ln √3, and the second is ln(√3/2) - ln(1/2) = ln √3. Both steps are thus bounded in absolute value by ln √3. So Azuma-Hoeffding gives Pr[(ln x_{t} - t ln(√3/2)) - ln x_{0} ≥ α] ≤ exp(-α^{2}/(2t ln^{2}√3)). Under our assumption that x_{0} = 1, we can simplify the LHS to Pr[ln x_{t} - t ln(√3/2) ≥ α], or Pr[x_{t} ≥ N] where ln N = α + t ln(√3/2), which gives α = ln N - t ln(√3/2). Plugging this value into the RHS gives Pr[x_{t} ≥ N] ≤ exp(-(ln N - t ln(√3/2))^{2}/(2t ln^{2}√3)).

This is not a very nice-looking bound, but it does have the very interesting property that for large t the right-hand side is exp(-Θ(t)) (since ln (√3/2) < 0). This means that even though the expected value of x_{t} is still 1, the probability that it exceeds any fixed bound N > 0 (including values much less than 1) goes to zero exponentially quickly. Generalization to small-stakes gambling games is left as an exercise for the reader.
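Both facts can be checked empirically. The sketch below (illustrative; the parameters t, N, the seed, and the number of runs are my own choices) verifies E[x_{t}] = x_{0} exactly by enumerating all 2^{t} coin-flip sequences for a small t, and then estimates the probability that x_{t} ≥ N for a larger t by simulation:

```python
import random
from itertools import product

x0 = 1.0

# (1) Martingale check: average x_t over all 2^t equally likely multiplier
# sequences (exact enumeration, no sampling error) and compare with x_0.
t_small = 10
paths = []
for flips in product((1.5, 0.5), repeat=t_small):
    x = x0
    for m in flips:
        x *= m
    paths.append(x)
mean = sum(paths) / len(paths)
assert abs(mean - x0) < 1e-9        # E[x_t] = x_0

# (2) Tail behaviour: ln x_t drifts down at rate |ln(sqrt(3)/2)| ≈ 0.144 per
# step, so Pr[x_t >= N] should be tiny for large t, even though E[x_t] = 1.
rng = random.Random(7)
t, N, runs = 200, 0.5, 2_000
hits = 0
for _ in range(runs):
    x = x0
    for _ in range(t):
        x *= 1.5 if rng.random() < 0.5 else 0.5
    hits += x >= N
assert hits / runs < 0.05           # almost every trajectory has collapsed toward 0
```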

# 3. Random back-off

A collection of n processes wish to transmit information across a shared broadcast channel, where a message can be transmitted successfully in a given time step only if exactly one process attempts to broadcast. If more than one process attempts to transmit, we get a collision, which is detected by the processes' hardware. Otherwise all processes either see the message (if one process transmits) or the absence of a message (if no process transmits).

Suppose that at time 0, all n processes are active (and thus attempt to transmit). After any step where there is a collision, each active process remains active with some probability p, where 0 < p < 1. After a successful transmission (one active process) or silence (no active processes), all processes become active again.

- Show that some process eventually successfully transmits a message, with probability 1, and that the expected time E[T] until this happens is finite. (You do not need to calculate E[T].)
- Show that there is a constant c (which may depend on n and p) such that Pr[T ≥ α E[T]] ≤ exp(-cα).
- Let A_{t} be the event that a message is transmitted at time t. Show that for fixed n and p, Pr[A_{t}] converges as t → ∞ to a positive constant.

## 3.1. Solution

The basic idea: The process described is an ergodic Markov chain (with state space 2^{[n]}), so we can use general properties of Markov chains to prove results about it. Irreducibility is immediate from the fact that every state is reachable with nonzero probability from the all-in state, while the all-in state is reachable in at most two steps from every state via the all-empty state. Aperiodicity is shown by observing that every state with at least two active processes satisfies p_{ii} > 0.

We now proceed to the specific claims:

- Immediate from the fact that all states in an irreducible finite chain are positive recurrent. More specifically, let q be the minimum, over states x with at least two active processes, of the probability that exactly one process remains active in the next step; here q = min_{2≤k≤n} kp(1-p)^{k-1} > 0 (we write q to avoid a clash with the retention probability p). Let τ be the worst-case expected time until some step has exactly one active process, i.e. until a successful transmission, starting from an arbitrary state. If the current state is ∅ or has exactly one active process, all processes are active on the next step, so τ ≤ 2 + (1-q)τ; if it has at least two active processes, τ ≤ 1 + (1-q)τ. In either case we get τ ≤ 2/q, so E[T] is finite and T is finite with probability 1.
- This is true for all Markov chains. Since Pr[T ≥ eE[T]] ≤ exp(-1) by Markov's inequality, and if we fail we get to try again from wherever we are, we get Pr[T ≥ αE[T]] ≤ exp(-α/e).
- Immediate consequence of the ergodic theorem: let A be the set of states with exactly one active process; then lim_{t→∞} Pr[A_{t}] = lim_{t→∞} Pr[X_{t} ∈ A] = π(A), which is positive since the chain is ergodic and A is nonempty.
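A short simulation (an illustrative sketch, not part of the solution; n, p, the seed, and all names are my own choices) makes the claims concrete: every run reaches a successful transmission quickly, and the long-run frequency of successful slots is bounded away from zero, consistent with π(A) > 0:

```python
import random

# Sketch of the random back-off protocol with chosen parameters n, p.
def step(active, n, p, rng):
    """Advance one slot: return (number active in the next slot, success this slot?)."""
    if active == 1:
        return n, True      # exactly one sender: success, then everyone rejoins
    if active == 0:
        return n, False     # silence: everyone rejoins
    # collision: each active process independently stays active with probability p
    return sum(rng.random() < p for _ in range(active)), False

n, p = 10, 0.5
rng = random.Random(0)

# Time T until the first successful transmission, over many fresh starts.
times = []
for _ in range(2_000):
    active, t, ok = n, 0, False
    while not ok:
        t += 1
        active, ok = step(active, n, p, rng)
    times.append(t)
assert max(times) < 10_000           # every simulated run terminated quickly

# Long-run fraction of slots with a successful transmission (estimates pi(A)).
active, successes, slots = n, 0, 100_000
for _ in range(slots):
    active, ok = step(active, n, p, rng)
    successes += ok
assert successes / slots > 0.01      # a positive fraction of slots succeed
```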