# 1. Peaks and valleys

Suppose we flip a fair coin n times. For each i, 1≤i≤n, let X_{i} be the indicator of the event that the i-th coin-flip is heads. For each i, 2≤i≤n-1, let Y_{i} be the indicator for the event that X_{i-1}=0, X_{i}=1, and X_{i+1}=0; if this even occurs we say that the sequence of coin-flips has a *peak* at i. Let be the random variable counting the number of peaks in the sequence.

- What is E[S]?
- What is Var[S]?

## 1.1. Solution

1. To compute E[S], we just need to compute E[Y_{i}] for each i and sum the expectations. Each variable Y_{i} is one provided we get a pattern THT in the three coins surrounding position i; since the coins are fair and independent, this event occurs with probability (1/2)^{3} = 1/8. Summing over all n-2 values of i gives E[S] = (n-2)/8.

2. To compute Var[S], we use the sum formula for variance, taking into account the fact that not all of the Y_{i} are pairwise independent. First let's compute Var[Y_{i}]. We have previously shown that E[Y_{i}] = 1/8; since Y_{i}^{2} = Y_{i} this also means E[Y_{i}^{2}] = 1/8. So Var[Y_{i}] = E[Y_{i}^{2}] - (E[Y_{i}])^{2} = 1/8 - (1/8)^{2} = 7/64.

Now we need to compute Cov(Y_{i},Y_{j}) for all i≠j. We'll consider several cases.

If j=i+1, then there is a sequence of 4 coin-flips starting at position i-1 such that Y_{i}=1 iff the first three coins have the pattern THT and Y_{j}=1 iff the last three coins have the pattern THT. But we can't realize both patterns simultaneously because they disagree on the middle two coins. So in this case we have E[Y_{i}Y_{i+1}] = 0, and Cov(Y_{i},Y_{i+1}) = 0 - E[Y_{i}]E[Y_{j}] = -1/64.

If j=i+2, then there is a sequence of 5 coin-flips starting at position i-1 such that Y_{i}=Y_{j}=1 iff the coins come up THTHT. This occurs with probability (1/2)^{5} = 1/32, giving Cov(Y_{i},Y_{i+1}) = 1/32 - 1/64 = 1/64.

If j>i+2, then there is no overlap between the Y_{i} coins and the Y_{j} coins. The random variables Y_{i} and Y_{j} are thus independent, and we have Cov(Y_{i},Y_{j}) = 0.

The cases where j<i are symmetric.

Applying the sum formula gives:

# 2. Stronger than Markov

Let X_{1}...X_{n} be the indicator variables for independent coin-flips with bias p; that is, each X_{i} is 1 with probability p and 0 with probability 1-p. Let .

Compute E[e

^{S}].Use the expected value above and Markov's inequality to compute an upper bound on Pr[S≥m] = Pr[e

^{S}≥e^{m}].- Can you find values of p, n, and m, for which this bound is smaller than the bound E[S]/m given by a direct application of Markov's inequality?

## 2.1. Solution

1. Since the X_{i} are all independent, so are the random variables . So we have

2. Markov's inequality gives Pr[e^{S}≥e^{m}] ≤ E[e^{S}]/e^{m} = (pe+1-p)^{n}/e^{m}.

3. The bound is generally better when m is significantly greater than E[S] = pn. For example, if p=1/n and m=n, the new bound is (e/n+1-1/n)^{n}/e^{n} = (1/n+1/e-1/(en))^{n}; for large n this approaches (1/e)^{n}, which is very small (though not as small as the exact probability (1/n)^{n}). A straight application of Markov's inequality gives only n(1/n)/n = 1/n.

A similar case is when p=1/2 and m=n. Here the bound is ((e+1)/2)^{n}/e^{n} ~= (0.6839...)^{n}. The straight Markov's bound in this case is only 1/2; it doesn't even depend on n.

The main difference between applying Markov's inequality directly to S and applying it to e^{S} is that in the latter case we need independence to compute e^{S}. The technique of bounding E[e^{αS}] for appropriate choices of α goes by the name of Chernoff bounds and is a standard method for proving bounds on the tails of the distribution of a random variable.

# 3. At the genome factory

A custom gene splicing shop charges $2 to splice a thymine (T) amino acid into a single-strand DNA fragment, and $1 each for adenine (A) and cytosine (C). Guanine (G) is free. So, for example, constructing the sequence GATTACA costs 0+1+2+2+1+1+1=8 dollars.

Write a generating function such that the z

^{k}coefficient gives the number of ways to buy a single amino acid for k dollars.Write a generating function such that the z

^{k}coefficient gives the number of ways to buy a single DNA strand consisting of n amino acids for k dollars.- Give a simple expression for the number of strands of length n that cost k dollars.

## 3.1. Solution

1+2z+z

^{2}.(1+2z+z

^{2})^{n}.The trick here is that (1+2z+z

^{2}) factors as (1+z)^{2}. So (1+2z+z^{2})^{n}= (1+z)^{2n}. From the binomial theorem, this is equal to . It follows that there are exactly strands of length n that cost k dollars.