Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

# 1. Count-min banking

Suppose we have a Cormode-Muthukrishnan count-min filter (see DataStreamComputation), but we allow negative increments while still using the min function to compute results.

1. Show that as long as the total value ai for every index i is non-negative, the same bounds hold as in the non-negative increments case.

2. Suppose that we are using the count-min filter to process bank transactions, and that we reject any increment (it,ct) that would cause some counter in the count-min filter to drop below zero. The idea is that at the end of the day, any depositor for which ai > 0 will ask for their money back, while any depositor for which ai < 0 will vanish, leaving the bank with the missing money. Let S be the set of indices i for which ai<0, and let T be the set of indices i for which ai>0. What bounds can you put on ∑i∈S ai as a function of the number of indices n and ∑i∈T ai?

## 1.1. Solution

1. The essential idea is that the analysis of the count-min filter in this case only uses ai and not the actual increments, so everything goes through.

2. It is easy to show that -∑i∈S ai ≤ ∑i∈T ai, since the (non-negative) sum of all values in the count-min filter is proportional to the sum of all increments. For a sufficiently long sequence of transactions, it should be possible for the adversary to get arbitrarily close to this bound: it can just send in small negative transactions until it gets lucky and drains out all the cash.

# 2. A corrupt random number generator

Suppose you are asked to provide a mechanism that generates a sequence of n random bits, with the guarantees that (a) each bit is equally likely to be 0 or 1, and (b) the bits are pairwise independent. For nefarious reasons of your own, you would like the probability of getting all ones to be as large as possible.

1. Use Chebyshev's inequality to get an upper bound on the probability that all bits are one.
2. Give a construction that approaches this upper bound to within a constant factor.

## 2.1. Solution

1. Call the bits X1...Xn and let S = ∑ Xi. Then ES = ∑ EXi = n/2 and Var[S] = ∑ Var[Xi] = n/4 (since the bits are pairwise independent), so from Chebyshev's inequality we get Pr[S ≥ n] ≤ Var[S]/(n/2)2 = (n/4)/(n2/4) = 1/n.

2. Recall the construction of pairwise-independent bits where we generate m independent bits Y1...Ym, and for each nonempty subset S of [m] we let XS = ⊕i∈S Yi. This gives 2m-1 pairwise independent bits, with the property that if all the Yi are 0, so are all the XS. By taking ZS = ¬XS we get that all 2m-1 ZS are 1 with probability exactly 2-m. Let m = ⌈lg (n+1)⌉. Then by taking the first n of the 2m-1 = 2⌈lg (n+1)⌉-1 ≥ n random variables Z, we obtain n pairwise independent random bits with a probability of 2-⌈lg (n+1)⌉ ≥ 2-(1 + lg (n+1)) = 1/(2(n+1)). This is (asymptotically) within a factor of 2 of the upper bound, and for n = 2m-1 exactly we get a probability of 2-m = 1/(n+1) of all ones.

# 3. Hash table expansion

It is necessary to keep the load factor in a hash table constant to obtain O(1) search costs. The usual method for doing this is to grow the hash table if it gets too full. We can abstract this as an on-line problem where we are given a request sequence x1, x2, ..., where xt∈ℤ+ is the number of elements in the hash table at time t, and we must respond with a sequence of hash table sizes y1, y2, ..., where yt≥xt for all t, and whenever yt≠yt-1 we incur a cost yt.

Give the best algorithm you can for this problem, measured in terms of the competitive ratio. Assume that the adversary is oblivious.

## 3.1. Solution

Without loss of generality we can assume that {xt} is non-decreasing, so that OPT(x1...xk) = xk. In fact, we can got even further and assume that xt = t, since it costs the adversary nothing to add elements slowly. The adversary strategy can then be summarized simply by giving k. On the other side, we can restrict our attention only to "lazy" algorithms for which yi is non-decreasing (as it costs us nothing to leave yi high) and that do not change yi unless necessary. A natural way to specify such an algorithm is to give some increasing sequence of sizes s1, s2,..., where yi is the smallest sj that is bigger than maxt≤i xt.

For the simple doubling strategy, the sequence is 1, 2, 4, 8, ... . This gives a competitive ratio that approaches 4 in the limit. One factor of 2 comes from the sequence limit in calculating the cost = sm + sm-1 + ... = sm (1 + 1/2 + 1/4 + ...). The other comes because the adversary can choose k = sj+1 and force us to pay roughly 2sj+1 ≅ 4⋅OPT. We can reduce the constant by making it harder for the adversary to figure out where to stop.

We do so by choosing a random value r uniformly in the range [1/2, 1], and let sj = 2jr. Now suppose the adversary stops at k. Let z be the smallest power of 2 greater than or equal to k (i.e., z = 2⌈lg k⌉). We now consider two cases depending on the value of r:

1. If rz ≥ k, then the algorithm's cost is rz. This event occurs with probability (z-k)/(z/2) = 2(z-k)/z and the expected cost for the last increase conditioned on it occurring is (k+z)/2.
2. If rz < k, then the algorithm's cost is 2rz. This event occurs with probability 1-2(z-k)/z, and the expected cost conditioned on it occurring is (z+2k)/2.

So the algorithm's expected cost for its last increase is (z-k)(z+k)/z + (1-2(z-k)/z)(z+2k)/2 = (z2 + 2k2)/2z and its expected cost for all increases is bounded by twice this, (z2+2k2)/z. Dividing by k gives a competitive ratio of (z2+2k2)/zk = z/k + 2k/z. At k=z this is 3. At k=z/2 it is also 3. Differentiating with respect to k shows a unique extremum in the range [z/2,z] of z/√2; at this point the competitive ratio is 2√2 < 3 (it's a minimum). So we get 3 as the bound on the competitive ratio using this randomized strategy.

2014-06-17 11:58