# 1. Count-min banking

Suppose we have a Cormode-Muthukrishnan count-min filter (see DataStreamComputation), but we allow negative increments while still using the min function to compute results.

Show that as long as the total value a_{i} for every index i is non-negative, the same bounds hold as in the non-negative increments case.

Suppose that we are using the count-min filter to process bank transactions, and that we reject any increment (i_{t}, c_{t}) that would cause some counter in the count-min filter to drop below zero. The idea is that at the end of the day, any depositor for which a_{i} > 0 will ask for their money back, while any depositor for which a_{i} < 0 will vanish, leaving the bank with the missing money. Let S be the set of indices i for which a_{i} < 0, and let T be the set of indices i for which a_{i} > 0. What bounds can you put on ∑_{i∈S} a_{i} as a function of the number of indices n and ∑_{i∈T} a_{i}?
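
To make the setup concrete, here is a minimal Python sketch of a count-min filter with the rejection rule described above. The class name and hashing scheme are illustrative assumptions, not a production implementation:

```python
import random

class CountMinBank:
    """Toy count-min sketch that rejects any increment which would drive
    a touched counter below zero (a sketch of the scheme above)."""

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        # one random salt per row, standing in for depth hash functions
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, i):
        return [(r, hash((salt, i)) % self.width)
                for r, salt in enumerate(self.salts)]

    def update(self, i, c):
        """Apply increment (i, c); return False (reject) if any touched
        counter would drop below zero."""
        cells = self._cells(i)
        if any(self.table[r][col] + c < 0 for r, col in cells):
            return False
        for r, col in cells:
            self.table[r][col] += c
        return True

    def query(self, i):
        """Estimate a_i as the minimum over the touched counters."""
        return min(self.table[r][col] for r, col in self._cells(i))
```

For example, after `update(1, 5)` a further `update(1, -6)` is rejected, while `update(1, -5)` brings the estimate back down to zero.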

## 1.1. Solution

The essential idea is that the analysis of the count-min filter in this case only uses a_{i} and not the actual increments, so everything goes through.

It is easy to show that -∑_{i∈S} a_{i} ≤ ∑_{i∈T} a_{i}, since the (non-negative) sum of all values in the count-min filter is proportional to the sum of all increments. For a sufficiently long sequence of transactions, it should be possible for the adversary to get arbitrarily close to this bound: it can just send in small negative transactions until it gets lucky and drains out all the cash.
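
As a quick sanity check of the inequality, here is a toy simulation (single-row sketch, i.e. depth 1; all parameters arbitrary) that applies random increments under the rejection rule and verifies -∑_{i∈S} a_{i} ≤ ∑_{i∈T} a_{i}:

```python
import random

rng = random.Random(2)
w, n = 8, 32                          # arbitrary toy parameters
salt = rng.getrandbits(32)
col = lambda i: hash((salt, i)) % w   # a single hash row (depth 1)
table = [0] * w
a = [0] * n                           # true totals a_i
for _ in range(2000):
    i, c = rng.randrange(n), rng.randint(-3, 3)
    if table[col(i)] + c >= 0:        # reject increments that would go negative
        table[col(i)] += c
        a[i] += c
neg = -sum(v for v in a if v < 0)     # -sum over S
pos = sum(v for v in a if v > 0)      # sum over T
assert neg <= pos                     # the bank never loses more than it holds
```

The invariant holds for any run because each accepted increment adds c to exactly one non-negative counter, so ∑_{i} a_{i} equals the sum of the counters, which is ≥ 0.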

# 2. A corrupt random number generator

Suppose you are asked to provide a mechanism that generates a sequence of n random bits, with the guarantees that (a) each bit is equally likely to be 0 or 1, and (b) the bits are pairwise independent. For nefarious reasons of your own, you would like the probability of getting all ones to be as large as possible.

- Use Chebyshev's inequality to get an upper bound on the probability that all bits are one.
- Give a construction that approaches this upper bound to within a constant factor.

## 2.1. Solution

Call the bits X_{1}...X_{n} and let S = ∑ X_{i}. Then ES = ∑ EX_{i} = n/2 and Var[S] = ∑ Var[X_{i}] = n/4 (since the bits are pairwise independent), so from Chebyshev's inequality we get Pr[S ≥ n] ≤ Var[S]/(n/2)^{2} = (n/4)/(n^{2}/4) = 1/n.

Recall the construction of pairwise-independent bits where we generate m independent bits Y_{1}...Y_{m}, and for each nonempty subset S of [m] we let X_{S} = ⊕_{i∈S} Y_{i}. This gives 2^{m}-1 pairwise independent bits, with the property that if all the Y_{i} are 0, so are all the X_{S}. By taking Z_{S} = ¬X_{S} we get that all 2^{m}-1 Z_{S} are 1 with probability exactly 2^{-m}. Let m = ⌈lg (n+1)⌉. Then by taking the first n of the 2^{m}-1 = 2^{⌈lg (n+1)⌉}-1 ≥ n random variables Z, we obtain n pairwise independent random bits that are all 1 with probability 2^{-⌈lg (n+1)⌉} ≥ 2^{-(1 + lg (n+1))} = 1/(2(n+1)). This is (asymptotically) within a factor of 2 of the upper bound, and for n = 2^{m}-1 exactly we get a probability of 2^{-m} = 1/(n+1) of all ones.
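
A short Python sketch of this construction (the function name is hypothetical); for m = 3 it yields n = 7 pairwise independent fair bits that are all 1 exactly when every Y_{i} is 0, i.e. with probability 2^{-m} = 1/8:

```python
import random

def pairwise_ones_bits(m, rng):
    """Generate 2^m - 1 pairwise independent fair bits Z_S = not X_S,
    where X_S is the XOR of Y_i over a nonempty subset S of m
    independent fair bits Y_1..Y_m."""
    y = [rng.randrange(2) for _ in range(m)]
    bits = []
    for mask in range(1, 2 ** m):   # nonempty subsets of [m]
        x = 0
        for i in range(m):
            if mask >> i & 1:
                x ^= y[i]
        bits.append(1 - x)          # complement: Z_S = not X_S
    return bits
```

A Monte Carlo run at m = 3 lands near the exact all-ones probability of 1/8.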

# 3. Hash table expansion

To obtain O(1) search costs, the load factor of a hash table must be bounded by a constant. The usual method for doing this is to grow the hash table if it gets too full. We can abstract this as an on-line problem where we are given a request sequence x_{1}, x_{2}, ..., where x_{t} ∈ ℤ^{+} is the number of elements in the hash table at time t, and we must respond with a sequence of hash table sizes y_{1}, y_{2}, ..., where y_{t} ≥ x_{t} for all t, and whenever y_{t} ≠ y_{t-1} we incur a cost y_{t}.

Give the best algorithm you can for this problem, measured in terms of the competitive ratio. Assume that the adversary is oblivious.

## 3.1. Solution

Without loss of generality we can assume that {x_{t}} is non-decreasing, so that OPT(x_{1}...x_{k}) = x_{k}. In fact, we can go even further and assume that x_{t} = t, since it costs the adversary nothing to add elements slowly. The adversary strategy can then be summarized simply by giving k. On the other side, we can restrict our attention to "lazy" algorithms for which y_{i} is non-decreasing (as it costs us nothing to leave y_{i} high) and that do not change y_{i} unless necessary. A natural way to specify such an algorithm is to give some increasing sequence of sizes s_{1}, s_{2}, ..., where y_{i} is the smallest s_{j} that is at least max_{t≤i} x_{t}.

For the simple doubling strategy, the sequence is 1, 2, 4, 8, .... This gives a competitive ratio that approaches 4 in the limit. One factor of 2 comes from summing the geometric series in the cost: s_{m} + s_{m-1} + ... = s_{m}(1 + 1/2 + 1/4 + ...) = 2s_{m}. The other comes because the adversary can choose k = s_{j} + 1, forcing us to pay roughly 2s_{j+1} = 4s_{j} ≅ 4⋅OPT. We can reduce the constant by making it harder for the adversary to figure out where to stop.
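
A small sketch (helper name hypothetical) that computes the doubling strategy's total cost on x_{t} = t and shows the ratio approaching 4 at the adversary's preferred stopping points k = 2^{j} + 1:

```python
def doubling_cost(k):
    """Total cost of the doubling strategy (sizes 1, 2, 4, ...) on the
    request sequence x_t = t for t = 1..k: pay each power of two up to
    the first one that is at least k."""
    cost, size = 1, 1          # pay for the initial table of size 1
    while size < k:
        size *= 2
        cost += size
    return cost
```

At k = 2^{10} + 1 the cost is 1 + 2 + ... + 2^{11} = 2^{12} - 1, for a ratio of 4095/1025, just under 4.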

We do so by choosing a random value r uniformly in the range [1/2, 1] and letting s_{j} = 2^{j} r. Now suppose the adversary stops at k. Let z be the smallest power of 2 greater than or equal to k (i.e., z = 2^{⌈lg k⌉}). We now consider two cases depending on the value of r:

- If rz ≥ k, then the algorithm's last increase costs rz. This event occurs with probability (z-k)/(z/2) = 2(z-k)/z, and the expected cost of the last increase conditioned on it occurring is (k+z)/2.
- If rz < k, then the algorithm's last increase costs 2rz. This event occurs with probability 1-2(z-k)/z, and the expected cost conditioned on it occurring is (z+2k)/2.

So the algorithm's expected cost for its last increase is (z-k)(z+k)/z + (1-2(z-k)/z)(z+2k)/2 = (z^{2} + 2k^{2})/2z, and its expected cost for all increases is bounded by twice this, (z^{2}+2k^{2})/z. Dividing by k gives a competitive ratio of (z^{2}+2k^{2})/zk = z/k + 2k/z. At k = z this is 3; at k = z/2 it is also 3. Differentiating with respect to k shows a unique extremum in the range [z/2, z] at k = z/√2, where the ratio is 2√2 < 3 (so the extremum is a minimum). So we get 3 as the bound on the competitive ratio using this randomized strategy.
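
A Monte Carlo sketch of this randomized strategy (helper name hypothetical): sizes s_{j} = 2^{j} r with r uniform in [1/2, 1], evaluated on x_{t} = t at k = 1000, a near-worst-case point since z = 1024 sits just above k. The estimated expected ratio should land just under 3:

```python
import random

def randomized_cost(k, r):
    """Total cost of the lazy strategy with sizes s_j = 2^j * r on the
    request sequence x_t = t up to k: pay every size actually adopted,
    starting from the smallest s_j >= 1 (since x_1 = 1)."""
    s = r
    while s < 1:              # sizes below 1 are never adopted
        s *= 2
    cost = s
    while s < k:
        s *= 2
        cost += s
    return cost

# Estimate the expected competitive ratio at k = 1000 (so z = 1024).
rng = random.Random(0)
k, trials = 1000, 50000
est = sum(randomized_cost(k, rng.uniform(0.5, 1.0))
          for _ in range(trials)) / (trials * k)
```

The estimate comes out slightly below the bound z/k + 2k/z ≈ 2.98; the small gap is the finite geometric series (the total cost is a bit less than twice the last size).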