Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

[/Solutions] are available.

1. Count-min banking

Suppose we have a Cormode-Muthukrishnan count-min filter (see DataStreamComputation), but we allow negative increments while still using the min function to compute results.

Show that as long as the total value a_i for every index i is non-negative, the same bounds hold as in the non-negative increments case.
Suppose that we are using the count-min filter to process bank transactions, and that we reject any increment (i_t,c_t) that would cause some counter in the count-min filter to drop below zero. The idea is that at the end of the day, any depositor for which a_i > 0 will ask for their money back, while any depositor for which a_i < 0 will vanish, leaving the bank with the missing money. Let S be the set of indices i for which a_i<0, and let T be the set of indices i for which a_i>0. What bounds can you put on ∑_i∈S a_i as a function of the number of indices n and ∑_i∈T a_i?

2. A corrupt random number generator

Suppose you are asked to provide a mechanism that generates a sequence of n random bits, with the guarantees that (a) each bit is equally likely to be 0 or 1, and (b) the bits are pairwise independent. For nefarious reasons of your own, you would like the probability of getting all ones to be as large as possible.

Use Chebyshev's inequality to get an upper bound on the probability that all bits are one.
Give a construction that approaches this upper bound to within a constant factor.

3. Hash table expansion

It is necessary to keep the load factor in a hash table constant to obtain O(1) search costs. The usual method for doing this is to grow the hash table if it gets too full. We can abstract this as an on-line problem where we are given a request sequence x₁, x₂, ..., where x_t∈ℤ⁺ is the number of elements in the hash table at time t, and we must respond with a sequence of hash table sizes y₁, y₂, ..., where y_t≥x_t for all t, and whenever y_t≠y_t-1 we incur a cost y_t.

Give the best algorithm you can for this problem, measured in terms of the competitive ratio. Assume that the adversary is oblivious.