Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

1. Progressively bad hash functions

Recall that for any prime p≥m, the family of hash functions h_ab:M→N given by h_ab(x) = ((ax + b) mod p) mod n for a∈ℤ_p^*, b∈ℤ_p is 2-universal: ∀x≠y Pr[h_ab(x) = h_ab(y)] ≤ 1/n. Show that it is not k-universal for any constant k>2, by showing that for any fixed k and sufficiently large n there exists a set S_n such that Pr[h_ab(S_n) is constant] = Ω(1/n) (with a constant that may depend on k).

1.1. Solution

Fix k, and let S_n=S be the set 0..k-1. Let A be the set { cn | 0 < cn < p/2k } and let B be the set { b | 0 ≤ b < p/2 }. Then for any a∈A and x∈S, we have ax < p/2 and ax mod n = 0. Adding in b∈B gives ax+b < p and ax mod n = b mod n for all x∈S. Because ax+b < p, we have (ax+b) mod p = ax+b and so h_ab(x) = ((ax+b) mod p) mod n = (ax+b) mod n = b mod n for all x∈S.

What is the probability that this event occurs? We have ⌊p/(2kn)-1⌋ elements in A, giving Pr[a∈A] = ⌊p/(2kn)-1⌋/(p-1) = Θ(1/kn). Similarly, there are ⌊p/2⌋ choices for b, giving Pr[b∈B] = ⌊p/2⌋/p = Θ(1). This gives a probability of Θ(1/kn) that h_ab(S) is constant.

2. Low expected distortion

Given a graph G, let V(G) be the set of vertices in G and let d_G(u,v) be the length of the shortest path from u to v in G, where u,v∈V(G).

Given a graph G and a probability distribution P on pairs (f,H) where H is a graph and f:V(G)→V(H) such that d_H(f(u),f(v)) ≥ d_G(u,v) for all u,v∈V(G), let the expected distortion δ(P) = max_u,v∈V(G) E[d_H(f(u),f(v))/d_G(u,v)].

Let G = C_n be a cycle with n vertices. Show that there exists a probability distribution P on (f,H) such that H is always P_n-1, the path with n vertices, and δ(P) = O(1).

2.1. Solution

Obtain H by cutting G along a uniform random edge, and let f be the identity map. Then d_H(u,v) ≥ d_G(u,v) trivially (removing edges can only increase the length of the shortest path. To prove the distortion bound, suppose that d_G(u,v) = k ≤ n/2. Then with probability k/n, we cut an edge on the shortest uv path, giving d_H(u,v) = n-k; the rest of the time, d_H(u,v) = d_G(u,v) = k. This gives E[d_H(u,v)] = (k/n)(n-k) + ((n-k)/n)(k) = 2k(n-k)/n < 2k and thus δ(P) ≤ max(2k/k) = 2.

3. Defective skip lists

Suppose we are given a skip list with a head node and n nodes with parameter p (the probability that a node at level k also appears at level k+1), and that we choose one of the non-head nodes uniformly at random and remove it without updating any pointers, so that any search that attempts to access the removed node fails.

What is the expected number of nodes that are still reachable by a standard skip list search starting from the head node?

3.1. Solution

It is possible to solve this problem by brute force, but it is easier to do it by making very heavy use of linearity of expectation. Suppose that the search path for j includes m non-head nodes (this includes j). Then the probability that j is not reachable, conditioned on this fact, is exactly m/n. The expected total number of non-reachable nodes is then E[∑ m_j/n], where m_j is the number of nodes on the search path to j, or (1/n) ∑_j E[m_j], the expected length of a search path to a uniformly-chosen node. Computing this quantity exactly is exceptionally painful, but we can easily get that it is O(log n / (p log (1/p))) using the usual skip list analysis.