# 1. Progressively bad hash functions

Recall that for any prime p≥m, the family of hash functions h_{ab}:M→N given by h_{ab}(x) = ((ax + b) mod p) mod n for a∈ℤ_{p}^{*}, b∈ℤ_{p} is 2-universal: ∀x≠y Pr[h_{ab}(x) = h_{ab}(y)] ≤ 1/n. Show that it is *not* k-universal for any constant k>2, by showing that for any fixed k and sufficiently large n there exists a set S_{n} such that Pr[h_{ab}(S_{n}) is constant] = Ω(1/n) (with a constant that may depend on k).

## 1.1. Solution

Fix k, and let S_{n}=S be the set 0..k-1. Let A be the set { cn | 0 < cn < p/2k } and let B be the set { b | 0 ≤ b < p/2 }. Then for any a∈A and x∈S, we have ax < p/2 and ax mod n = 0. Adding in b∈B gives ax+b < p and ax mod n = b mod n for all x∈S. Because ax+b < p, we have (ax+b) mod p = ax+b and so h_{ab}(x) = ((ax+b) mod p) mod n = (ax+b) mod n = b mod n for all x∈S.

What is the probability that this event occurs? We have ⌊p/(2kn)-1⌋ elements in A, giving Pr[a∈A] = ⌊p/(2kn)-1⌋/(p-1) = Θ(1/kn). Similarly, there are ⌊p/2⌋ choices for b, giving Pr[b∈B] = ⌊p/2⌋/p = Θ(1). This gives a probability of Θ(1/kn) that h_{ab}(S) is constant.

# 2. Low expected distortion

Given a graph G, let V(G) be the set of vertices in G and let d_{G}(u,v) be the length of the shortest path from u to v in G, where u,v∈V(G).

Given a graph G and a probability distribution P on pairs (f,H) where H is a graph and f:V(G)→V(H) such that d_{H}(f(u),f(v)) ≥ d_{G}(u,v) for all u,v∈V(G), let the **expected distortion** δ(P) = max_{u,v∈V(G)} E[d_{H}(f(u),f(v))/d_{G}(u,v)].

Let G = C_{n} be a cycle with n vertices. Show that there exists a probability distribution P on (f,H) such that H is always P_{n-1}, the path with n vertices, and δ(P) = O(1).

## 2.1. Solution

Obtain H by cutting G along a uniform random edge, and let f be the identity map. Then d_{H}(u,v) ≥ d_{G}(u,v) trivially (removing edges can only increase the length of the shortest path. To prove the distortion bound, suppose that d_{G}(u,v) = k ≤ n/2. Then with probability k/n, we cut an edge on the shortest uv path, giving d_{H}(u,v) = n-k; the rest of the time, d_{H}(u,v) = d_{G}(u,v) = k. This gives E[d_{H}(u,v)] = (k/n)(n-k) + ((n-k)/n)(k) = 2k(n-k)/n < 2k and thus δ(P) ≤ max(2k/k) = 2.

# 3. Defective skip lists

Suppose we are given a skip list with a head node and n nodes with parameter p (the probability that a node at level k also appears at level k+1), and that we choose one of the non-head nodes uniformly at random and remove it without updating any pointers, so that any search that attempts to access the removed node fails.

What is the expected number of nodes that are still reachable by a standard skip list search starting from the head node?

## 3.1. Solution

It is possible to solve this problem by brute force, but it is easier to do it by making very heavy use of linearity of expectation. Suppose that the search path for j includes m non-head nodes (this includes j). Then the probability that j is not reachable, conditioned on this fact, is exactly m/n. The expected total number of non-reachable nodes is then E[∑ m_{j}/n], where m_{j} is the number of nodes on the search path to j, or (1/n) ∑_{j} E[m_{j}], the expected length of a search path to a uniformly-chosen node. Computing this quantity exactly is exceptionally painful, but we can easily get that it is O(log n / (p log (1/p))) using the usual skip list analysis.