A collection of random variables is **k-wise independent** if any subset of k of the variables is independent (see RandomVariables). For discrete random variables, this means that for any subset of k of them, X_{1}...X_{k}, and any values x_{1}...x_{k}, it holds that Pr[∧_{i} X_{i} = x_{i}] = ∏_{i} Pr[X_{i}=x_{i}].

K-wise independent variables are interesting for randomized algorithms because they are often easier to generate than fully independent random variables, and for many purposes k-wise independence for some small k is enough.

# 1. Generating k-wise independent random variables

## 1.1. Using a random Boolean matrix

Let M be a random n×m matrix over ℤ_{2}; then the random variables Mx for each x∈ℤ_{2}^{m}-{0} are uniformly distributed over ℤ_{2}^{n} and pairwise independent. Proof: For uniform distribution, Mx is just the XOR of one or more independent uniform columns of M, which gives a uniform vector. For pairwise independence, observe that when x ≠ y there is at least one column of M that is included in one of the sums Mx, My but not the other; so, conditioning on all the other columns, Mx = My⊕r where r is a fresh uniform vector, making the pair (Mx, My) uniform.

This gives 2^{m}-1 pairwise-independent n-bit values using exactly mn bits.
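A minimal sketch of the construction in Python (rows of M stored as m-bit masks; the function names are mine, not from the text):

```python
import secrets

def random_matrix(n, m):
    """A random n-by-m matrix over Z_2, each row stored as an m-bit mask."""
    return [secrets.randbits(m) for _ in range(n)]

def mat_vec(M, x):
    """Mx over Z_2: bit i of the output is the parity of row_i AND x."""
    return tuple(bin(row & x).count("1") % 2 for row in M)

# nm truly random bits give 2^m - 1 pairwise-independent n-bit values:
n, m = 8, 4
M = random_matrix(n, m)
values = [mat_vec(M, x) for x in range(1, 2 ** m)]
```

For tiny n and m the claim can be checked exhaustively: ranging over all 2^{nm} matrices, each pair (Mx, My) with distinct nonzero x, y takes every pair of values in ℤ_{2}^{n}×ℤ_{2}^{n} equally often.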

## 1.2. Using a random polynomial over a finite field

Let a_{k-1}...a_{0} be k independent uniform values over some finite field (see FiniteFields). Then the random variables p(x) = ∑_{i} a_{i}x^{i}, one for each x in the field, are k-wise independent. (This generalizes the 2-universal hash family given in HashTables.)

Proof: Given any distinct points x_{1}...x_{k} and values y_{1}...y_{k}, there is a unique polynomial p of degree at most k-1 for which p(x_{i}) = y_{i} for i=1..k. The constructive way to prove this is to use Lagrange interpolation (see Lagrange polynomial), but in GF(p^{n}) we can prove it just by counting. For fixed x_{1}...x_{k}, there are exactly p^{nk} choices for the values y_{1}...y_{k} and exactly p^{nk} choices for the coefficients a_{k-1},...,a_{0}. Suppose two such polynomials p and q agree on x_{1}...x_{k}. Then each x_{i} is a zero of p-q. Because p-q has k distinct zeros, it is either zero or has degree at least k.^{1} Since p-q has degree at most k-1, we have p-q=0 ⇒ p=q. So the evaluation map from coefficients a_{i} to values y_{i} is injective, and since the codomain and domain have the same size it must be bijective. It follows that Pr[∧_{i} p(x_{i})=y_{i}] = 1/p^{kn} for all k distinct x_{i}.

Here we get p^{n} k-wise independent (n lg p)-bit values using exactly kn lg p bits. To compare with the previous case, take p=2: then we get 2^{n} k-wise independent n-bit values using kn bits.
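As a concrete sketch, here is the construction over a prime field ℤ_{p} (the n=1 case of GF(p^{n}); function names are mine):

```python
import secrets

def poly_eval(coeffs, x, p):
    """Evaluate sum_i coeffs[i] * x**i mod p by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def kwise_values(k, p):
    """Pick k independent uniform coefficients over Z_p (p prime); the
    values poly(0), ..., poly(p-1) are then k-wise independent, and cost
    only k*lg(p) random bits instead of p*lg(p)."""
    coeffs = [secrets.randbelow(p) for _ in range(k)]
    return [poly_eval(coeffs, x, p) for x in range(p)]

sample = kwise_values(3, 101)   # 101 values, 3-wise independent
```

The bijection in the proof can be seen directly for small cases: for k=2 and p=5, running over all 25 seeds (a_{0},a_{1}), the pair (p(1), p(3)) hits each of the 25 possible value pairs exactly once.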

# 2. Almost k-wise independence

Often we are happy with values that are ε-almost k-wise independent, which means that |Pr[∧_{i} X_{i} = x_{i}] - ∏_{i} Pr[X_{i}=x_{i}]| ≤ ε for some small ε. This can be used to cut down on how many fully random bits we need. We won't talk about these constructions much, but a classic paper on the subject is here.

# 3. Pairwise independence and Chebyshev's inequality

One application of k-wise independence is for Chebyshev's inequality arguments. For example, if we are sampling a sequence of identically distributed values X_{i} (e.g., the outcome of an experiment when we run a randomized algorithm), Pr[|∑ X_{i} - nE[X_{i}]| > α] ≤ Var[∑X_{i}]/α^{2} by Chebyshev's inequality. For arbitrary random variables, Var[∑X_{i}] = ∑ Var[X_{i}] + ∑_{i≠j} Cov(X_{i},X_{j}); but if the X_{i} are pairwise independent, Cov(X_{i},X_{j}) = 0 for all i≠j. So we can save some random bits when doing statistical sampling by using a pairwise independent family.
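The vanishing covariances can be checked exactly for a small pairwise-independent family. The sketch below (my own illustration, not from the text) uses the n=1 case of the matrix construction: bits X_{x} = ⟨r,x⟩ mod 2 for each nonzero x, generated from a single m-bit seed r. Enumerating all seeds shows Var[∑X_{x}] equals the sum of the individual variances, just as for fully independent bits:

```python
# Pairwise-independent bits from one m-bit seed r: X_x = <r, x> mod 2
# for each nonzero x. We compute Var[sum_x X_x] exactly by enumerating
# the whole seed space.
m = 4
seeds = range(2 ** m)
xs = range(1, 2 ** m)

def bit(r, x):
    """Inner product <r, x> over Z_2."""
    return bin(r & x).count("1") % 2

def exact_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

sums = [sum(bit(r, x) for x in xs) for r in seeds]
var_sum = exact_variance(sums)

# Each X_x is a fair coin with variance 1/4, and there are 2^m - 1 of
# them, so pairwise independence predicts Var[sum] = (2^m - 1)/4:
assert var_sum == (2 ** m - 1) / 4
```

With m = 4 this is 15 sampled bits driven by only 4 random bits, yet the variance bound needed for Chebyshev is the same as with 15 independent coins.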

# 4. Connection to cryptographically secure pseudorandom generators

A cryptographically secure **pseudorandom generator** is a function that takes a small number of random bits and generates a large number of bits that are computationally indistinguishable from random, in the sense that any polynomial-time Turing machine, when fed the pseudorandom bits as input, will answer yes with almost exactly the same probability as if fed real random bits as input. This means that pseudorandom bits are just as good as real random bits for running polynomial-time randomized algorithms. In general, we don't expect to get cryptographically secure pseudorandom generators without using cryptographic assumptions (and their existence would imply P≠NP).

The framework for k-wise independent pseudorandom generators and cryptographically secure pseudorandom generators is exactly the same: we extract many bits from few. Since truly random bits are k-wise independent, the output of a cryptographically secure pseudorandom generator will look k-wise independent to any polynomial-time computation, but won't necessarily be k-wise independent. In the other direction, the output of a k-wise independent generator will be k-wise independent, but won't necessarily look random (for example, given any k values for the random-polynomial generator we can immediately predict all the others). Which you want depends on what you are doing. For a problem where you only need k-wise independence, using a k-wise independent family gives provable bounds that don't depend on unproven complexity-theoretic assumptions.
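The parenthetical prediction attack is just Lagrange interpolation: k outputs of a degree-(k-1) polynomial determine every other output. A sketch over ℤ_{p} (names and the sample coefficients are mine, chosen for illustration):

```python
def predict(known, p, x):
    """Given k observed (point, value) pairs of a degree-(k-1) polynomial
    over Z_p (p prime), predict its value at x by Lagrange interpolation."""
    total = 0
    for i, (xi, yi) in enumerate(known):
        num, den = 1, 1
        for j, (xj, _) in enumerate(known):
            if i != j:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        # pow(den, -1, p) is the modular inverse of den (Python 3.8+)
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

# A 3-wise independent generator with its (secret) coefficients exposed:
p = 101
coeffs = [7, 42, 13]                           # a_0, a_1, a_2

def poly(x):
    return (coeffs[0] + coeffs[1] * x + coeffs[2] * x * x) % p

observed = [(x, poly(x)) for x in (2, 3, 5)]   # any k = 3 observed outputs...
assert all(predict(observed, p, x) == poly(x) for x in range(p))  # ...give the rest
```

So the family is perfect up to k samples and completely predictable after them, which is exactly the gap between k-wise independence and cryptographic security.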

In real life, most people who aren't doing cryptography seem to use something like Mersenne twister, which is neither cryptographically secure nor k-wise independent for large k. This is an example of how practical programming often operates in a more benevolent world than the dystopian tyranny of the universal quantifier.

CategoryRandomizedAlgorithmsNotes

1. Proof: Let z be a zero of a nonzero polynomial f(x); then f(x) = (x-z)q(x) + r by the DivisionAlgorithm, and at x = z we get r = 0, proving f(x) = (x-z)q(x) for some q with deg(f) = deg(q)+1. We also have that q is nonzero, as otherwise f would be zero. By induction we get that f(x) = ∏_{i}(x-z_{i}) h(x), where the z_{i} are the zeros of f and h(x) is some nonzero polynomial, implying that deg(f) ≥ k when f has k distinct zeros.