[FrontPage] [TitleIndex] [WordIndex

Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

PDF version

1. What counting is

Recall that in SetTheory we formally defined each natural number as the set of all smaller natural numbers, so that n = { 0, 1, 2, ..., n-1 }. Call a set finite if it can be put in one-to-one correspondence with some natural number n. For example, the set S = { Larry, Moe, Curly } is finite because we can map Larry to 0, Moe to 1, and Curly to 2, and get a bijection between S and the set 3 = { 0, 1, 2 }. The size or cardinality of a finite set S, written |S| or #S, is just the natural number it can be put in one-to-one correspondence with; that this is well-defined (gives a unique size for each finite set) follows from the Pigeonhole Principle (see below). In general, a cardinal number is a representative of some class of sets that can all put connected to each other by bijections; these include infinite cardinal numbers, discussed below.

Usually we will not provide an explicit bijection to compute the size of a set, but instead will rely on standard counting principles (see below). A proof that a set has a certain size (or that two sets have the same size) that does provide an explicit bijection is called a combinatorial proof. For sets of complicated objects, such proofs often provide additional insight into the structure of the objects.

1.1. Countable and uncountable sets

A set S is infinite if there is a bijection between S and some proper subset of S (proper subset means a subset that isn't equal to S), or equivalently if it can't be put in one-to-one correspondence with any natural number. One of the smallest infinite sets is just the set of all naturals ℕ. Any infinite set also has a cardinality; as for finite sets, the cardinality of an infinite set is defined based on what other sets it can be put into one-to-one correspondence with, and different infinite sets may have different cardinalities.

The smallest cardinality is ℵ0 (pronounced aleph-nought), the cardinality of ℕ. Any set S for which there exists a bijection f:S↔ℕ has cardinality ℵ0. Examples include the set of all pairs of natural numbers (which can be mapped to single natural numbers using various encodings), the set of integers ℤ, the set of rationals ℚ, the set of all finite sequences of natural numbers (using more sophisticated encodings), the set of all computer programs (represent them as sequences of natural numbers, e.g., using ASCII encoding), the set of all possible finite-length texts in any language with a countable output, or the set of all mathematical objects that can be explicitly defined (each such definition is a finite text).

Sets with cardinality ℵ0 or less are called countable; sets with cardinality exactly ℵ0 are countably infinite.

That there are larger cardinalities is a consequence of a famous proof due to Georg Cantor, the diagonalization argument:

Let S be any set. Then there is no surjection f:S→℘S.
Let f:S→℘S. We will show that f is not surjective, by constructing a subset A of S such that A≠f(x) for any x in S. Let A = { x | x∉f(x) }. Now choose some x and consider f(x). We have x∈A if and only if x∉f(x); so x is in exactly one of A and f(x), and in particular A≠f(x). It follows that A≠f(x) for all x, and thus that A isn't in the range of f.

Since any bijection is also a surjection, this means that |S| ≠ |℘S|. For finite sets, this is pretty easy to show directly: it just says that n<2n for all natural numbers n. For infinite sets, it means, for example, that |ℕ| ≠ |℘ℕ|, or equivalently that |℘ℕ| > ℵ0. This means that ℘ℕ is uncountable.

The next largest cardinal after ℵ0 is called ℵ1 (aleph-one). The standard axioms of SetTheory provably can't tell us whether |℘ℕ| = ℵ1 or not, but if we assume the Continuum hypothesis we can assert that |℘ℕ| = ℵ1. In this case ℵ1 not only gives the cardinality of the power set of the naturals, but also the cardinality of the set of real numbers ℝ, the set of complex numbers ℂ, the set of finite sequences of reals, etc. It does not give the cardinality of ℘ℝ or the set of functions f:ℝ→ℝ; these live (assuming a strengthened version of the Continuum hypothesis) up at ℵ2.

Because Cantor's theorem applies even to these bigger sets, there is an infinite sequence of increasingly large cardinalities ℵ0, ℵ1, ℵ2, ℵ3, ... . Pretty much anything past about ℵ1 falls into the category of incomprehensibly large (not an actual mathematical term).

2. Basic counting techniques

Our goal here is to compute the size of some set of objects, e.g. the number of subsets of a set of size n, the number of ways to put k cats into n boxes so that no box gets more than one cat, etc.

In rare cases we can use the definition of the size of a set directly, by constructing a bijection between the set we care about and some natural number. For example, the set Sn = { x ∈ ℕ | x < n2 /\ ∃y: x = y2 } has exactly n members, because we can generate it by applying the one-to-one correspondence f(y) = y2 to the set { 0, 1, 2, 3, ..., n-1 } = n. But most of the time constructing an explicit one-to-one correspondence is too time-consuming or too hard, so having a few lemmas around that tell us what the size of a set will be can be handy.

2.1. Reducing to a previously-solved case

If we can produce a bijection between a set A whose size we don't know and a set B whose size we do, then we get |A|=|B|. Pretty much all of our proofs of cardinality will end up looking like this.

2.2. Showing |A| ≤ |B| and |B| ≤ |A|

We write |A| ≤ |B| if there is an injection f:A→B, and similarly |B| ≤ |A| if there is an injection g:B→A. If both conditions hold, then there is a bijection between A and B, showing |A| = |B|. This fact is trivial for finite sets (it is a consequence of the Pigeonhole Principle), but for infinite sets—even though it is still true—the actual construction of the bijection is a little trickier. See Cantor-Bernstein-Schroeder theorem if you really want the gory details.

Similarly, if we write |A| ≥ |B| to indicate that there is a surjection from A to B, then |A| ≥ |B| and |B| ≥ |A| implies |A| = |B|. The easiest way to show this is to observe that if there is a surjection f:A→B, then we can get an injection f':B→A by letting f'(y) be any element of { x | f(x) = y } (this requires the Axiom of Choice, but pretty much everybody assumes the Axiom of Choice).


2.3. Sum rule

The sum rule says

Proof: Let f:A→|A| and g:B→|B| be bijections. Define h:A∪B→(|A|+|B|) by the rule h(x) = f(x) for x∈A, h(x) = |A|+g(x) for x∈B. We need to show that h is a bijection; we will do so by first showing that it is injective, then that it is surjective. Let x and y be distinct elements of A∪B. If x and y are both in A, then h(x) = f(x) ≠ f(y) = h(y); similarly if x and y are both in B, h(x) = |A|+g(x) ≠ |A| + g(y) = h(y). If x is in A and y in B, then h(x) = f(x) < |A|, and h(y) = g(y) + |A| ≥ |A|; it follows that h(x) ≠ h(y). The remaining case is symmetric. To show h is surjective, let m∈(|A|+|B|). If m < |A|, there exists some x∈A such that h(x) = f(x) = m. Otherwise, we have |A| ≤ m < |A|+|B|, so 0 ≤ m - |A| < |B|, and there exists some y∈B such that g(y) = m-|A|. But then h(y) = g(y)+|A| = m-|A|+|A| = m.

Generalizations: If A1, A2, A3 ... Ak are pairwise disjoint (i.e. Ai ∩ Aj = Ø for all i ≠j), then

\[\left|\bigcup_{i=1}^{k} A_i\right| = \sum_{i=1}^{k} |A_i|.\]

The proof is by induction on k.

2.3.1. Examples

2.3.2. For infinite sets

The sum rule works for infinite sets, too; technically, the sum rule is used to define |A|+|B| as |A∪B| when A and B are disjoint. This makes cardinal arithmetic a bit wonky: if at least one of A and B is infinite, then |A| + |B| = max(|A|,|B|), since we can space out the elements of the larger of A and B and shoehorn the other into the gaps.

2.3.3. The Pigeonhole Principle

A consequence of the sum rule is that if A and B are both finite and |A| > |B|, you can't have an injection from A to B. The proof is by contraposition. Suppose f:A→B is an injection. Write A as the union of f-1(x) for each x∈B, where f-1(x) is the set of y in A that map to x. Because each f-1(x) is disjoint, the sum rule applies; but because f is an injection there is at most one element in each f-1(x). It follows that $|A| = \sum_{x\in B} |f^{-1}(x)| \le \sum_{x \in B} 1 = |B|$. (Question: Why doesn't this work for infinite sets?)

The Pigeonhole Principle generalizes in an obvious way to functions with larger domains; if f:A→B, then there is some x in B such that |f-1(x)| ≥ |A|/|B|.

2.4. Inclusion-exclusion (with two sets)

What if A and B are not disjoint, i.e., if A∩B is not Ø? In this case adding |A| to |B| will count any element that appears in both sets twice. We can get the size of |A∪B| by subtracting off the overcount, obtaining this formula, which works for all A and B:

To prove that the formula works, we use the sum rule. Observe that A = (A∩B) ∪ (B-A) and that the union in this case is disjoint (recall that A-B is the set of all elements that are in A but not in B). A similar decomposition works for B, so we have

Here we are again using the sum rule: A∪B is the disjoint union of A-B, B-A, and A∩B. Subtracting the |A∩B| term from both sides gives the formula we originally wanted.

This is a special case of the inclusion-exclusion formula, which can be used to compute the size of the union of many sets using the size of pairwise, triple-wise, etc. intersections of the sets.

2.4.1. For infinite sets

Subtraction doesn't work very well for infinite quantities (while ℵ0+ℵ0=ℵ0, that doesn't mean ℵ0=0). So the closest we can get to the inclusion-exclusion formula is that |A| + |B| = |A∪B| + |A∩B|. If at least one of A or B is infinite, then |A∪B| is also infinite, and since |A∩B| ≤ |A∪B| we have |A∪B| + |A∩B| = |A∪B| by the usual but bizarre rules of cardinal arithmetic. So for infinite sets we have the rather odd result that |A∪B| = |A| + |B| = max(|A|,|B|) whether the sets overlap or not.

2.4.2. Combinatorial proof

We can prove |A| + |B| = |A∪B| + |A∩B| combinatorially, by turning both sides of the equation into disjoint unions (so the sum rule works) and then providing an explicit bijection between the resulting sets. The trick is that we can always force a union to be disjoint by tagging the elements with extra information; so on the left-hand side we construct L = {1}×A ∪ {2}×B, and on the right-hand side we construct R = {1}×(A∪B) ∪ {2}×(A∩B). It is easy to see that both unions are disjoint, because we are always taking the union of a set of ordered pairs that start with 1 with a set of ordered pairs that start with 2, and no ordered pair can start with both tags; it follows that |L| = |A| + |B| and |R| = |A∪B| + |A∩B|. Now define the function f:L→R by the rule

Observe that f is surjective, because for any (1,x) in {1}×(A∪B), either x is in A and (1,x) = f((1,x)) where (1,x) ∈ L, or x is in B\A and (1,x) = f((2,x)) where (2,x) ∈ L. It is also true that f is injective; the only way for it not to be is if f((1,x)) = f((2,x)) = (1,x) for some x. Suppose this occurs. Then x ∈ A (because of the 1 tag) and x ∈ B\A (because (2,x) is only mapped to (1,x) if x∈B\A). But x can't be in both A and B\A, so we get a contradiction.

2.5. Product rule

The product rule says that for any sets A and B

Recall that A×B is the Cartesian product of A and B, the set of all ordered pairs whose first element comes from A and whose second comes from B.

Proof (for finite sets): Let f:A→|A| and g:B→|B| be bijections. Construct a new function h:|A×B|→(|A|·|B|) by the rule h(<a,b>) = a·|B| + b. Showing that h is a bijection is left as an exercise to the reader (hint: use the DivisionAlgorithm).

The general form is

\[\left|\prod_{i=1}^{k} A_i\right| = \prod_{i=1}^{k} |A_i|,\]

where the product on the left is a Cartesian product and the product on the right is an ordinary integer product.

2.5.1. Examples

As a generalization of the previous example, we can count the number of ways P(n,k) we can pick an ordered subset of k of n items without replacement, also known as picking a k-permutation. There are n ways to pick the first item, n-1 to pick the second, and so forth, giving a total of

\[P(n,k) = \prod_{i=n-k+1}^{n} i = \frac{n!}{(n-k)!}\]

such k-permutations by the product rule.

Among combinatorialists, the notation (n)k (pronounced "n lower-factorial k") is more common than P(n,k) for n·(n-1)·(n-2)·...·(n-k+1). As an extreme case we have (n)n = n·(n-1)·(n-2)·...·(n-n+1) = n·(n-1)·(n-2)·...·1 = n!.

2.5.2. For infinite sets

The product rule also works for infinite sets, becasue we again use it as a definition: for any A and B, |A|⋅|B| is defined to be |A×B|. One oddity for infinite sets is that this definition gives |A|⋅|B| = |A|+|B| = max(|A|,|B|), because if at least one of A and B is infinite, it is possible to construct a bijection between A×B and the larger of A and B. (Infinite sets are strange.)

2.6. Exponent rule

Given sets A and B, let AB be the set of functions f:B→A. Then |AB| = |A||B|.

If |B| is finite, this is just a |B|-fold application of the product rule: we can write any function f:B→A as a sequence of length |B| that gives the value in A for each input in B. Since each element of the sequence contributes |A| possible choices, we get |A|^|B| choices total.

For infinite sets, the exponent rule is a definition of |A||B|. The behavior of exponentiation for infinite cardinals is very strange indeed; see here for some properties of cardinal exponentiation if you are really interested.

To give a flavor of how exponentiation works for arbitrary sets, here's a combinatorial proof of the usual arithmetic fact that xaxb = xa+b, for any cardinal numbers x, a, and b. Let x = |X| and let a = |A| and b = |B| where A and B are disjoint (we can always use the tagging trick that we used for inclusion-exclusion to make A and B be disjoint). Then xaxb = |XA×XB| and xa+b = |XA∪B|. We will now construct an explicit bijection f:XA∪B→XA×XB. The input to f is a function g:A∪B→X; the output is a pair of functions (gA:A→X,gB:B→X). We define gA by gA(x) = g(x) for all x in A (this makes gA the restriction of g to A, usually written as g|A or g⇂A); similarly gB = g|B. This is easily seen to be a bijection; if g = h, then f(g) = (g|A,g|B) = f(h) = (h|A,h|B), and if g≠h there is some x for which g(x)≠h(x), implying g|A≠h|A (if x is in A) or g|B≠h|B (if x is in B).

2.7. Counting the same thing in two different ways

An old farm joke:

Sometimes we can compute the size of a set S by using it (as an unknown variable) to compute the size of another set T (as a function of |S|), and then using some other way to count T to find its size, finally solving for |S|. This is known as counting two ways and is surprisingly useful when it works. We will assume that all the sets we are dealing with are finite, so we can expect things like subtraction and division to work properly.

Example: Let S be an n-element set, and consider the set Sk = { A⊆S | |A| = k }. What is |Sk|? Answer: First we'll count the number m of sequences of k elements of S with no repetitions. We can get such a sequence in two ways:

  1. By picking a size-k subset A and then choosing one of k! ways to order the elements. This gives m = |Sk|·k!.

  2. By choosing the first element in one of n ways, the second in one of n-1, the third in one of n-2 ways, and so on until the k-th element, which can be chosen in one of n-k+1 ways. This gives m = (n)k = n·(n-1)·(n-2)·...(n-k+1), which can be written as n!/(n-k)!. (Here we are using the factors in (n-k)! to cancel out the factors in n! that we don't want.)

So we have m = |Sk|·k! = n!/(n-k)!, from which we get

\[|S_k| = \frac{n!}{k!\cdot(n-k)!}.\]

This quantity turns out to be so useful that it has a special notation:

\[{n \choose k} \stackrel{\mbox{\scriptsize def}}{=} \frac{n!}{k!\cdot(n-k)!}.\]

where the left-hand side is known as a binomial coefficient and is pronounced "n choose k." We discuss BinomialCoefficients at length on their own page. The secret of why it's called a binomial coefficient will be revealed when we talk about GeneratingFunctions.

Example: Here's a generalization of binomial coefficients: let the multinomial coefficient

\[{n \choose n_1 \; n_2 \; \ldots \; n_k}\]

be the number of different ways to distribute n items among k bins where the i-th bin gets exactly ni of the items and we don't care what order the items appear in each bin. (Obviously this only makes sense if n1+n2+...+nk=n.) Can we find a simple formula for the multinomial coefficient?

Here are two ways to count the number of permutations of the n-element set:

  1. Pick the first element, then the second, etc. to get n! permuations.
  2. Generate a permutation in three steps:
    1. Pick a partition of the n elements into groups of size n1, n2, ... nk.

    2. Order the elements of each group.
    3. Paste the groups together into a single ordered list.

There are

\[{n \choose n_1 \; n_2 \; \ldots \; n_k}\]

ways to pick the partition and

\[n_1! \cdot n_2! \cdots n_k!\]

ways to order the elements of all the groups, so we have

\[n! = {n \choose n_1 \; n_2 \; \ldots \; n_k} \cdot n_1! \cdot n_2! \cdots n_k!\]

which we can solve to get

\[{n \choose n_1 \; n_2 \; \ldots \; n_k} = \frac{n!}{n_1! \cdot n_2! \cdots n_k!}\]

This also gives another way to derive the formula for a binomial coefficient, since

\[{n \choose k} = {n \choose k \;\; (n-k)} = \frac{n!}{k!\cdot (n-k)!}\]

3. Applying the rules

If you're given some strange set to count, look at the structure of its description:

Sometimes reducing to a previous case requires creativity. For example, suppose you win n identical cars on a game show and want to divide them among your k greedy relatives. Assuming that you don't care about fairness, how many ways are there to do this?

Finding such correspondences is a central part of enumerative combinatorics, the branch of mathematics that deals with counting things.

4. An elaborate counting problem

Suppose you have the numbers { 1, 2, .., 2n }, and you want to count how many sequences of k of these numbers you can have that are (a) increasing (a[i] < a[i+1] for all i), (b) decreasing (a[i] ≥ a[i+1] for all i), or (c) made up only of even numbers.

This is the union of three sets A, B, and C, corresponding to the three cases. The first step is to count each set individually; then we can start thinking about applying inclusion-exclusion to get the size of the union.

For A, any increasing sequence can be specified by choosing its elements (the order is determined by the assumption it is increasing). So we have $|A| = {2n \choose k}$.

For B, by symmetry we have $|B| = |A| = {2n \choose k}$.

For C, we are just looking at nk possible sequences, since there are n even numbers we can put in each position.

Inclusion-exclusion says that |A∪B∪C| = |A| + |B| + |C| - |A∩B| - |A∩C| - |B∩C| + |A∪B∪C|. It's not hard to see that A∩B = ∅ when k is at least 22, so we can reduce this to |A| + |B| + |C| - |A∩C| - |B∩C|. To count A∩C, observe that we are now looking at increasing sequences chosen from the n possible even numbers; so there are exactly ${n \choose k}$ of them, and similarly for B∩C. Summing up gives a total of

{2n \choose k} + {2n \choose k} + n^k - {n \choose k} - {n \choose k}
= 2 \left({2n \choose k} - {n \choose k}\right) + n^k

sequences satisfying at least one of the criteria.

Note that we had to assume k = 2 to get A∩B=∅, so this formula might require some adjustment for k<2. In fact we can observe immediately that the unique empty sequence for k=1 fits in all of A, B, and C, so in this case we get 1 winning sequence (which happens to be equal to the value in the formula, because here A∩B=∅ for other reasons), and for k=1 we get 2n winning sequences (which is less than the value 3n given by the formula).

To test that the formula works for at least some larger values, let n=3 and k=2. Then the formula predicts $2\left({6 \choose 2} - {3 \choose 2}\right) + 3^2 = 2(15 - 3) + 9 = 33$ total sequences.3 And here they are (generated by seqs.py run as python seqs.py 6 2):

[1, 2]
[1, 3]
[1, 4]
[1, 5]
[1, 6]
[2, 1]
[2, 2]
[2, 3]
[2, 4]
[2, 5]
[2, 6]
[3, 1]
[3, 2]
[3, 4]
[3, 5]
[3, 6]
[4, 1]
[4, 2]
[4, 3]
[4, 4]
[4, 5]
[4, 6]
[5, 1]
[5, 2]
[5, 3]
[5, 4]
[5, 6]
[6, 1]
[6, 2]
[6, 3]
[6, 4]
[6, 5]
[6, 6]

5. Further reading

RosenBook does basic counting in chapter 5 and more advanced counting (including SolvingRecurrences and using GeneratingFunctions) in chapter 7. BiggsBook chapters 6 and 10 give a basic introduction to counting, with more esoteric topics in chapters 11 and 12. ConcreteMathematics has quite a bit on counting various things.


  1. Of course, just setting up a recurrence doesn't mean it's going to be easy to actually solve it. (1)

  2. It's even easier to assume that A∩B=∅ always, as I did in an earlier version of these notes and in a similar example in class on 2008-10-08, but for k=1 any sequence is both increasing and nonincreasing, since there are no pairs of adjacent elements in a 1-element sequence to violate the property. (2)

  3. Without looking at the list, can you say which 3 of the 62=36 possible length 2 sequences are missing? (3)

2014-06-17 11:58