Contents

# 1. What counting is

Recall that in SetTheory we formally defined each natural number as the set of all smaller natural numbers, so that n = { 0, 1, 2, ..., n-1 }. Call a set **finite** if it can be put in one-to-one correspondence with some natural number n. For example, the set S = { Larry, Moe, Curly } is finite because we can map Larry to 0, Moe to 1, and Curly to 2, and get a bijection between S and the set 3 = { 0, 1, 2 }. The **size** or **cardinality** of a finite set S, written |S| or #S, is just the natural number it can be put in one-to-one correspondence with; that this is **well-defined** (gives a unique size for each finite set) follows from the **Pigeonhole Principle** (see below). In general, a **cardinal number** is a representative of some class of sets that can all put connected to each other by bijections; these include infinite cardinal numbers, discussed below.

Usually we will not provide an explicit bijection to compute the size of a set, but instead will rely on standard counting principles (see below). A proof that a set has a certain size (or that two sets have the same size) that does provide an explicit bijection is called a **combinatorial proof**. For sets of complicated objects, such proofs often provide additional insight into the structure of the objects.

## 1.1. Countable and uncountable sets

A set S is **infinite** if there is a bijection between S and some proper subset of S (**proper subset** means a subset that isn't equal to S), or equivalently if it can't be put in one-to-one correspondence with any natural number. One of the smallest infinite sets is just the set of all naturals ℕ. Any infinite set also has a cardinality; as for finite sets, the cardinality of an infinite set is defined based on what other sets it can be put into one-to-one correspondence with, and different infinite sets may have different cardinalities.

The smallest cardinality is ℵ_{0} (pronounced aleph-nought), the cardinality of ℕ. Any set S for which there exists a bijection f:S↔ℕ has cardinality ℵ_{0}. Examples include the set of all pairs of natural numbers (which can be mapped to single natural numbers using various encodings), the set of integers ℤ, the set of rationals ℚ, the set of all finite sequences of natural numbers (using more sophisticated encodings), the set of all computer programs (represent them as sequences of natural numbers, e.g., using ASCII encoding), the set of all possible finite-length texts in any language with a countable output, or the set of all mathematical objects that can be explicitly defined (each such definition is a finite text).

Sets with cardinality ℵ_{0} or less are called **countable**; sets with cardinality exactly ℵ_{0} are **countably infinite**.

That there are larger cardinalities is a consequence of a famous proof due to Georg Cantor, the **diagonalization argument**:

- Theorem
- Let S be any set. Then there is no surjection f:S→℘S.
- Proof
- Let f:S→℘S. We will show that f is not surjective, by constructing a subset A of S such that A≠f(x) for any x in S. Let A = { x | x∉f(x) }. Now choose some x and consider f(x). We have x∈A if and only if x∉f(x); so x is in exactly one of A and f(x), and in particular A≠f(x). It follows that A≠f(x) for all x, and thus that A isn't in the range of f.

Since any bijection is also a surjection, this means that |S| ≠ |℘S|. For finite sets, this is pretty easy to show directly: it just says that n<2^{n} for all natural numbers n. For infinite sets, it means, for example, that |ℕ| ≠ |℘ℕ|, or equivalently that |℘ℕ| > ℵ_{0}. This means that ℘ℕ is **uncountable**.

The next largest cardinal after ℵ_{0} is called ℵ_{1} (aleph-one). The standard axioms of SetTheory provably can't tell us whether |℘ℕ| = ℵ_{1} or not, but if we assume the Continuum hypothesis we can assert that |℘ℕ| = ℵ_{1}. In this case ℵ_{1} not only gives the cardinality of the power set of the naturals, but also the cardinality of the set of real numbers ℝ, the set of complex numbers ℂ, the set of finite sequences of reals, etc. It does not give the cardinality of ℘ℝ or the set of functions f:ℝ→ℝ; these live (assuming a strengthened version of the Continuum hypothesis) up at ℵ_{2}.

Because Cantor's theorem applies even to these bigger sets, there is an infinite sequence of increasingly large cardinalities ℵ_{0}, ℵ_{1}, ℵ_{2}, ℵ_{3}, ... . Pretty much anything past about ℵ_{1} falls into the category of *incomprehensibly large* (not an actual mathematical term).

# 2. Basic counting techniques

Our goal here is to compute the size of some set of objects, e.g. the number of subsets of a set of size n, the number of ways to put k cats into n boxes so that no box gets more than one cat, etc.

In rare cases we can use the definition of the size of a set directly, by constructing a bijection between the set we care about and some natural number. For example, the set S_{n} = { x ∈ ℕ | x < n^{2} /\ ∃y: x = y^{2} } has exactly n members, because we can generate it by applying the one-to-one correspondence f(y) = y^{2} to the set { 0, 1, 2, 3, ..., n-1 } = n. But most of the time constructing an explicit one-to-one correspondence is too time-consuming or too hard, so having a few lemmas around that tell us what the size of a set will be can be handy.

## 2.1. Reducing to a previously-solved case

If we can produce a bijection between a set A whose size we don't know and a set B whose size we do, then we get |A|=|B|. Pretty much all of our proofs of cardinality will end up looking like this.

## 2.2. Showing |A| ≤ |B| and |B| ≤ |A|

We write |A| ≤ |B| if there is an injection f:A→B, and similarly |B| ≤ |A| if there is an injection g:B→A. If both conditions hold, then there is a bijection between A and B, showing |A| = |B|. This fact is trivial for finite sets (it is a consequence of the Pigeonhole Principle), but for infinite sets—even though it is still true—the actual construction of the bijection is a little trickier. See Cantor-Bernstein-Schroeder theorem if you really want the gory details.

Similarly, if we write |A| ≥ |B| to indicate that there is a surjection from A to B, then |A| ≥ |B| and |B| ≥ |A| implies |A| = |B|. The easiest way to show this is to observe that if there is a surjection f:A→B, then we can get an injection f':B→A by letting f'(y) be any element of { x | f(x) = y } (this requires the Axiom of Choice, but pretty much everybody assumes the Axiom of Choice).

Examples:

|ℚ| = |ℕ|. Proof: |ℕ| ≤ |ℚ| because we can map any n in ℕ to the same value in ℚ; this is clearly an injection. To show |ℚ| ≤ |ℕ|, observe that we can encode any element ±p/q of ℚ, where p and q are both natural numbers, as a triple <sign, p, q> where sign ∈ {0,1} indicates + (0) or - (1); this encoding is clearly injective. Then use the bijection from ℕ×ℕ to ℕ twice to crunch this triple down to a single natural number, getting an injection from ℚ to ℕ.

## 2.3. Sum rule

The **sum rule** says

- If A∩B=Ø, then |A∪B| = |A| + |B|.

Proof: Let f:A→|A| and g:B→|B| be bijections. Define h:A∪B→(|A|+|B|) by the rule h(x) = f(x) for x∈A, h(x) = |A|+g(x) for x∈B. We need to show that h is a bijection; we will do so by first showing that it is injective, then that it is surjective. Let x and y be distinct elements of A∪B. If x and y are both in A, then h(x) = f(x) ≠ f(y) = h(y); similarly if x and y are both in B, h(x) = |A|+g(x) ≠ |A| + g(y) = h(y). If x is in A and y in B, then h(x) = f(x) < |A|, and h(y) = g(y) + |A| ≥ |A|; it follows that h(x) ≠ h(y). The remaining case is symmetric. To show h is surjective, let m∈(|A|+|B|). If m < |A|, there exists some x∈A such that h(x) = f(x) = m. Otherwise, we have |A| ≤ m < |A|+|B|, so 0 ≤ m - |A| < |B|, and there exists some y∈B such that g(y) = m-|A|. But then h(y) = g(y)+|A| = m-|A|+|A| = m.

Generalizations: If A_{1}, A_{2}, A_{3} ... A_{k} are **pairwise disjoint** (i.e. A_{i} ∩ A_{j} = Ø for all i ≠j), then

The proof is by induction on k.

### 2.3.1. Examples

- As I was going to Saint Ives, I met a man with 7 wives, 28 children, 56 grandchildren, and 122 great-grandchildren. Assuming these sets do not overlap, how many people did I meet? Answer: 1+7+28+56+122=214.

### 2.3.2. For infinite sets

The sum rule works for infinite sets, too; technically, the sum rule is used to *define* |A|+|B| as |A∪B| when A and B are disjoint. This makes cardinal arithmetic a bit wonky: if at least one of A and B is infinite, then |A| + |B| = max(|A|,|B|), since we can space out the elements of the larger of A and B and shoehorn the other into the gaps.

### 2.3.3. The Pigeonhole Principle

A consequence of the sum rule is that if A and B are both finite and |A| > |B|, you can't have an injection from A to B. The proof is by contraposition. Suppose f:A→B is an injection. Write A as the union of f^{-1}(x) for each x∈B, where f^{-1}(x) is the set of y in A that map to x. Because each f^{-1}(x) is disjoint, the sum rule applies; but because f is an injection there is at most one element in each f^{-1}(x). It follows that . (Question: Why doesn't this work for infinite sets?)

The Pigeonhole Principle generalizes in an obvious way to functions with larger domains; if f:A→B, then there is some x in B such that |f^{-1}(x)| ≥ |A|/|B|.

## 2.4. Inclusion-exclusion (with two sets)

What if A and B are not disjoint, i.e., if A∩B is not Ø? In this case adding |A| to |B| will count any element that appears in both sets twice. We can get the size of |A∪B| by subtracting off the overcount, obtaining this formula, which works for all A and B:

- |A∪B| = |A| + |B| - |A∩B|

To prove that the formula works, we use the sum rule. Observe that A = (A∩B) ∪ (B-A) and that the union in this case is disjoint (recall that A-B is the set of all elements that are in A but not in B). A similar decomposition works for B, so we have

- |A| + |B| = |A∩B| + |B-A| + |B∩A| + |A-B| = |A∪B| + |A∩B|.

Here we are again using the sum rule: A∪B is the disjoint union of A-B, B-A, and A∩B. Subtracting the |A∩B| term from both sides gives the formula we originally wanted.

This is a special case of the inclusion-exclusion formula, which can be used to compute the size of the union of many sets using the size of pairwise, triple-wise, etc. intersections of the sets.

### 2.4.1. For infinite sets

Subtraction doesn't work very well for infinite quantities (while ℵ_{0}+ℵ_{0}=ℵ_{0}, that doesn't mean ℵ_{0}=0). So the closest we can get to the inclusion-exclusion formula is that |A| + |B| = |A∪B| + |A∩B|. If at least one of A or B is infinite, then |A∪B| is also infinite, and since |A∩B| ≤ |A∪B| we have |A∪B| + |A∩B| = |A∪B| by the usual but bizarre rules of cardinal arithmetic. So for infinite sets we have the rather odd result that |A∪B| = |A| + |B| = max(|A|,|B|) whether the sets overlap or not.

### 2.4.2. Combinatorial proof

We can prove |A| + |B| = |A∪B| + |A∩B| combinatorially, by turning both sides of the equation into disjoint unions (so the sum rule works) and then providing an explicit bijection between the resulting sets. The trick is that we can always force a union to be disjoint by tagging the elements with extra information; so on the left-hand side we construct L = {1}×A ∪ {2}×B, and on the right-hand side we construct R = {1}×(A∪B) ∪ {2}×(A∩B). It is easy to see that both unions are disjoint, because we are always taking the union of a set of ordered pairs that start with 1 with a set of ordered pairs that start with 2, and no ordered pair can start with both tags; it follows that |L| = |A| + |B| and |R| = |A∪B| + |A∩B|. Now define the function f:L→R by the rule

- f((1,x)) = (1,x).
- f((2,x)) = (2,x) if x∈B∩A
- f((2,x)) = (1,x) if x∈B\A.

Observe that f is surjective, because for any (1,x) in {1}×(A∪B), either x is in A and (1,x) = f((1,x)) where (1,x) ∈ L, or x is in B\A and (1,x) = f((2,x)) where (2,x) ∈ L. It is also true that f is injective; the only way for it not to be is if f((1,x)) = f((2,x)) = (1,x) for some x. Suppose this occurs. Then x ∈ A (because of the 1 tag) and x ∈ B\A (because (2,x) is only mapped to (1,x) if x∈B\A). But x can't be in both A and B\A, so we get a contradiction.

## 2.5. Product rule

The **product rule** says that for *any* sets A and B

- |A×B| = |A|·|B|.

Recall that A×B is the Cartesian product of A and B, the set of all ordered pairs whose first element comes from A and whose second comes from B.

Proof (for finite sets): Let f:A→|A| and g:B→|B| be bijections. Construct a new function h:|A×B|→(|A|·|B|) by the rule h(<a,b>) = a·|B| + b. Showing that h is a bijection is left as an exercise to the reader (hint: use the DivisionAlgorithm).

The general form is

where the product on the left is a Cartesian product and the product on the right is an ordinary integer product.

### 2.5.1. Examples

As I was going to Saint Ives, I met a man with seven sacks, and every sack had seven cats. How many cats total? Answer: Label the sacks 0,1,2,...,6, and label the cats in each sack 0,1,2,...,6. Then each cat can be specified uniquely by giving a pair <sack number, cat number>, giving a bijection between the set of cats and the set 7×7. Since the |7×7|=7·7=49, we have 49 cats.

Dr Frankenstein's trusty assistant Igor has brought him 6 torsos, 4 brains, 8 pairs of matching arms, and 4 pairs of legs. How many different monsters can Dr Frankenstein build? Answer: there is a one-to-one correspondence between possible monsters and 4-tuples of the form <torso, brain, pair of arms, pair of legs>; the set of such 4-tuples has 6·4·8·4=728 members.

How many different ways can you order n items? Call this quantity n! (i.e., n factorial). With 0 or 1 items, there is only one way; so we have 0!=1!=1. For n > 1, there are n choices for the first item, leaving n-1 items to be ordered. From the product rule we thus have n! = n·(n-1)!, which we could also expand out as

As a generalization of the previous example, we can count the number of ways P(n,k) we can pick an ordered subset of k of n items without replacement, also known as picking a k-**permutation**. There are n ways to pick the first item, n-1 to pick the second, and so forth, giving a total of

such k-permutations by the product rule.

Among combinatorialists, the notation (n)_{k} (pronounced "n lower-factorial k") is more common than P(n,k) for n·(n-1)·(n-2)·...·(n-k+1). As an extreme case we have (n)_{n} = n·(n-1)·(n-2)·...·(n-n+1) = n·(n-1)·(n-2)·...·1 = n!.

### 2.5.2. For infinite sets

The product rule also works for infinite sets, becasue we again use it as a definition: for any A and B, |A|⋅|B| is defined to be |A×B|. One oddity for infinite sets is that this definition gives |A|⋅|B| = |A|+|B| = max(|A|,|B|), because if at least one of A and B is infinite, it is possible to construct a bijection between A×B and the larger of A and B. (Infinite sets are strange.)

## 2.6. Exponent rule

Given sets A and B, let A^{B} be the set of functions f:B→A. Then |A^{B}| = |A|^{|B|}.

If |B| is finite, this is just a |B|-fold application of the product rule: we can write any function f:B→A as a sequence of length |B| that gives the value in A for each input in B. Since each element of the sequence contributes |A| possible choices, we get |A|^|B| choices total.

For infinite sets, the exponent rule is a definition of |A|^{|B|}. The behavior of exponentiation for infinite cardinals is very strange indeed; see here for some properties of cardinal exponentiation if you are really interested.

To give a flavor of how exponentiation works for arbitrary sets, here's a combinatorial proof of the usual arithmetic fact that x^{a}x^{b} = x^{a+b}, for any cardinal numbers x, a, and b. Let x = |X| and let a = |A| and b = |B| where A and B are disjoint (we can always use the tagging trick that we used for inclusion-exclusion to make A and B be disjoint). Then x^{a}x^{b} = |X^{A}×X^{B}| and x^{a+b} = |X^{A∪B}|. We will now construct an explicit bijection f:X^{A∪B}→X^{A}×X^{B}. The input to f is a function g:A∪B→X; the output is a pair of functions (g_{A}:A→X,g_{B}:B→X). We define g_{A} by g_{A}(x) = g(x) for all x in A (this makes g_{A} the **restriction** of g to A, usually written as g|A or g⇂A); similarly g_{B} = g|B. This is easily seen to be a bijection; if g = h, then f(g) = (g|A,g|B) = f(h) = (h|A,h|B), and if g≠h there is some x for which g(x)≠h(x), implying g|A≠h|A (if x is in A) or g|B≠h|B (if x is in B).

## 2.7. Counting the same thing in two different ways

An old farm joke:

- Q: How do you count a herd of cattle? A: Count the legs and divide by four.

Sometimes we can compute the size of a set S by using it (as an unknown variable) to compute the size of another set T (as a function of |S|), and then using some other way to count T to find its size, finally solving for |S|. This is known as **counting two ways** and is surprisingly useful when it works. We will assume that all the sets we are dealing with are finite, so we can expect things like subtraction and division to work properly.

Example: Let S be an n-element set, and consider the set S_{k} = { A⊆S | |A| = k }. What is |S_{k}|? Answer: First we'll count the number m of sequences of k elements of S with no repetitions. We can get such a sequence in two ways:

By picking a size-k subset A and then choosing one of k! ways to order the elements. This gives m = |S

_{k}|·k!.By choosing the first element in one of n ways, the second in one of n-1, the third in one of n-2 ways, and so on until the k-th element, which can be chosen in one of n-k+1 ways. This gives m = (n)

_{k}= n·(n-1)·(n-2)·...(n-k+1), which can be written as n!/(n-k)!. (Here we are using the factors in (n-k)! to cancel out the factors in n! that we don't want.)

So we have m = |S_{k}|·k! = n!/(n-k)!, from which we get

This quantity turns out to be so useful that it has a special notation:

where the left-hand side is known as a **binomial coefficient** and is pronounced "n choose k." We discuss BinomialCoefficients at length on their own page. The secret of why it's called a binomial coefficient will be revealed when we talk about GeneratingFunctions.

Example: Here's a generalization of binomial coefficients: let the **multinomial coefficient**

be the number of different ways to distribute n items among k bins where the i-th bin gets exactly n_{i} of the items and we don't care what order the items appear in each bin. (Obviously this only makes sense if n_{1}+n_{2}+...+n_{k}=n.) Can we find a simple formula for the multinomial coefficient?

Here are two ways to count the number of permutations of the n-element set:

- Pick the first element, then the second, etc. to get n! permuations.
- Generate a permutation in three steps:
Pick a partition of the n elements into groups of size n

_{1}, n_{2}, ... n_{k}.- Order the elements of each group.
- Paste the groups together into a single ordered list.

There are

ways to pick the partition and

ways to order the elements of all the groups, so we have

which we can solve to get

This also gives another way to derive the formula for a binomial coefficient, since

# 3. Applying the rules

If you're given some strange set to count, look at the structure of its description:

If it's given by a rule of the form x is in S if either P(x) or Q(x) is true, use the sum rule (if P and Q are mutually exclusive) or inclusion-exclusion. This includes sets given by recursive definitions, e.g. x is a tree of depth at most k if it is either (a) a single leaf node (provided k > 0) or (b) a root node with two subtrees of depth at most k-1. The two classes are disjoint so we have T(k) = 1 + T(k-1)

^{2}with T(0) = 0.^{1}For objects made out of many small components or resulting from many small decisions, try to reduce the description of the object to something previously known, e.g. (a) a word of length k of letters from an alphabet of size n allowing repetition (there are n

^{k}of them, by the product rule); (b) a word of length k not allowing repetition (there are (n)_{k}of them—or n! if n = k); (c) a subset of k distinct things from a set of size n, where we don't care about the order (there are of them); any subset of a set of n things (there are 2^{n}of them—this is a special case of (a), where the alphabet encodes non-membership as 0 and membership as 1, and the position in the word specifies the element). Some examples:The number of games of Tic-Tac-Toe assuming both players keep playing until the board is filled is obtained by observing that each such game can be specified by listing which of the 9 squares are filled in order, giving 9! = 362,880 distinct games. Note that we don't have to worry about which of the 9 moves are made by X and which by O, since the rules of the game enforce it. (If we only consider games that end when one player wins, this doesn't work: probably the easiest way to count such games is to send a computer off to generate all of them. This horrible program says there are 255168 possible games and 958 distinct final positions.)

The number of completely-filled-in Tic-Tac-Toe boards can be obtained by observing that any such board has 5 X's and 4 O's. So there are = 126 such positions. (Question: Why would this be smaller than the actual number of final positions?)

Sometimes reducing to a previous case requires creativity. For example, suppose you win n identical cars on a game show and want to divide them among your k greedy relatives. Assuming that you don't care about fairness, how many ways are there to do this?

If it's ok if some people don't get a car at all, then you can imagine putting n cars and k-1 dividers in a line, where relative 1 gets all the cars up to the first divider, relative 2 gets all the cars between the first and second dividers, and so forth up to relative k who gets all the cars after the (k-1)th divider. Assume that each car—and each divider—takes one parking space. Then you have n+k-1 parking spaces with k-1 dividers in them (and cars in the rest). There are exactly ways to do this.

Alternatively, suppose each relative demands at least 1 car. Then you can just hand out one car to each relative to start with, leaving n-k cars to divide as in the previous case. There are ways to do this.

Finding such correspondences is a central part of enumerative combinatorics, the branch of mathematics that deals with counting things.

# 4. An elaborate counting problem

Suppose you have the numbers { 1, 2, .., 2n }, and you want to count how many sequences of k of these numbers you can have that are (a) increasing (a[i] < a[i+1] for all i), (b) decreasing (a[i] ≥ a[i+1] for all i), or (c) made up only of even numbers.

This is the union of three sets A, B, and C, corresponding to the three cases. The first step is to count each set individually; then we can start thinking about applying inclusion-exclusion to get the size of the union.

For A, any increasing sequence can be specified by choosing its elements (the order is determined by the assumption it is increasing). So we have .

For B, by symmetry we have .

For C, we are just looking at n^{k} possible sequences, since there are n even numbers we can put in each position.

Inclusion-exclusion says that |A∪B∪C| = |A| + |B| + |C| - |A∩B| - |A∩C| - |B∩C| + |A∪B∪C|. It's not hard to see that A∩B = ∅ when k is at least 2^{2}, so we can reduce this to |A| + |B| + |C| - |A∩C| - |B∩C|. To count A∩C, observe that we are now looking at increasing sequences chosen from the n possible even numbers; so there are exactly of them, and similarly for B∩C. Summing up gives a total of

sequences satisfying at least one of the criteria.

Note that we had to assume k = 2 to get A∩B=∅, so this formula might require some adjustment for k<2. In fact we can observe immediately that the unique empty sequence for k=1 fits in all of A, B, and C, so in this case we get 1 winning sequence (which happens to be equal to the value in the formula, because here A∩B=∅ for other reasons), and for k=1 we get 2n winning sequences (which is less than the value 3n given by the formula).

To test that the formula works for at least some larger values, let n=3 and k=2. Then the formula predicts total sequences.^{3} And here they are (generated by seqs.py run as `python seqs.py 6 2`):

[1, 2] [1, 3] [1, 4] [1, 5] [1, 6] [2, 1] [2, 2] [2, 3] [2, 4] [2, 5] [2, 6] [3, 1] [3, 2] [3, 4] [3, 5] [3, 6] [4, 1] [4, 2] [4, 3] [4, 4] [4, 5] [4, 6] [5, 1] [5, 2] [5, 3] [5, 4] [5, 6] [6, 1] [6, 2] [6, 3] [6, 4] [6, 5] [6, 6]

# 5. Further reading

RosenBook does basic counting in chapter 5 and more advanced counting (including SolvingRecurrences and using GeneratingFunctions) in chapter 7. BiggsBook chapters 6 and 10 give a basic introduction to counting, with more esoteric topics in chapters 11 and 12. ConcreteMathematics has quite a bit on counting various things.

Of course, just setting up a recurrence doesn't mean it's going to be easy to actually solve it. (1)

It's even easier to assume that A∩B=∅ always, as I did in an earlier version of these notes and in a similar example in class on 2008-10-08, but for k=1 any sequence is both increasing and nonincreasing, since there are no pairs of adjacent elements in a 1-element sequence to violate the property. (2)

Without looking at the list, can you say which 3 of the 6

^{2}=36 possible length 2 sequences are missing? (3)