# 1. Axiomatic set theory

Formal or **axiomatic set theory** is defined by a collection of axioms, which describe the behavior of its only predicate symbol, ∈, a mutated version of the Greek letter epsilon. The interpretation of x∈y is that x is a member of (also called an element of) y. There is also the symbol ∉ (is not an element of), where x ∉ y is defined to mean ¬(x∈y); and ∋ (contains), where y ∋ x is the same as x ∈ y. Any set contains zero or more elements, and has no other properties other than what elements it contains. Formally, this means that if two sets contain the same elements, they're the same set. To make this explicit, we have the

- Axiom of Extensionality
- ∀x ∀y (x = y ⇔ ∀z (z ∈ x ⇔ z ∈ y)).

Extensionality is the axiom that defines what a set *is*. The rest of the axioms tell us what possible sets exist. There are two main traditions for filling in the collection of sets that exist, naive set theory as described above and axiomatic set theory. Naive set theory essentially assumes the existence of any set that you can describe, either by listing its elements explicitly, or by giving a rule (i.e. a predicate) that tells whether a particular object is in the set or not. Because the naive approach quickly leads to paradoxes, hard-core logicians instead favor axiomatic set theory, in which no set is presumed to exist unless you can generate it using a specific list of axioms. Axiomatic set theory usually assumes that there are no objects in the universe except sets (so that all quantifiers apply only to sets); other useful mathematical objects like numbers or functions must be represented as sets before they can be used. Naive set theory is more generous about what can be included in a set, so that you can have, for example, a set of baseball players or planets or philosophical concepts without having to somehow reduce these objects to sets.

Despite the fear and trembling that axiomatic set theory inspires in some readers, the difference between axiomatic set theory and naive set theory is not great, and with one noteworthy exception naive set theory can be thought of as just a pretty front-end to axiomatic set theory. So for the rest these notes we will retain the approach of presenting set theory from a naive perspective, but bring in standard axioms as needed to justify the existence of particular sets we may encounter. The axiomatization we will use is the standard Zermelo-Fraenkel set theory with the Axiom of Choice, usually abbreviated ZFC.

## 1.1. Set notation and some simple sets

Just as in naive set theory, sets whose elements are known are written by listing the elements inside curly braces, e.g. { 1, 2, 3 }, { Groucho, Chico, Harpo, Zeppo, Karl }. The smallest set is the **empty set**, the set with no elements. This can be written either as { } or using the special symbol ∅ (which is actually a mutated Greek letter Phi). One of the ZFC axioms says that ∅ exists:

- Axiom of Existence
- ∃x ∀y y ∉ x.

With just the axiom of extensionality, it is possible that no sets at all exist; the axiom of existence gives us at least one set (but possibly no others).

## 1.2. Operations on sets

Given two objects x and y, there is a set { x, y } that contains both x and y but nothing else. The existence of this set is asserted by the

- Axiom of Pairing
- ∀x ∀y ∃z ∀q (q ∈ z ⇔ (q = x ∨ q = y)).

There's a general rule of thumb that any expression with more than three quantifiers is incomprehensible to human beings, but the way to interpret this one is that if you give me x and y, I can find z = { x, y }. The reason I know that z = { x, y } is that for any q, either (a) q = x and q is in z, (b) q = y and q is in z, or (c) q is not in z. This means that z contains only x and y and nothing else, as we wanted.

Note that pairing two identical objects yields a one-element set: { x, x } = { x }. This follows from extensionality: sets only know whether they contain something or not, and there is no notion of containing the same element more than once, or of containing multiple copies of an element. It also follows from extensionality that result of pairing two non-identical objects is an **unordered pair**: { x, y } = { y, x }, since the set only knows that it contains both x and y, but not which comes first. To encode an **ordered pair** (x, y) with a first element x and a second element y (which may even be equal to each other), the usual trick is to encode the ordering with some additional structure, e.g. (x, y) = { { x }, { x, y } }.

Once we have pairing, we can apply it repeatedly to generate an infinite collection of sets starting with { }: e.g. { { } , { } } = { { } }, { { }, { { } } }, { { { { { { } } } } } }, and so forth. But we can only generate sets with one or two elements. To get bigger sets, we need more powerful operations.

The **union** of two sets x and y is a set x ∪ y that contains every element that appears in either x or y: for example, { 1, 2, 3 } ∪ { 2, 4, 6 } = { 1, 2, 3, 4, 6 }. Union is to sets what OR is to predicates: we could define x ∪ y as the set z for which ∀q (q ∈ z ⇔ (q ∈ x ∨ q ∈ y). But the formal definition is a little more complicated. The reason is that sometimes we like to be able to take the union of more than two sets at once. Suppose we have a set S = { A_{1}, A_{2}, A_{3}, ... }; then the union of S, written ∪S, contains every object that appears in at least one of S's elements. This more general notion of union includes the pairwise version because of the Axiom of Pairing: we can always rewrite x ∪ y as ∪{x,y}. To assert that unions always exist, we have the

- Axiom of Union
- ∀x ∃y ∀z (z ∈ y ⇔ ∃q (z ∈ q ∧ q ∈ x)).

In reading the statement of the axiom, you should think of x as the set we are taking a union over, y as the union ∪x, z as some element that might or might not be in y, and q as an element of x that contains z and thus puts z in the union. It may help to think of this general union operation in terms of family trees: if the members of x are x's children, then the members of y = ∪x are x's grandchildren (and q stands in for the child that connects x and a particular grandchild z). Another way to think about it is that union erases one layer of braces: ∪{ { 1, 2 }, { 3, 4, { 5 } } = { 1, 2, 3, 4, { 5 } }.

(Exercise: show that our first characterization of x ∪ y is equivalent to the more general definition given the definition x ∪ y = ∪{x,y}.)

Between pairing and union, we can build sets of arbitrary (though finite) size: just build a tree of pairings, and then take unions to flatten them out. For example, { 1, 2, 3, 4 } = { 1, 2 } ∪ { 3, 4 }.

## 1.3. Subsets and power sets

A set x is a **subset** of a set y, written x ⊆ y, if every element of x is also an element of y. Formally,

- x ⊆ y ≡ ∀z (z ∈ x ⇒ z ∈ y).
- Warning
- Don't confuse ∈ ("is in", "is an element of", or "is a member of") with ⊆ ("is a subset of"). It's possible for a set A to be a subset of B without being an element of B. For example, { 0 } is a subset of { 0, 1 } but it's not an element of { 0, 1 }. It is, however, an element of { { 0 }, 1 }. On the other hand, { 0 } is not a subset of of { { 0 }, 1 }, because { 0 } has an element 0 that isn't one of the two elements { 0 } and 1 of { { 0 }, 1 }. Nested braces matter.

As seen in the definition, the predicate-logic analogue of x ⊆ y is implication: membership in x implies membership in y.

Given any set x, there is a set P(x), the **power set** of x, that has as its elements precisely all the subsets of x. The existence of the power set is asserted by the

- Axiom of Power Set
- ∀x ∃y ∀z (z ⊆ x ⇔ z ∈ y).

The power set of a finite set with n elements will have 2^{n} elements: for example, P { 1, 2 } = { { }, { 1 }, { 2 }, { 1, 2 } }. The power set of the empty set is not empty, as it contains the empty set itself: P { } = { { } }.

## 1.4. Set-builder notation

So far we have seen lots of ways to make big sets by combining small sets. There are also operations on sets for extracting particular subsets, or for transforming a set into a new set by mapping over its elements. These operations are written using **set-builder notation**, also called **set comprehension.** This is also the point at which naive set theory parts company with axiomatic set theory, and gets into trouble because of it.

We can construct a subset of a given set that contains only those elements that satisfy a given predicate. The notation for this is { x ∈ y | Px }, and means the set of elements of y that satisfy P. The existence of this subset is asserted by the

- Axiom Schema of Specification
- ∀x ∃y ∀z (z ∈ y ⇔ (z ∈ x ∧ Px)).

This is an "axiom schema" instead of an axiom, because it defines an infinite family of axioms, one for each choice of P.

Technical note: In applying Specification, you can't use a formula P that refers to y; otherwise you can construct the set y = { z∈x | z∉y }, which gives a contradiction.

Naive set theory adopts the simpler approach of getting rid of the parent set x, and allowing you to define a set { x | Px } to consist of all objects that satisfy a given predicate, e.g. { x | x is prime }. This is a safe operation provided that there is some set that contains all the values of x for which Px is true, or if there is some default "universe" (again a set) that the set comprehension is implicitly restricted to. But you can get into trouble if you try to build sets that are too big: Russell's paradox constructs the set S = { x | x ∉ x }, and asks if S is an element of S (either answer yields a contradiction). One of the reasons why ZFC is so careful not to allow you to define sets in arbitrary ways is to avoid precisely this paradox, which arises as soon as you can construct a set of all sets. Such a set of all sets does not exist in standard set theory, although there are variants that allow something like it, while avoiding Russell's Paradox using other restrictions. To avoid this problem, it's best to use the notation { x | Px } only if x is restricted to values that you know are elements of some set, e.g. integers or baseball players (when suitably encoded as sets).

Specification allows for some other specialized set operations. For example, the **intersection** x ∩ y of two sets x and y consists of all elements that appear in *both* x and y; it is to sets what AND is to predicates. It can be defined formally as { z ∈ x ∪ y | z ∈ x ∧ z ∈ y }; and like union there is a generalization ∩x which consists of all elements that appear in every element of x (definition left to the reader). A similar operation is **set difference**, where x - y (sometimes written x ∖ y) is defined as { z ∈ x | z ∉ y }. Given a particular universe U, the **complement** of a set x is U - x; this acts like negation. Note that the complement is not defined except in the context of some universe.

## 1.5. Proving things about sets

Before throwing in any more axioms, let's take a brief break to talk about strategies for proving statements about sets:

### 1.5.1. To prove that x is an element of A

Use the definition of membership in A and show that x satisfies it. Examples:

- Theorem
- 2 ∈ { x | ∃y x = 2y }.
- Proof
- Let y = 1, then x = 2y.
- Theorem
4 ∈ ({ x | ∃y x = 2y } ∩ { x | x > 1 }).

- Proof
From the definition of intersection, x is in ({ x | ∃y x = 2y } ∩ { x | x > 1 }) if and only if x is in both sets. In the case x = 4 we can show (a) 4 ∈ { x | ∃y x = 2y } because 4 = 2*2, and (b) 4 ∈ { x | x > 1 } because 4 > 1.

### 1.5.2. To prove A is a subset of B

Use the definition: A ⊆ B if and only if ∀x x∈A ⇒ x∈B. So pick some x and show that the implication holds, either directly or by showing every x is either not in A or in B.

- Theorem
- ∀A ∅⊆A.
- Proof
- Fix A. Expanding the definition gives ∀x x∈∅ ⇒ x∈A, which we will now prove. Choose some x. By definition of ∅, x∉∅. So x∈∅ is false and x∈∅ ⇒ x∈A is true.

Here's a somewhat more abbreviated proof that leaves out all the boilerplate stuff like "expand the defintiion of ..." and "fix x." This is closer to what a typical published mathematical proof might look like; the assumption is that the reader can fill in the details.

- Theorem
- { x | ∃y x = 4y } ⊆ { x | ∃y x = 2y }
- Proof
- Pick x and let y be such that x = 4y. Then x = 2(2y).

Here's a proof of a theorem about power sets. Keep an eye out for words like "fix" and "choose" that indicate we are diving inside universal quantifiers.

- Theorem
- ∀A∀B A ⊆ B ⇒ PA ⊆ PB.
- Proof
- Fix A and B. Recall PA = { C | C ⊆ A }. We'll show that any C in PA is also in PB. To do so, choose some C in PA. For any x∈C, x∈A and thus x∈B (since A ⊆ B). It follows that C ⊆ B. Since this holds for all C in PA, we have PA ⊆ PB.

### 1.5.3. To prove that A = B

This can be done directly by showing ∀x x∈A ⇔ x∈B and applying Extensionality, but it is often done by showing both A ⊆ B and B ⊆ A.

- Theorem
{ x ∈ ℕ | ∃y x = 2y ∧ x < 1 } = { 0 }.

- Proof
Call the set on the left-hand side A and the one on the right-hand side B. First we'll show A ⊆ B. Let x ∈ ℕ. Then either (a) x = 0 and thus x ∈ B, or (b) x > 0 and x ∉ A. In either case we have x ∈ A ⇒ x ∈ B and so A ⊆ B. Now we'll show B ⊆ A; observe that x ∈ B implies x = 0 (it's the only element), and we have both 0 = 2*0 and 0 < 1. So x ∈ B ⇒ x ∈ A and thus B ⊆ A.

## 1.6. Infinite sets

We now continue with the axioms. We need one more axiom to get all the usual sets we know and love; this is the **axiom of infinity**. Before presenting the axiom (which is a little hairy), it may help to talk a bit about how to define the natural numbers 0, 1, 2, ... in set theory.

Zero is easy: we'll represent 0 as ∅ (i.e., { }). For larger numbers n, we'll represent n as the n-element set { 0, 1, 2, 3, ..., n-1 }. Note that the elements are defined according to the same rule, so that we really have

- 0 = { }
- 1 = { 0 } = { { } }
- 2 = { 0, 1 } = { { }, { { } } }
- 3 = { 0, 1, 2 } = { { }, { { } }, { { }, { { } } } }

and so forth.

The natural numbers constructed in this way are known as **finite von Neumann ordinals**; see this web page for more details.

The nice thing about this representation is that many simple operations on numbers translate into simple operations on sets: n+1 = n ∪ { n }, n-1 = ∪n, n < m if and only if n ∈ m, min S = ∩S (when S ≠ ∅), and |n| = n. The not-so-nice part is that this representation of ℕ is even less efficient than the Peano-arithmetic represention 0, S0, SS0, SSS0, etc.: where the Peano representation of n takes n+1 symbols, the von Neumann representation of n takes 2^{n+1} symbols. (Von Neumann ordinals also have various strange properties that make them a bad example of general sets: for example, most sets *don't* have the property that x ∈ y ∈ z implies x ∈ z.)

Now that we can define individual natural numbers, what about ℕ, the set of *all* natural numbers? With the axioms we have so far, we can't prove that ℕ exists. The problem is that ℕ has infinitely many members: but all of our current set operations produce only finite sets (i.e., those with a number of elements equal to some natural number) starting from finite sets. (Exercise: check this.) So we'll add another axiom, which says that either ℕ or some superset of ℕ exists:

- Axiom of Infinity
- ∃x (∅ ∈ x ∧ ∀y (y ∈ x ⇒ y ∪ { y } ∈ x)).

Using the representation of natural numbers we just defined, the axiom of infinity says that there exists some set x that (a) contains 0, and (b) contains n+1 whenever it contains n. This implies (by induction!) that x contains *at least* all of the natural numbers. It may also contain other sets that don't represent natural numbers, but we can prune these out using Specification to get ℕ alone.

Of course, the other axioms still apply to our new infinite set, so we get a vast collection of other infinite sets as well. Some of these include such favorites as ℤ, the set of integers, which can be represented as { (x, y) | x ∈ ℕ ∧ y ∈ ℕ ∧ (x = 0 ∨ y = 0) }, with the interpretation that (x, 0) represents +x and (0, x) represents -x; ℚ, the set of rational numbers, which can be represented by ordered pairs of integers; and ℝ, the set of real numbers, where each x in ℝ is represented by the set of rationals that are less than or equal to x. Other structures can easily be built out of sets and ordered pairs (which are of course themselves just a particular kind of set).

## 1.7. Other axioms

The six axioms and one axiom schema given so far are 70% of ZFC. They are enough to construct all the standard sets used in mathematics. Two additional axioms and one additional axiom scheme are include in ZFC, in order to exclude certain ill-behaved sets that might otherwise exist, and include certain sets whose existence makes life more convenient for mathematicians. If you are a computer scientist, you will most likely never need to use these axioms unless you are doing serious theoretical mathematics, as pretty much everything that is directly relevant to computer science fits comfortably inside ℕ and its close relatives.

- Axiom of Foundation
- ∀x (x = ∅ ∨ ∃y (y ∈ x ∧ x ∩ y = ∅).

This rather technical axiom says that any nonempty set x has an element y that has no element in common with x. The reason it is included is that it implies (though the proof requires a bit of trickery) that you cannot have an "infinite descending chain"—a sequence of sets x_{1}, x_{2}, x_{3}, ..., such that x_{i+1} ∈ x_{i} for each i.^{1}> This means that though sets may be very wide, in the sense of containing a lot of elements, they cannot be nested too deeply; each occurrence of the empty set is wrapped in only finitely many layers of sets. It is worth noting that this is the only axiom whose purpose is showing that certain bad sets do *not* exist.

- Axiom Schema of Replacement
∀x (∀y ∀z

_{1}∀z_{2}(y ∈ x ∧ P(y,z_{1}) ∧ P(y,z_{2}) ⇒ z_{1}= z_{2}) ⇒ ∃q ∀z (z ∈ q ⇔ ∃y (y ∈ x ∧ P(y,z))).

Like Specification, Replacement defines an infinite collection of axioms, one for each predicate P.

This rather hideous-looking axiom schema actually says something quite simple. Suppose that you have a two-argument predicate P that acts like a function when applied to elements y of x: if there is any z that makes P(y,z) true, then there is only one of them. For each y, let f(y) be this unique z. Then Replacement lets you construct { f(y) | y ∈ x }.

Most of the time you do not need Replacement, because if the f(y) all fit inside some set that you already have (e.g. f(y) is always a natural number), then you can use Specification instead: { z ∈ ℕ | ∃y (y ∈ x ∧ P(y,z) }. Where Replacement is needed is to construct certain large infinite sets whose existence is not guaranteed otherwise.

Finally, we have one final axiom, which is generally accepted by mathematicians because it's convenient but is in a very real sense optional, in that the rest of the axioms are consistent with either it or its negation. It says that for any collection of sets, there is a function that picks out one element of each set in the collection.

- Axiom of Choice
- ∀x ∃f (f:x→∪x ∧ ∀y y ∈ x ⇒ f(y) ∈ y).

Another way of stating this is that given any collection x of sets, we can construct a new set by choosing one element from each set. This follows easily from Power Set and Specification for small enough sets, but for infinite sets it can cause odd results, and it is in fact impossible to prove that such a choice set always exists using only the other axioms. The most famous bizarre consequence of Choice is the Banach-Tarski_paradox, which lets you chop a sphere into five pieces and then reassemble the pieces—without any gaps or overlap—into two spheres identical to the first. The fact that most mathematicians use Choice despite such weird outcomes is a sign of just how useful the axiom can be in other contexts.

Here is the actual proof: Suppose there were such a sequence. Construct the set y = { x

_{i}} (this uses the Axiom Schema of Replacement). Since y is not empty, Foundation says that there is some element x_{j}of y such that x_{j}∩ y = ∅. But x_{j}and y both contain x_{j+1}, a contradiction. (1)