
Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

Set theory is the dominant foundation for mathematics. The idea is that everything else in mathematics—numbers, functions, etc.—can be written in terms of sets, so that if you have a consistent description of how sets behave, then you have a consistent description of how everything built on top of them behaves. If predicate logic is the machine code of mathematics, set theory would be assembly language.

1. Naive set theory

Naive set theory is the informal version of set theory that corresponds to our intuitions about sets as unordered collections of objects (called elements) with no duplicates. A set can be written explicitly by listing its elements using curly braces, as in The Three Stooges = { Moe, Larry, Curly } or ℕ = { 0, 1, 2, 3, ... }.

Membership in a set is written using the ∈ symbol (pronounced "is an element of" or "is in"). So we can write Moe ∈ The Three Stooges or 4 ∈ ℕ. We can also write ∉ for "is not an element of", as in Moe ∉ ℕ.

A fundamental axiom in set theory is that the only distinguishing property of a set is its list of members: if two sets have the same members, they are the same set.

For nested sets like { { 1 } }, ∈ represents only direct membership: the set { { 1 } } only has one element, { 1 }, so 1 ∉ { { 1 } }. This can be confusing if you think of ∈ as representing the English "is in," because if I put my lunch in my lunchbox and put my lunchbox in my backpack, then my lunch is in my backpack. But my lunch is not an element of { { my lunch }, my textbook, my slingshot }. In general, ∈ is not transitive (see Relations)—it doesn't behave like < unless there is something very unusual about the set you are applying it to—and there is no particular notation for being a deeply-buried element of an element of an element (etc.) of some set.
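The lunchbox point can be checked directly. Here is a small sketch (not part of the original notes) using Python frozensets, which are hashable and so can be nested inside other sets:

```python
# Nested membership: ∈ is direct membership only.
inner = frozenset({1})          # the set { 1 }
outer = frozenset({inner})      # the set { { 1 } }

assert inner in outer           # { 1 } ∈ { { 1 } }
assert 1 not in outer           # but 1 ∉ { { 1 } }: ∈ is not transitive
```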

In addition to listing the elements of a set explicitly, we can also define a set by set comprehension, where we give a rule for how to generate all of its elements. This is pretty much the only way to define an infinite set without relying on guessing, but it can be used for sets of any size. Set comprehension is usually written using set-builder notation, as in { x | x∈ℕ ∧ x is even } (the set of all even natural numbers) or { x | x is prime } (the set of all primes).

Sometimes the original set that an element has to be drawn from is put on the left-hand side of the pipe, as in { x∈ℕ | x is even }.
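Set-builder notation maps almost directly onto set comprehensions in Python; a quick illustration (my own, not from the notes), with ℕ truncated to a finite range:

```python
# { x ∈ ℕ | x < 10 ∧ x is even }, with ℕ truncated to range(10):
evens = {x for x in range(10) if x % 2 == 0}
assert evens == {0, 2, 4, 6, 8}

# { x² | x ∈ {1, 2, 3} }: generating elements by a rule
squares = {x**2 for x in {1, 2, 3}}
assert squares == {1, 4, 9}
```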

Using set comprehension, we can see that every set in naive set theory is equivalent to some predicate. Given a set S, the corresponding predicate is x∈S, and given a predicate P, the corresponding set is { x | Px }. But beware of Russell's paradox: what is { S | S∉S }?

2. Operations on sets

If we think of sets as representing predicates, each logical connective gives rise to a corresponding operation on sets:

The union A∪B = { x | x∈A ∨ x∈B } corresponds to OR.
The intersection A∩B = { x | x∈A ∧ x∈B } corresponds to AND.
The set difference A∖B = { x | x∈A ∧ x∉B } corresponds to AND NOT.
The symmetric difference A△B = { x | x∈A ⊕ x∈B } corresponds to XOR.

(Of these, union and intersection are the most important in practice.)
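Python's built-in set type implements all of these operations as operators, which gives a quick way to experiment with them (this example is mine, not from the notes):

```python
A = {1, 2, 3}
B = {2, 3, 4}

assert A | B == {1, 2, 3, 4}   # union: x∈A ∨ x∈B
assert A & B == {2, 3}         # intersection: x∈A ∧ x∈B
assert A - B == {1}            # difference: x∈A ∧ x∉B
assert A ^ B == {1, 4}         # symmetric difference: x∈A ⊕ x∈B
```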

Corresponding to implication is the notion of a subset: A⊆B ("A is a subset of B") holds if and only if every element of A is also an element of B, i.e. ∀x (x∈A ⇒ x∈B).

Sometimes one says A is contained in B if A⊆B. This is one of two senses in which A can be "in" B—it is also possible that A is in fact an element of B (A∈B). For example, the set A = { 12 } is an element of the set B = { Moe, Larry, Curly, { 12 } } but A is not a subset of B, because A's element 12 is not an element of B. Usually we will try to reserve "is in" for ∈ and "is contained in" for ⊆, but it's safest to use the symbols to avoid any possibility of ambiguity.
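The stooges example can be checked mechanically; here is a sketch (not in the original notes) distinguishing ∈ from ⊆, using a frozenset for the nested set so it can live inside another set:

```python
# A ⊆ B (subset test, <=) versus A ∈ B (membership test, in) are different questions.
A = frozenset({12})
B = {"Moe", "Larry", "Curly", frozenset({12})}

assert A in B                  # A is an *element* of B
assert not A <= B              # but not a *subset*: A's element 12 is not in B
assert {2, 3} <= {1, 2, 3}     # ⊆ holds when every element carries over
```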

Finally we have the set-theoretic equivalent of negation: the complement Ā = { x | x∉A }.

If we allow complements, we are necessarily working inside some fixed universe: the complement of the empty set contains all possible objects. This raises the issue of where the universe comes from. One approach is to assume that we've already fixed some universe that we understand (e.g. ℕ), but then we run into trouble if we want to work with different classes of objects at the same time. Modern set theory is defined in terms of a collection of axioms that allow us to construct, essentially from scratch, a universe big enough to hold all of mathematics without apparent contradictions while avoiding the paradoxes that may arise in naive set theory.

3. Axiomatic set theory

The problem with naive set theory is that unrestricted set comprehension is too strong, leading to contradictions. Axiomatic set theory fixes this problem by being more restrictive about what sets one can form. The axioms most commonly used are known as Zermelo-Fraenkel set theory with choice or ZFC. The page AxiomaticSetTheory covers these axioms in painful detail, but in practice you mostly just need to know what constructions you can get away with.

The short version is that you can construct sets by (a) listing their members, (b) taking the union of other sets, or (c) using some predicate to pick out elements or subsets of some set. The starting points for this process are the empty set ∅ and the set ℕ of all natural numbers (suitably encoded as sets). If you can't construct a set in this way (like the Russell's Paradox set), odds are that it isn't a set.

These properties follow from the more useful axioms of ZFC:

Extensionality
Any two sets with the same members are equal.
Existence
∅ is a set.
Pairing
For any given list of sets x, y, z, ..., { x, y, z, ... } is a set. (Strictly speaking, pairing only gives the existence of { x, y }, and the more general result requires the next axiom as well.)
Union
For any given set of sets { x, y, z, ... }, the set x ∪ y ∪ z ∪ ... exists.
Power set
For any set S, the power set ℘(S) = { A | A ⊆ S } exists.

Specification
For any set S and any predicate P, the set { x∈S | P(x) } exists. This is called restricted comprehension. Limiting ourselves to constructing subsets of existing sets avoids Russell's Paradox, because we can't construct S = { x | x ∉ x }. Instead, we can try to construct S = { x∈T | x∉x }, but we'll find that S isn't an element of T (if it were, we'd get S∈S ↔ S∉S again), so S not containing itself creates no contradiction.

Infinity
ℕ exists, where ℕ is defined as the smallest set that contains ∅ and contains x∪{x} whenever it contains x. Here ∅ represents 0 and x∪{x} represents the successor operation. This effectively defines each number as the set of all smaller numbers, e.g. 3 = { 0, 1, 2 } = { ∅, { ∅ }, { ∅, { ∅ } } }. Without this axiom, we only get finite sets.
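The von Neumann encoding of the naturals is easy to play with directly. A sketch (my own, not from the notes) using frozensets:

```python
# Von Neumann naturals: 0 = ∅, succ(x) = x ∪ {x}, so n = { 0, 1, ..., n-1 }.
def succ(x: frozenset) -> frozenset:
    return x | frozenset({x})

zero = frozenset()
one = succ(zero)
two = succ(one)
three = succ(two)

assert three == frozenset({zero, one, two})  # 3 = { 0, 1, 2 }
assert len(three) == 3                       # n has exactly n elements
assert zero in one and one in three          # "smaller than" literally means ∈
```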

There are three other axioms that don't come up much in computer science:

Foundation
Every nonempty set A contains a set B with A∩B=∅. This rather technical axiom prevents various weird sets, such as sets that contain themselves or infinite descending chains A₀ ∋ A₁ ∋ A₂ ∋ ... .

Replacement
If S is a set, and R(x,y) is a predicate with the property that ∀x ∃!y R(x,y), then { y | ∃x∈S R(x,y) } is a set. Mostly used to construct astonishingly huge infinite sets.

Choice
For any set of nonempty sets S there is a function f that assigns to each x in S some f(x) ∈ x.

4. Cartesian products, relations, and functions

Sets are unordered: the set { a, b } is the same as the set { b, a }. Sometimes it is useful to consider ordered pairs (a, b), where we can tell which element comes first and which comes second. These can be encoded as sets using the rule (a, b) = { {a}, {a, b} }.

Given sets a and b, their Cartesian product a × b is the set { (x,y) | x ∈ a ∧ y ∈ b }, or in other words the set of all ordered pairs that can be constructed by taking the first element from a and the second from b. If a has n elements and b has m, then a × b has nm elements. For example, { 1, 2 } × { 3, 4 } = { (1,3), (1,4), (2,3), (2,4) }.
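The { 1, 2 } × { 3, 4 } example, and the size rule |a × b| = |a|·|b|, can be checked with itertools (a sketch, not from the notes):

```python
from itertools import product

a = {1, 2}
b = {3, 4}
prod = {(x, y) for x, y in product(a, b)}

assert prod == {(1, 3), (1, 4), (2, 3), (2, 4)}
assert len(prod) == len(a) * len(b)   # |a × b| = |a|·|b| = 4
```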

Because of the ordering, Cartesian product is not commutative in general. We usually have A×B ≠ B×A (exercise: when are they equal?).

The existence of the Cartesian product of any two sets can be proved using the axioms we already have: if (x,y) is defined as { {x}, {x,y} }, then ℘(a ∪ b) contains all the necessary sets {x} and {x,y}, and ℘℘(a ∪ b) contains all the pairs { {x}, {x,y} }. It also contains a lot of other sets we don't want, but we can get rid of them using Specification.

A subset of the Cartesian product of two sets is called a relation. An example would be the < relation on the natural numbers: { (0, 1), (0, 2), ...; (1, 2), (1, 3), ...; (2, 3), ... }. Just as sets can act like predicates of one argument (where Px corresponds to x ∈ P), relations act like predicates of two arguments. Relations are often written between their arguments, so xRy is shorthand for (x,y) ∈ R.

A special class of relations are functions. A function from a domain A to a codomain (or range) B is a relation on A and B (i.e., a subset of A × B) such that every element of A appears on the left-hand side of exactly one ordered pair. We write f:A→B as a short way of saying that f is a function from A to B, and for each x ∈ A write f(x) for the unique y ∈ B with (x,y) ∈ f. (Technically, knowing f alone does not tell you what the codomain is, since some elements of B may not show up at all; this can be fixed by representing a function as a pair (f,B), but it's generally not useful unless you are doing CategoryTheory.) Most of the time a function is specified by giving a rule for computing f(x), e.g. f(x) = x².
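The "exactly one pair per domain element" condition is easy to test for finite relations. A sketch (my own helper names, not from the notes):

```python
# A finite function f: A → B as a set of ordered pairs, with the function
# property checked explicitly: each x in the domain appears exactly once on the left.
def is_function(pairs: set, domain: set) -> bool:
    lefts = [x for (x, _) in pairs]
    return sorted(lefts) == sorted(domain) and len(lefts) == len(set(lefts))

square = {(0, 0), (1, 1), (2, 4), (3, 9)}      # the rule f(x) = x² on {0,1,2,3}
assert is_function(square, {0, 1, 2, 3})
assert not is_function({(0, 1), (0, 2)}, {0})  # 0 maps to two values: not a function
```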

Functions let us define sequences of arbitrary length: for example, the infinite sequence x₀, x₁, x₂, ... of elements of some set A is represented by a function x:ℕ→A, while a shorter sequence (a₀, a₁, a₂) would be represented by a function a:{0,1,2}→A. In both cases the subscript takes the place of a function argument: we treat xₙ as syntactic sugar for x(n). Finite sequences are often called tuples, and we think of the result of taking the Cartesian product of a finite number of sets A×B×C as a set of tuples (a,b,c) (even though the actual structure may be ((a,b),c) or (a,(b,c)) depending on which product operation we do first).

A function f:A→B that covers every element of B is called onto, surjective, or a surjection. If it maps distinct elements of A to distinct elements of B (i.e., if x≠y implies f(x)≠f(y)), it is called one-to-one, injective, or an injection. A function that is both surjective and injective is called a one-to-one correspondence, bijective, or a bijection. (The terms onto, one-to-one, and bijection are probably the most common, although injective and surjective are often used as well, as they avoid the confusion between one-to-one and one-to-one correspondence.) Any bijection f has an inverse f⁻¹; this is the function { (y,x) | (x,y) ∈ f }. Two functions f:A→B and g:B→C can be composed to give a composition g∘f; g∘f is a function from A to C defined by (g∘f)(x) = g(f(x)).
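For finite functions (modeled as Python dicts), all of these properties can be checked by brute force. A sketch with my own helper names, not from the notes:

```python
# Injectivity/surjectivity checks, plus inverse and composition, for finite functions.
def injective(f: dict) -> bool:
    return len(set(f.values())) == len(f)       # distinct inputs give distinct outputs

def surjective(f: dict, codomain: set) -> bool:
    return set(f.values()) == codomain          # every codomain element is hit

f = {0: "a", 1: "b", 2: "c"}
assert injective(f) and surjective(f, {"a", "b", "c"})   # f is a bijection

inverse = {y: x for x, y in f.items()}          # f⁻¹ = { (y,x) | (x,y) ∈ f }
assert inverse["b"] == 1

g = {"a": 10, "b": 20, "c": 30}
g_of_f = {x: g[f[x]] for x in f}                # (g∘f)(x) = g(f(x))
assert g_of_f == {0: 10, 1: 20, 2: 30}
```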

Bijections let us define the size of arbitrary sets without having some special means to count elements. We say two sets A and B have the same size or cardinality if there exists a bijection f:A↔B. We can also define |A| formally as the (unique) smallest ordinal B such that there exists a bijection f:A↔B. This is exactly what we do when we do counting: to know that there are 3 stooges, we count them off 0 → Moe, 1 → Larry, 2 → Curly, giving a bijection between the set of stooges and 3 = { 0, 1, 2 }.

More on functions and relations can be found on the pages Functions and Relations.

5. Constructing the universe

With power set, Cartesian product, the notion of a sequence, etc., we can construct all of the standard objects of mathematics. For example:

The integers are the set ℤ = { ..., -2, -1, 0, 1, 2, ... }. We represent each integer z as an ordered pair (x,y), where x=0 ∨ y=0; formally, ℤ = { (x,y) ∈ ℕ×ℕ | x=0 ∨ y=0 }. The interpretation of (x,y) is x-y; so positive integers z are represented as (z,0) while negative integers are represented as (0,-z). It's not hard to define addition, subtraction, multiplication, etc. using this representation.
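As a sketch of how arithmetic works in this representation (my own function names, not from the notes):

```python
# Integers as pairs (x, y) ∈ ℕ×ℕ with x = 0 or y = 0, interpreted as x - y.
def encode(z: int) -> tuple:
    return (z, 0) if z >= 0 else (0, -z)

def decode(p: tuple) -> int:
    return p[0] - p[1]

def add(p: tuple, q: tuple) -> tuple:
    # Add componentwise, then renormalize so one coordinate is 0.
    return encode((p[0] + q[0]) - (p[1] + q[1]))

assert encode(-3) == (0, 3)
assert decode(add(encode(5), encode(-7))) == -2
```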
The rational numbers ℚ are all fractions of the form p/q where p is an integer, q is a natural number not equal to 0, and p and q have no common factors.

The real numbers ℝ can be defined in a number of ways, all of which turn out to be equivalent. The simplest to describe is that a real number x is represented by the set { y∈ℚ | y≤x }. Formally, we consider any subset x of ℚ with the property y∈x ∧ z<y ⇒ z∈x to be a distinct real number (this is known as a Dedekind cut). Note that real numbers in this representation may be hard to write down.

We can also represent standard objects of computer science:

Deterministic finite state machines

A deterministic finite state machine is a tuple (Σ, Q, q₀, δ, Q_accept) where Σ is an alphabet (some finite set), Q is a state space (another finite set), q₀∈Q is an initial state, δ:Q×Σ→Q is a transition function specifying which state to move to when processing some symbol in Σ, and Q_accept⊆Q is the set of accepting states. If we represent symbols and states as natural numbers, the set of all deterministic finite state machines is then just a subset of ℘ℕ × ℘ℕ × ℕ × ℕ^(ℕ×ℕ) × ℘ℕ satisfying some consistency constraints.
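The tuple view translates directly into code. Here is a sketch of one concrete machine (my own example, not from the notes) that accepts binary strings containing an even number of 1s:

```python
# A DFA as the tuple (Σ, Q, q0, δ, Q_accept).
Sigma = {"0", "1"}
Q = {0, 1}                       # state = parity of 1s seen so far
q0 = 0
delta = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}
Q_accept = {0}

def accepts(word: str) -> bool:
    state = q0
    for symbol in word:
        state = delta[(state, symbol)]   # δ: Q × Σ → Q
    return state in Q_accept

assert accepts("1010")       # two 1s: accepted
assert not accepts("111")    # three 1s: rejected
```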

6. Sizes and arithmetic

We can compute the size of a set by explicitly counting its elements; for example, |∅| = 0, | { Larry, Moe, Curly } | = 3, and | { x∈ℕ | x < 100 ∧ x is prime } | = 25. But sometimes it is easier to compute sizes by doing arithmetic. We can do this because many operations on sets correspond in a natural way to arithmetic operations on their sizes. (For much more on this, see HowToCount.)

Two sets A and B that have no elements in common are said to be disjoint; in set-theoretic notation, this means A∩B = ∅. In this case we have |A∪B| = |A|+|B|. The operation of disjoint union acts like addition for sets. For example, the disjoint union of the 2-element set { 0, 1 } and the 3-element set { Wakko, Jakko, Dot } is the 5-element set { 0, 1, Wakko, Jakko, Dot }.

The size of a Cartesian product is obtained by multiplication: |A×B| = |A|⋅|B|. An example would be the product of the 2-element set { a, b } with the 3-element set { 0, 1, 2 }: this gives the 6-element set { (a,0), (a,1), (a,2), (b,0), (b,1), (b,2) }. Even though Cartesian product is not generally commutative, since ordinary natural number multiplication is, we always have |A×B| = |B×A|.
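Both size rules are easy to verify mechanically for small finite sets; a quick sanity check (mine, not from the notes):

```python
A = {0, 1}
B = {"Wakko", "Jakko", "Dot"}

assert A & B == set()                      # A and B are disjoint
assert len(A | B) == len(A) + len(B)       # |A∪B| = |A| + |B| = 5
assert len({(a, b) for a in A for b in B}) == len(A) * len(B)   # |A×B| = 6
```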

For the power set, it is not hard to show that |℘(S)| = 2^|S|. This is a special case of the rule that |A^B| = |A|^|B|, where A^B is the set of all functions from B to A; for the power set we can encode ℘(S) as 2^S, where 2 is the special set {0,1}.
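The |℘(S)| = 2^|S| rule can be confirmed by enumerating subsets; a sketch (my own helper, not from the notes) using itertools:

```python
from itertools import combinations

# Enumerate all subsets of s, as frozensets so they can live inside a set.
def powerset(s: set) -> set:
    return {frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)}

S = {1, 2, 3}
assert len(powerset(S)) == 2 ** len(S)                          # |℘(S)| = 2³ = 8
assert frozenset() in powerset(S) and frozenset(S) in powerset(S)  # ∅ and S included
```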

6.1. Infinite sets

For infinite sets, we take the above properties as definitions of addition, multiplication, and exponentiation of their sizes. The resulting system is known as cardinal arithmetic, and the sizes that sets (finite or infinite) might have are known as cardinal numbers.

The finite cardinal numbers are just the natural numbers: 0, 1, 2, 3, ... . The first infinite cardinal number is the size of the set of natural numbers, and is written as ℵ₀ ("aleph-zero," "aleph-null," or "aleph-nought"). The next infinite cardinal number is ℵ₁ ("aleph-one"): it might or might not be the size of the set of real numbers, depending on whether you include the Continuum Hypothesis in your axiom system or not.

Infinite cardinals can behave very strangely. For example:

ℵ₀ + 1 = ℵ₀. Adding a single element to a countably infinite set doesn't change its size: |ℕ ∪ {Moe}| = |ℕ|.
ℵ₀ + ℵ₀ = ℵ₀. For example, |ℤ| = |ℕ|: interleave the nonnegative and negative integers.
ℵ₀ · ℵ₀ = ℵ₀. For example, |ℕ×ℕ| = |ℕ| (and hence |ℚ| = |ℕ|).
The set of all finite sequences of natural numbers also has size ℵ₀.

6.2. Countable sets

All of these sets have the property of being countable, which means that they can be put into a bijection with ℕ or one of its subsets. The general principle is that any sum or product of infinite cardinal numbers turns into taking the maximum of its arguments. The last case implies that anything you can write down using finitely many symbols (even if they are drawn from an infinite but countable alphabet) is countable. This has a lot of applications in computer science: one of them is that the set of all computer programs in any particular programming language is countable.
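A concrete witness that |ℕ×ℕ| = |ℕ| is Cantor's pairing function, which walks the grid of pairs diagonal by diagonal. A sketch (not from the notes) checking the bijection on an initial segment:

```python
# Cantor's pairing function: a bijection ℕ×ℕ → ℕ, witnessing ℵ₀ · ℵ₀ = ℵ₀.
def pair(x: int, y: int) -> int:
    return (x + y) * (x + y + 1) // 2 + y

# The first 50 diagonals (x + y < 50) should hit every code 0..pair(0,49)
# exactly once, with no gaps or collisions.
codes = {pair(x, y) for x in range(50) for y in range(50) if x + y < 50}
assert codes == set(range(pair(0, 49) + 1))
```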

6.3. Uncountable sets

Exponentiation is different. We can easily show that 2^ℵ₀ ≠ ℵ₀, or equivalently that there is no bijection between ℘ℕ and ℕ. This is done using Cantor's diagonalization argument.

Theorem
Let S be any set. Then there is no surjection f:S→℘S.

Proof
Let f:S→℘S be some function from S to subsets of S. We'll construct a subset of S that f misses. Let A = { x∈S | x∉f(x) }. Suppose A = f(y). Then y∈A ↔ y∉A, a contradiction. (Exercise: Why does A exist even though the Russell's Paradox set doesn't?)

Since any bijection is also a surjection, this means that there's no bijection between S and ℘S either, implying, for example, that |ℕ| is strictly less than |℘ℕ|.

(On the other hand, it is the case that |ℕ^ℕ| = |2^ℕ|, so things are still weird up here.)
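The diagonal construction can be run on a concrete finite example; for any f, the set A = { x∈S | x∉f(x) } is guaranteed to be missed. A sketch (my own example f, not from the notes):

```python
# Cantor's diagonal set for a concrete f: S → ℘S on a small finite S.
S = {0, 1, 2}
f = {0: frozenset(), 1: frozenset({0, 1}), 2: frozenset({0, 2})}

# A = { x ∈ S | x ∉ f(x) }: disagrees with every f(x) on the element x itself.
A = frozenset({x for x in S if x not in f[x]})
assert A not in f.values()    # f is not surjective: nothing maps to A
```

Of course, for finite S this is unsurprising (2^n > n); the point of the theorem is that the same construction works even when S is infinite.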

Sets that are larger than ℕ are called uncountable. A quick way to show that there is no surjection from A to B is to show that A is countable but B is uncountable. For example:

Theorem
There are functions f:ℕ→{0,1} that are not computed by any computer program.

Proof
Let P be the set of all computer programs that take a natural number as input and always produce 0 or 1 as output (assume some fixed language), and for each program p ∈ P, let fₚ be the function that p computes. We've already argued that P is countable (each program is a finite sequence drawn from a countable alphabet), and since the set of all functions f:ℕ→{0,1} (that is, 2^ℕ) has the same size as ℘ℕ, it's uncountable. So some f gets missed: there is at least one function from ℕ to {0,1} that is not equal to fₚ for any program p.

The fact that there are more functions from ℕ to ℕ than there are elements of ℕ is one of the reasons why set theory (slogan: "everything is a set") beat out lambda calculus (slogan: "everything is a function from functions to functions") in the battle over the foundations of mathematics. And this is why we do set theory in CS202 and lambda calculus gets put in CS201.

7. Further reading

See RosenBook §§2.1–2.2, BiggsBook Chapter 2, or Naive set theory.


2014-06-17 11:58