Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

The P vs NP problem is the most famous open problem in Computer Science (specifically in ComputationalComplexityTheory), and one of the most famous outstanding problems in mathematics in general. We don't know the answer to this problem, but if we assume that what appears to be the likely answer is correct, we can use it to prove LowerBounds.

1. The classes P and NP

See ComputationalComplexityTheory for definitions of languages, complexity classes, etc. The definitions of the complexity classes P and NP are:

P is the set of languages L such that there exists a polynomial-time algorithm M such that M(x) = 1 if and only if x is in L.
NP is the set of languages L such that there exists a polynomial-time algorithm M such that if x is in L, there exists y with size polynomial in x such that M(x,y) = 1, but if x is not in L, M(x,y) = 0 for all y. Such a machine is called a nondeterministic machine and NP is an abbreviation for nondeterministic polynomial time.

Intuitively, a problem is in NP if there is a verifiable proof that the answer is yes for any yes instance of the problem. Examples of problems in NP include:

COMPOSITE = { x : x = ab for some integers a and b }. The machine M(x,y) returns true if (a) y is a pair of integers a and b; and (b) x = ab. (This language is also in P).
CIRCUIT SATISFIABILITY { x : x is the description of a Boolean circuit that outputs 1 for some input }. Now the machine M(x,y) simply evaluates circuit x on input y.
SATISFIABILITY: like CIRCUIT SATISFIABILITY but x is Boolean formula in conjunctive normal form (CNF), which means that it consists of a (possibly very large) AND of many ORs of inputs and their negations.
3SAT: like satisfiability, but each OR gate has exactly 3 inputs.
CLIQUE = { (G,k) : G is the description of a graph that contains a clique of k vertices, where a clique is a set of vertices that are all adjacent to each other }.
INDEPENDENT SET = { (G,k) : G is the description of a graph that contains an independent set of k vertices, where an independent set is a set of vertices none of which are adjacent to each other }.
VERTEX COVER = { (G,k) : G is the description of a graph that contains a vertex cover of k vertices, where a vertex cover is a set of vertices that between them include at least one endpoint of every edge }.
PARTITION = { S : S is a set of integers that has a subset S' with ∑S' = ½∑S }.

All of these can be solved nondeterministically by guessing the appropriate subset or input and then verifying it in polynomial time. Except for the first problem, no polynomial time algorithm for solving any of these problem deterministically is known.

2. NP-complete problems

A decision problem A is NP-hard if there is a polynomial-time reduction from any problem in NP to A. A problem is NP-complete if it is both contained in NP and is NP-hard. Every problem on the list above except for COMPOSITE is NP-complete:

CIRCUIT SATISFIABILITY: This is the grandmother of all NP-complete problems, based on a construction due to Cook and independently to Levin, which is usually called either Cooks' Theorem or the Cook-Levin theorem. The basic idea is that if you have any language L that is in NP, there is a machine M that answers 1 on M(x,y) it time p(|x|) for your input x and some input y if an only if x is in L. To reduce L to CIRCUIT SATISFIABILITY, transform the entire computation of M into a gigantic (but still polynomial-size) ciruit that is essentially p(|x|) copies of M, one copy per time unit, with gates between the copies to enforce that the circuit actually describes a computation of M. Hard-wire x into the circuit's inputs, but let y float freely; then if there is any assignment that makes the circuit output 1, there is a y that set M(x,y) = 1 and x is in L. Conversly, if the circuit never outputs 1, then M(x,y) = 0 for all y and x is not in L.
SATISFIABILITY: Reduce from CIRCUIT SATISFIABILITY by replacing each gate in the circuit with an AND of ORs describing the relation between the output of the gate and its inputs, and then take and AND of all of these ANDs of ORs.
3SAT: Reduce from SATISFIABILITY using the transformation (x₁ ∨ x₂ ∨ x₃ ... ∨ x_n) = (z ∨ x₁ ∨ x₂) ∧ (~z ∨ x₃ ... ∨ x_n) to knock all large clauses down to 3 or fewer literals. Pad smaller clauses out to 3 literals by inserting dummy variables.
CLIQUE: Reduce from 3SAT. Construct a graph with a node for each appearence of a literal (a variable or its negation) in each clause; for example, the formula (x ∨ y ∨ z) ∧ (~x ∨ ~y ∨ z) would produce 6 nodes. Put an edge between two nodes provided (a) they represent literals in different clauses and (b) both literals can be true at the same time (this excludes edges between x and ~x, for example, but allows edges between x and y or x and ~z). Now ask for a clique of size k where k equals the number of clauses. Condition (a) means that this clique can contain at most one literal per clause (since there are no edges within clauses); condition (b) says that every node in the clique represents a literal that is consistent with every other node's literal. So if the clique exists, we can set all the corresponding literals to be true and satisfy every clause and thus the formula. In the other direction, given a satisfying assignment, we can pick one true literal per clause (there has to be at least one) and get a k-clique.
INDEPENDENT SET: Reduce from clique by constructing ~G, which has an edge between u and v if and only if G doesn't.
VERTEX COVER: Reduce from INDEPENDENT SET by looking for a set of n-k points that are not in the independendent set.
PARTITION: Reduction from VERTEX COVER via SUBSET SUM. See SubsetSumReduction.

Some other well-known NP-complete problems:

GRAPH 3-COLORABILITY: { G : G can be colored with 3 colors so that no edge has the same color on both endpoints }. The reduction is from 3SAT by constructing little graphs that act like OR and NOT gates for colors.
EXACT COVER BY 3-SETS (X3C): This is essentially the tripartite marriage problem from CS365/Assignments/HW10; you are given a set of triples { abc, aqx, bda, acd, ... } and asked if you can find a subset that includes each letter exactly once. The reduction is from 3SAT using a construction similar to that for CLIQUE.
HAMILTONIAN PATH: { G : G has a path that visits every vertex exactly once }. The reduction is from 3SAT by building widgets that simulate a circuit based on which edges the path crosses.
TRAVELING SALESMAN: { (G,w,K) : G has a cycle that visits every vertex at least once, such that the total weight of the edges in the cycle is at most K }. The reduction is from HAMILTONIAN CIRCUIT, the cycle version of HAMILTONIAN PATH (which in turn can be reduced from HAMILTONIAN PATH).

CategoryAlgorithmNotes