Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

1. Multiplying large integers

Suppose we wish to multiply two very large integers (e.g. thousands of bits each). When dealing with such monsters it no longer makes sense to assume that multiplication is constant time. Instead, we treat each integer as an array of bits x_nx_n-1...x₁x₀ and multiply using the algorithm:

x * y = (∑_i x_i2ⁱ)(∑_jy_j2^j) = ∑_i∑_j x_iy_j2^i+j.

Assuming that the addition is cheap, this takes Theta(n²) time. In practice we can usually speed up this algorithm by a large constant factor by using larger digits, e.g. by multiplying the numbers 16 bits at a time instead of one bit at a time. But can we do better?

2. A divide-and-conquer algorithm

This algorithm is due to Karatsuba and Ofman, in a 1962 paper from Doklady Akademii Nauk SSSR. The paper is written in Russian, so I haven't read it. The description below is adapted from a description in Section 5.2.2 of Goodrich and Tamassia's Algorithm Design (Wiley, 2002).

Imagine splitting x and y into two very huge digits of k bits each: x = x₁2^k + x₀, y = y₁2^k + y₀. Then

xy = (x₁2^k + x₀)(y₁2^k + y₀) = x₁y₁2^2k+x₁y₀2^k+x₀y₁2^k+x₀y₀.

We will assuming that shifting (i.e., multiplying by 2^something) is free and addition takes linear time. We have thus reduced the problem of computing a product on n=2k bits to the problem of computing four products on k bits, which gives running time T(n) = 4T(n/2) + Theta(n) = T(n²) by the MasterTheorem. So far we haven't made any progress.

The trick to the Karatsuba-Ofman algorithm is to compute the x₁y₁ and x₀y₀ terms with one recursive multiplication each, and then compute sum of the remaining terms x₁y₀+x₀y₁ with a single additional multiplication. This can be done by computing

z = (x₁-x_0)(y0_-y1) = x₁y₀ - x₁y₁ - x₀y₀ + x₀y₀

so that

x₁y₀+x₀y₁ = z + x₁y₁ + x₀y₀.

So to compute xy, compute the three products above, and then let

xy = x₁y₁2^2k + (z + x₁y₁ + x₀y₀)2^k + x₀y₀.

Since we can compute the three products recursively using the same algorithm, we have T(n) = 3T(n/2) + Theta(n) = Theta(n^{lg 3}) by the MasterTheorem. This is O(n^1.59) or thereabouts, much better than the straightforward algorithm.

A similar trick can be used to do fast matrix multiplication, by reducing multiplication of two n by n matrices to seven products of n/2 by n/2 matrices. The resulting algorithm, known as Strassen's Algorithm, has a recurrence T(n) = 7T(n/2) + O(n²) with solution T(n) = Theta(n^{lg 7}). The value of lg 7 is about 2.8, so this is somewhat better than the Theta(n³) running time for the straightforward algorithm. (See StrassensAlgorithm.)

3. FFT approach

A still faster method is to apply a FastFourierTransform to the two numbers, which changes convolution (z_i = Sigma_j x_i+jy_i-j; this is another way to compute the product) into pointwise multiplication (z_i=x_iy_i). We then carry out the pointwise multiplication and invert the FFT. The cost of this approach in bit operations is O(n log n) if implemented very carefully; see KnuthSeries volume 2, Section 4.3.3.C.

CategoryAlgorithmNotes