1. Multiplying large integers
Suppose we wish to multiply two very large integers (e.g. thousands of bits each). When dealing with such monsters it no longer makes sense to assume that multiplication is constant time. Instead, we treat each integer as an array of bits xnxn-1...x1x0 and multiply using the algorithm:
x * y = (∑i xi2i)(∑jyj2j) = ∑i∑j xiyj2i+j.
Assuming that the addition is cheap, this takes Theta(n2) time. In practice we can usually speed up this algorithm by a large constant factor by using larger digits, e.g. by multiplying the numbers 16 bits at a time instead of one bit at a time. But can we do better?
2. A divide-and-conquer algorithm
This algorithm is due to Karatsuba and Ofman, in a 1962 paper from Doklady Akademii Nauk SSSR. The paper is written in Russian, so I haven't read it. The description below is adapted from a description in Section 5.2.2 of Goodrich and Tamassia's Algorithm Design (Wiley, 2002).
Imagine splitting x and y into two very huge digits of k bits each: x = x12k + x0, y = y12k + y0. Then
xy = (x12k + x0)(y12k + y0) = x1y122k+x1y02k+x0y12k+x0y0.
We will assuming that shifting (i.e., multiplying by 2something) is free and addition takes linear time. We have thus reduced the problem of computing a product on n=2k bits to the problem of computing four products on k bits, which gives running time T(n) = 4T(n/2) + Theta(n) = T(n2) by the MasterTheorem. So far we haven't made any progress.
The trick to the Karatsuba-Ofman algorithm is to compute the x1y1 and x0y0 terms with one recursive multiplication each, and then compute sum of the remaining terms x1y0+x0y1 with a single additional multiplication. This can be done by computing
z = (x1-x0)(y0-y1) = x1y0 - x1y1 - x0y0 + x0y0
x1y0+x0y1 = z + x1y1 + x0y0.
So to compute xy, compute the three products above, and then let
xy = x1y122k + (z + x1y1 + x0y0)2k + x0y0.
Since we can compute the three products recursively using the same algorithm, we have T(n) = 3T(n/2) + Theta(n) = Theta(nlg 3) by the MasterTheorem. This is O(n1.59) or thereabouts, much better than the straightforward algorithm.
A similar trick can be used to do fast matrix multiplication, by reducing multiplication of two n by n matrices to seven products of n/2 by n/2 matrices. The resulting algorithm, known as Strassen's Algorithm, has a recurrence T(n) = 7T(n/2) + O(n2) with solution T(n) = Theta(nlg 7). The value of lg 7 is about 2.8, so this is somewhat better than the Theta(n3) running time for the straightforward algorithm. (See StrassensAlgorithm.)
3. FFT approach
A still faster method is to apply a FastFourierTransform to the two numbers, which changes convolution (zi = Sigmaj xi+jyi-j; this is another way to compute the product) into pointwise multiplication (zi=xiyi). We then carry out the pointwise multiplication and invert the FFT. The cost of this approach in bit operations is O(n log n) if implemented very carefully; see KnuthSeries volume 2, Section 4.3.3.C.