Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

ComputingSums

Mostly today we want to be able to compute the asymptotic running time of iterative algorithms.

1. Examples

DuplicateExists(A):
 for i = 1 to n:
  for j = 1 to n:
   if i != j and A[i] = A[j] then:
    return true
 return false

Since we want the worst-case performance, we'll assume that the algorithm never bails out early (i.e. that the input has no duplicates). We can then compute the asymptotic performance by multiplying:

If test: O(1).
Inner j loop: n * O(1) = O(n).
Outer j loop: n * O(n) = O(n²).

So the total cost is O(n²) + O(1) = O(n²).

Next let's consider an optimized version:

DuplicateExists(A):
 for i = 2 to n:
  for j = 1 to i-1:
   if A[i] = A[j] then:
    return true
 return false

Now the running time of the inner loop is O(i). We can still argue that the total running time is O(n²) since O(i) is O(n). But perhaps we can get a better analysis by adding up the costs directly.

We can easily show that the i-th iteration of the outer loop costs Theta(i). So the total cost is on the order of

Sigma_{i=2 to n} i = (Sigma_{i=1 to n} i) - 1 = n(n+1)/2 - 1 = Theta(n²).

Here's a further optimization using a set data structure implemented as a tree. We'll assume that Insert and Find both take O(log m) time where m is the number of elements in the data structure.

DuplicateExists(A):
 S = new Set
 for i = 1 to n:
   if A[i] is in S:
    return true
  else:
    insert A[i] into S
  return false

Now we are looking at bounds on the order of

Sigma_{i=1 to n} log i.

We can easily get an upper bound on this sum by observing that log is increasing, so i <= n implies log i <= log n and

Sigma_{i=1 to n log i <= Sigma}i=1 to n log n = n log n.

This tells us that the new algorithm is O(n log n), which is clearly better than the Theta(n²) running times of our previous attempts. Can we get a matching lower bound to make this bound tight?

There are many ways to get this lower bound. The quickest may be to split out the larger terms. If i >= n/2, then log i >= log n/2 = (log n) - 1 = Omega(log n); so

Sigma_{i=1 to n} log i >= Sigma_{i=n/2 to n} log i >= Sigma_{i=n/2 to n} log n/2 >= (n/2) log (n/2) = Omega(n log n).

Alternatively, we could use the integral trick (if we remember how to integrate log n). We have to be a little bit careful to avoid log 0, which is undefined:

Sigma_{i=1 to n} log i = Sigma_{i=2 to n} log i >= c Integral_{s=1 to n} ln s ds = c n ln n - cn = Omega(n log n).

So again we get a tight bound.