Lagrange multipliers are a technique for finding a maximum or minimum of some function f(x) (where x may be a vector) subject to a constraint of the form g(x) = 0. A typical application might be optimally allocating resources to maximize some function f(x,y,z) with a resource constraint g(x,y,z) = x+y+z - c = 0.

The technique generalizes the standard high-school calculus technique of finding extrema by looking for values of x where the first derivative of f is zero; the difference is that now instead of having just a single-dimensional x, we have a multi-dimensional x that may encode many different variables. But the basic intuition is still the same: in order to be at a maximum or minimum, we have to be at a point where a small step will not change the value of f.

How do we express this in high dimension? It may help to think of the g(x)=0 surface as a hill, where gravity is always perpendicular to the local landscape. At each point we can compute (by taking partial derivatives with respect to each coordinate in x) the direction in which f(x) increases the most. If this direction is not straight up or straight down, we can increase f(x) by walking in some direction (and decrease it by walking in the opposite direction). So we are looking for a point where the direction of increase of f is exactly aligned with the perpendicular to the g(x)=0 surface, which happens to point precisely in the direction of increase of g.

To find this point, we compute partial derivatives. If you're not familiar with partial derivatives, the way to think about them is being essentially the same as regular derivatives where one assumes that every variable except the one we are differentiating against is treated as a constant. So the partial derivative of xy^{2} with respect to x, written ∂xy^{2}/∂x, is just y^{2}, and the partial derivative of y^{2} with respect to y, written ∂xy^{2}/∂y, is 2xy. The symbol in the expression is a variant form of the Greek lowercase delta (which you may not be able to see if your browser doesn't like Unicode).

Given a real-valued function f(x,y) (or more generally f(x_{1},x_{2},...,x_{n})), its *gradient* (written ∇f) or direction of increase is the vector-valued function ∇f = (∂f(x,y)/∂x,∂f(x,y)/∂y) (in general (∂f(x)/∂x_{1},∂f(x)/∂x_{2},...f(x)/∂x_{n})). For example, the function xy^{2} above has gradient (y^{2},2xy). This represents an arrow that points in some direction for each value of x and y; it points in the same direction as some other arrow if each coordinate is a nonzero constant multiple of the corresponding coordinate of the other arrow. In the context of function maximization, this nonzero constant multiple (which is usually written as a lowercase lambda but which will be written here as L because I am tired of looking up numerical codes for Greek letters) is called the Lagrange multiplier, and to find a place where ∇f and ∇g point in the same direction, we are looking for a place where ∂f(x,y)/∂x = L ∂g(x,y)/∂x and ∂f(x,y)/∂y = L ∂g(x,y)/∂y for the same value of L.

Let's take a simple example. Suppose that f = xy^{2} as above and that g(x,y) = x+y-1. Here we want to find (x,y) with x+y=1 that maximizes f. Since we have only two variables, we could rewrite f(x,y) as f(x,1-x), but for larger problems or more complicated g this may not work. So let's try Lagrange multipliers. We have already computed the gradient of f. It is easy to see that the partial derivative of g with respect to either argument is 1, so we want to find x,y, and L such that:

y

^{2}= L

and

- 2xy = L.

Here we can simply equate

y

^{2}= 2xy

and observe that this has solutions y = 2x and y=0. The y=0 solution gives L=0, which doesn't work (since you can always make the gradients equal by setting L=0), so that leaves us with y=2x. Since we also know that x+y=1, we have

- x = 1/3, y = 2/3, f(x,y) = 4/27.

And at this point we can reasonably argue that since (1/3, 2/3) is the only extremum of this function, and since f(x,y) takes on smaller values elsewhere (e.g. f(1,0) = 0), then (1/3,2/3) must in fact be the global maximum.

# 1. Some more complicated examples

## 1.1. A product

Suppose we want to maximize P = ∏ x_{i}, where i=1..n, subject to S = ∑ x_{i} = c, for some constant c and x_{i}≥0 for all i.

Compute ∂P/∂x_{i} = ∏_{i≠j x}j_{ = P/x}i_{ (provided x}i_{≠0) and ∂S/∂x}i_{ = 1. So we have P/x}i_{ = λ for all i. Even without knowing what λ is, we can equate P/x}i_{ = P/x}j_{ which implies x}i_{ = x}j_{ for all i, j. But then S = ∑ x}i_{ = nx}i_{ = c gives x}i_{ = n/c for all i. This gives P = (n/c)^n^. We can observe that this is likely to be a maximum because for other values of x}i_{ (e.g. when some x}i,, = 0) we get smaller values of P.

## 1.2. A scheduling problem

Suppose that we have n jobs that we want to schedule on m parallel machines (say, cores in a multicore CPU). The machines have different speeds, so if we assign n_{i} jobs to machine i, it will take n_{i}/s_{i} time units to finish all of them. Our goal is to minimize the sum over all jobs of the time until it is finished. To make things easier, we'll assume that each machine refuses to return its jobs until they are all done, so that the total cost for all jobs assigned to machine i is n_{i}^{2}/s_{i}.

In Lagrange multiplier terms, we want to minimize f = ∑ n_{i}^2/s_{i} subject to g = ∑ n_{i} = n. We can easily compute ∂f/∂n_{i} = 2n_{i}/s_{i} and ∂g/∂n_{i} = 1. So we want 2n_{i}/s_{i} = λ for all i, or n_{i} = λs_{i}/2. Here it may help to solve for λ: since ∑ n_{i} = ∑ λs_{i}/2 = n, we get λ = 2n/(∑s_{i}) and each n_{i} = 2s_{i}n/2∑s_{i} = n(s_{i}/∑s_{i}). So each machine is assigned jobs in proportion to its speed.