**Linear algebra** is the branch of algebra that studies vector spaces (see AlgebraicStructures for the definition of a vector space). The key concepts in linear algebra are independence, dimension, subspaces, and linear transformations (which for finite-dimensional spaces can be represented by matrices). There are two ways to look at linear algebra: as a set of rules for manipulating coordinates, vectors, and matrices (*concrete linear algebra*—see LinearAlgebra) or as the properties of vector spaces considered in the abstract (*abstract linear algebra*). Both approaches get to essentially the same place, just in a different order. Here we start by assuming we have a vector space and derive coordinates, vectors, and matrices from it.

# Linear combinations and independence

Let **x**_{1},...**x**_{n} be vectors in some vector space V over a field F. A **linear combination** of the **x**_{i} is a sum a_{1}**x**_{1} + a_{2}**x**_{2} + ... + a_{n}**x**_{n}, where the a_{i} are coefficients from F. The **span** of the **x**_{i} is the set of all vectors that can be constructed by taking linear combinations of the **x**_{i}. It is not hard to see that this is the smallest subspace of V that contains the **x**_{i}, and we can think of it as the subspace generated by the **x**_{i} just as a subgroup may be generated by some set of elements of a group. A vector **y** is **independent** of the vectors **x**_{i} if it is not in their span.

A set of vectors **x**_{i} is **independent** if each vector is independent of all the others; an equivalent, and more symmetric, definition is that a_{1}**x**_{1} + a_{2}**x**_{2} + ... + a_{n}**x**_{n} does not equal **0** unless all the a_{i} are zero. We'll prove that these definitions are equivalent by showing that if any sequence **x**_{i} has a nonzero linear combination that sums to **0**, then there is some **x**_{i} that can be expressed as a linear combination of its predecessors. (The other direction is easier, since if some **x**_{i} can be expressed as a linear combination of the others, then subtracting **x**_{i} from that linear combination gives **0**.)
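
For vectors with concrete coordinates (say in ℝ^{n}), independence can be checked mechanically: stack the vectors as rows of a matrix and compare the rank to the number of vectors. A minimal sketch using Python with numpy (the example vectors are made up):

```python
import numpy as np

def independent(vectors):
    """Return True if the given vectors are linearly independent.

    Stacks the vectors as rows of a matrix; they are independent
    exactly when the matrix rank equals the number of vectors.
    """
    m = np.array(vectors, dtype=float)
    return np.linalg.matrix_rank(m) == len(vectors)

print(independent([(1, 0, 0), (0, 1, 0), (0, 0, 1)]))  # True
print(independent([(1, 2, 3), (2, 4, 6)]))             # False: second = 2 * first
```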

- Theorem 1: Let ∑a_{i}**x**_{i} = **0**, where not all a_{i} are 0. Then for some k there are coefficients b_{1}, ..., b_{k-1} such that **x**_{k} = ∑_{i<k}b_{i}**x**_{i}.
- Proof: Let k be the largest index for which a_{k} ≠ 0. Then solving ∑a_{i}**x**_{i} = **0** for **x**_{k} gives **x**_{k} = ∑_{i≠k}(-a_{i}/a_{k})**x**_{i} = ∑_{i<k}(-a_{i}/a_{k})**x**_{i}, where the second equality holds because a_{i} = 0 for all i > k. Taking b_{i} = -a_{i}/a_{k} completes the proof.

Technical note: If the set of vectors **x**_{i} is infinite, then only linear combinations with a finite number of nonzero coefficients are permitted. We will generally not consider vector spaces big enough for this to be an issue.

# Bases

Linear independence is useful for establishing a coordinate system for a vector space. A **basis** of a vector space V is an independent set of vectors {**x**_{i}} such that every vector **y** in V can be expressed as a linear combination **y** = a_{1}**x**_{1} + a_{2}**x**_{2} + ... + a_{n}**x**_{n}. Given a particular basis, we can write any vector as the sequence (a_{1}, a_{2}, ... a_{n}). In fact, we can do so in only one way:

- Theorem 2: If {**x**_{i}} is a basis for V, then every vector **y** has a unique representation **y** = a_{1}**x**_{1} + a_{2}**x**_{2} + ... + a_{n}**x**_{n}.
- Proof: Suppose there is some **y** with more than one representation, i.e. there are sequences of coefficients a_{i} and b_{i} such that **y** = a_{1}**x**_{1} + a_{2}**x**_{2} + ... + a_{n}**x**_{n} = b_{1}**x**_{1} + b_{2}**x**_{2} + ... + b_{n}**x**_{n}. Then **0** = **y** - **y** = (a_{1}**x**_{1} + a_{2}**x**_{2} + ... + a_{n}**x**_{n}) - (b_{1}**x**_{1} + b_{2}**x**_{2} + ... + b_{n}**x**_{n}) = (a_{1}-b_{1})**x**_{1} + (a_{2}-b_{2})**x**_{2} + ... + (a_{n}-b_{n})**x**_{n}. But since the **x**_{i} are independent, the only way a linear combination of the **x**_{i} can equal **0** is if all coefficients are 0, i.e. if a_{i} = b_{i} for all i.

Even better, we can do all of our usual vector space arithmetic in terms of the coefficients a_{i}. For example, if **a** = ∑a_{i}**x**_{i} and **b** = ∑b_{i}**x**_{i}, then it can easily be verified that **a**+**b** = ∑(a_{i}+b_{i})**x**_{i} and c**a** = ∑(ca_{i})**x**_{i}.

However, the same vector will generally have different representations in *different* bases. For example, in the vector space ℝ^{2} consisting of pairs of real numbers, we could have a basis B_{1} = { (1,0), (0,1) } and a basis B_{2} = { (1,0), (1,-2) }. The vector (2,3) would be represented as (2,3) using basis B_{1} but would be represented as (7/2,-3/2) in basis B_{2}, since (2,3) = (7/2)(1,0) + (-3/2)(1,-2). Usually we will pick a "standard basis" that will look something like B_{1}. Generally it won't matter too much which basis we pick. Part of the reason for this is that for vector spaces with finite bases, all the bases look pretty much the same:
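
Finding the coordinates of a vector in a given basis amounts to solving a linear system whose columns are the basis vectors. A minimal sketch with numpy, reproducing the B_{2} computation above:

```python
import numpy as np

# Basis vectors of B2 = {(1,0), (1,-2)} as the columns of a matrix.
B2 = np.array([[1.0, 1.0],
               [0.0, -2.0]])
z = np.array([2.0, 3.0])

# The coordinates a satisfy B2 @ a = z.
a = np.linalg.solve(B2, z)
print(a)  # [ 3.5 -1.5], i.e. (7/2, -3/2)
```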

- Theorem 3: Let **x**_{1}...**x**_{n} and **y**_{1}...**y**_{m} be two finite bases of the same vector space V. Then n = m.
- Proof: Assume without loss of generality that n ≤ m. We will show how to replace elements of the **x**_{i} basis with elements of the **y**_{i} basis to produce a new basis consisting only of **y**_{1}...**y**_{n}. Start by considering the sequence **y**_{1}, **x**_{1}...**x**_{n}. This sequence is not independent, since **y**_{1} can be expressed as a linear combination of the **x**_{i} (they're a basis). So from Theorem 1 there is some **x**_{i} that can be expressed as a linear combination of **y**_{1}, **x**_{1}...**x**_{i-1}. Swap this **x**_{i} out to get a new sequence **y**_{1}, **x**_{1}...**x**_{i-1}, **x**_{i+1}...**x**_{n}. This new sequence is also a basis, because (a) any **z** can be expressed as a linear combination of these vectors by substituting the expansion of **x**_{i} into the expansion of **z** in the original basis, and (b) it's independent, because if there is some nonzero linear combination that produces **0**, we can substitute the expansion of **x**_{i} to get a nonzero linear combination of the original basis that produces **0** as well. Now continue by constructing the sequence **y**_{2}, **y**_{1}, **x**_{1}...**x**_{i-1}, **x**_{i+1}...**x**_{n}, and arguing that some **x**_{i'} in this sequence must be expressible as a combination of earlier terms by Theorem 1 (it can't be **y**_{1}, because then **y**_{1}, **y**_{2} would not be independent), and drop this **x**_{i'}. By repeating this process we can eventually eliminate all the **x**_{i}, leaving the basis **y**_{n}, ..., **y**_{1}. But then if m > n, the vector **y**_{n+1} would be a linear combination of this basis, contradicting the independence of the **y**_{i}; so we must have m = n.

The size of any basis of a vector space is called the **dimension** of the vector space; Theorem 3 guarantees that this is well-defined, since every basis has the same size.

# Linear transformations

Recall that a **linear transformation** from a vector space U to a vector space V, both defined over the same field F, is a function A:U->V such that

- A(a**x**) = aA(**x**) for all a in F and **x** in U.
- A(**x**+**y**) = A(**x**) + A(**y**) for all **x** and **y** in U.

Often these two equations are combined into the single equation

A(a**x**+b**y**) = aA(**x**) + bA(**y**).

It is easy to show that A satisfies the first two equations if and only if it satisfies the combined one. Note that to avoid writing parentheses all the time, we often write the application of a linear transformation like multiplication: A**x** = A(**x**).

Some examples of linear transformations:

- The identity transformation I:V->V defined by I**x** = **x** is a linear transformation: I(a**x**+b**y**) = a**x** + b**y** = aI**x** + bI**y**.
- Let U = F^{n}, and let A(x_{1},x_{2},...,x_{n}) = x_{1}. Then A is a linear transformation from U to F^{1}, called the **projection** of **x** onto its first coordinate.
- For each scalar a, the map **x** -> a**x** is a linear transformation.
- Let U and V both be the vector space of real-valued random variables on some probability space. Then the expectation operator E is a linear transformation from U to U: E[aX+bY] = aE[X] + bE[Y].
- Let U be the space of differentiable functions from ℝ to ℝ and let V be the space of integrable functions from ℝ to ℝ. Then the differentiation operator D:U->V and the integration operator S:V->U are both linear transformations.
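
These facts are easy to spot-check numerically. A quick sketch testing the combined linearity equation for the first-coordinate projection (the vectors and scalars here are arbitrary test values):

```python
import numpy as np

def proj_first(v):
    """Projection of a vector onto its first coordinate: A(x1,...,xn) = x1."""
    return v[0]

rng = np.random.default_rng(0)
x, y = rng.random(4), rng.random(4)
a, b = 2.0, -3.0

# Linearity: A(ax + by) == a A(x) + b A(y).
print(np.isclose(proj_first(a * x + b * y),
                 a * proj_first(x) + b * proj_first(y)))  # True
```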

Most of the linear transformations you will encounter are between finite-dimensional vector spaces. These have a particularly simple description in terms of matrices that we will discuss shortly. First, however, let's look at some properties of linear transformations in general.

## Linear transformations form a vector space

Given linear transformations A and B and a scalar c, define (A+B)**x** = A**x** + B**x** and (cA)**x** = c(A**x**). Now we have:

- The set of linear transformations from U to V, with addition, is an abelian group. Proof:
- Given A and B, we can show (A+B) is also a linear transformation by observing that (A+B)(ax+by) = A(ax+by) + B(ax+by) = aAx + bAy + aBx + bBy = a(Ax+Bx) + b(Ay+By) = a(A+B)x + b(A+B)y.
- Given A, let -A be defined by (-A)x = -(Ax). Then (a) -A is linear, because (-A)(ax+by) = -(A(ax+by)) = -(aAx+bAy) = a(-(Ax)) + b(-(Ay)) [note use of field axioms here] = a((-A)x) + b((-A)y); and (b) A + -A = 0 since Ax + (-A)x = Ax + -Ax = 0.
- The 0 map defined by 0x = 0 is the additive identity, since Ax + 0x = Ax + 0 = Ax for all x.

- The set of linear transformations is closed under scalar multiplication: If A is linear then so is cA, since (cA)(ax+by) = c(A(ax+by)) = c(aAx+bAy) = (ca)Ax + (cb)Ay = (ac)Ax + (bc)Ay = a(cA)x + b(cA)y.
- The four remaining vector space axioms all hold:
- (a(bA))x = a(b(Ax)) = (ab)(Ax) = ((ab)A)x.
- (1A)x = 1(Ax) = Ax.
- (c(A+B))x = c((A+B)x) = c(Ax + Bx) = c(Ax) + c(Bx) = (cA)x + (cB)x.
- ((c+d)A)x = (c+d)(Ax) = c(Ax) + d(Ax) = (cA)x + (dA)x.

So another source of vector spaces is transformations between existing vector spaces.

Exercise: go through each of these proofs and figure out just which property (e.g. commutativity of scalar multiplication, definition of cA, linearity of A) is used at each step.

## Linear transformations form a ring

More interesting is the case of linear transformations from a vector space V to itself. For these we can not only define addition of transformations as above, but we can also define multiplication as composition: (AB)**x** = A(B(**x**)). This multiplication operation is associative and has an identity (I), and it distributes over addition of linear transformations: A(B+C)(**x**) = A(B**x**+C**x**) = AB**x** + AC**x** (the last step uses the fact that A is linear). So the linear transformations from a vector space to itself form a ring. In general, this ring will *not* be commutative unless the vector space is very small.
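
Matrices are introduced below, but as a preview, here is a minimal numerical sketch of this non-commutativity, using two 2×2 matrices as stand-ins for transformations of a 2-dimensional space:

```python
import numpy as np

A = np.array([[0, 1],
              [0, 0]])
B = np.array([[0, 0],
              [1, 0]])

print(A @ B)  # [[1 0], [0 0]]
print(B @ A)  # [[0 0], [0 1]]  -- AB != BA
```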

## Inverses of linear transformations

As in any ring, some linear transformations on a single vector space V will have multiplicative inverses: a transformation A^{-1} such that AA^{-1} = A^{-1}A = I. Since multiplication is really composition, a necessary and sufficient condition for A to have an inverse is that it be a bijection.

Necessity is easy here: for sufficiency we have to prove that the inverse of A as a function is also linear; i.e. that if A(cx+dy) = cAx + dAy for all scalars c and d and vectors x and y, then the same holds for A^{-1}. Consider some c, d, r, and s where c and d are scalars and r and s are vectors in V. Let z = A^{-1}(cr+ds) - (cA^{-1}r + dA^{-1}s). Then Az = A(A^{-1}(cr+ds) - (cA^{-1}r + dA^{-1}s)) = (cr+ds) - A(cA^{-1}r) - A(dA^{-1}s) = cr + ds - cAA^{-1}r - dAA^{-1}s = cr + ds - cr - ds = 0. So Az = 0. We already know that A0 = 0, so if A is indeed a bijection we must have z = 0 implying A^{-1}(cr+ds) = cA^{-1}r + dA^{-1}s. Since this holds for all c, d, r, and s, we have A^{-1} is a linear transformation.

From the proof we can see that an equivalent criterion for A to be invertible, at least when V is finite-dimensional, is that Az = 0 if and only if z = 0: this says that A is injective, and an injective linear map from a finite-dimensional space to itself is also surjective. (This is also what we'd expect based on the fact that A is a group homomorphism if we forget about scalars.)

# Matrices

Suppose that U and V are both finite-dimensional, and we are provided with a basis **x**_{1}...**x**_{n} for U and **y**_{1}...**y**_{m} for V. In this case we can express any linear transformation from U to V using nm coefficients, which can conveniently be organized as a matrix (see Relations for the definition of a matrix).

Here's the idea: Let A:U->V be linear, and suppose we are given A**x**_{j} for each basis vector **x**_{j}. Then given *any* vector **z** in U, we can compute A**z** by applying A to the unique expansion **z** = ∑z_{j}**x**_{j} like this:

A**z** = A(∑z_{j}**x**_{j}) = ∑z_{j}(A**x**_{j}).

But now we can further expand the vectors A**x**_{j} in V using V's basis vectors. For each j, let A**x**_{j} = ∑a_{ij}**y**_{i}. Then

A**z** = ∑_{j}z_{j}(A**x**_{j}) = ∑_{j}z_{j}(∑_{i}a_{ij}**y**_{i}) = ∑_{i}(∑_{j}a_{ij}z_{j})**y**_{i}.

In other words, the i-th coordinate of A**z** is a sum of the coefficients a_{ij} times z_{j} over all values of j. Each coefficient a_{ij} is in effect specifying how much of **y**_{i} is contributed by each copy of **x**_{j}.

Since the a_{ij} coefficients have two indices, we usually think of them as organized into an m-by-n matrix:

    A = [ a_11  a_12  ...  a_1n ]
        [ a_21  a_22  ...  a_2n ]
        [  ⋮     ⋮          ⋮   ]
        [ a_m1  a_m2  ...  a_mn ]

where as usual the first index of each coefficient specifies the row and the second specifies the column. (The rows-before-columns convention is also why we call this an m-by-n matrix instead of an n-by-m matrix when m is the number of rows and n is the number of columns.)
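
Said another way, column j of the matrix holds the coordinates of A**x**_{j}. Here is a sketch of building such a matrix in numpy; the transformation `rotate90` is a made-up example, and the same construction works for any linear map whose values on the basis vectors we can compute:

```python
import numpy as np

def rotate90(v):
    """A made-up example transformation: rotate (x, y) to (-y, x)."""
    return np.array([-v[1], v[0]])

# Column j of the matrix is the transformation applied to basis vector j.
basis = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
A = np.column_stack([rotate90(x) for x in basis])
print(A)                                 # [[ 0. -1.], [ 1.  0.]]

z = np.array([2.0, 3.0])
print(np.allclose(A @ z, rotate90(z)))   # True: the matrix reproduces the map
```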

Where it will not cause confusion, it is customary to use the same letter (and capitalization) for both a linear transformation and the matrix that represents it; so we would write A both for the transformation and for its matrix of coefficients a_{ij}.

Note that a particular matrix representation is only meaningful if we have already fixed a basis for both vector spaces; if for some reason the same transformation A is represented using two different sets of bases, it won't make sense to call the two different matrices that will arise in each by the same letter.

Vectors of dimension m are conventionally represented as m-by-1 matrices, i.e. as a single column:

    z = [ z_1 ]
        [ z_2 ]
        [  ⋮  ]
        [ z_m ]

so that applying a 2-by-n matrix A to an n-dimensional vector z looks like this:

    A z = [ a_11  a_12  ...  a_1n ] [ z_1 ]   [ ∑_j a_1j z_j ]
          [ a_21  a_22  ...  a_2n ] [ z_2 ] = [ ∑_j a_2j z_j ]
                                    [  ⋮  ]
                                    [ z_n ]

where the ugly-looking 2-by-1 matrix on the far right consists of two rows with only one entry in each row.

## Matrix multiplication

Each operation on linear transformations (addition, scalar multiplication, multiplication) can be expressed in terms of the corresponding matrices.

For addition, we have (A+B)_{ij} = A_{ij}+B_{ij} for each pair of indices i,j. The proof is that the **y**_{i} coefficient of A**z**+B**z** is given by ∑_{j}A_{ij}z_{j} + ∑_{j}B_{ij}z_{j} = ∑_{j}(A_{ij}+B_{ij})z_{j}.

For scalar multiplication, we have (cA)_{ij} = cA_{ij}. Here the **y**_{i} coefficient of (cA)**z** is c∑_{j}A_{ij}z_{j} = ∑_{j}(cA_{ij})z_{j}.

**Matrix multiplication**, which corresponds to linear transformation multiplication (i.e. composition), is more complicated. Here we assume that **y**_{i} = **x**_{i} for all i, or in other words that we have a single basis for our single vector space. Given a vector **z**, we let z_{i} be the coefficient on **x**_{i} in the unique expansion **z** = ∑ z_{i} **x**_{i}. Now given transformations A and B, we wish to express the composite transformation AB as a matrix, by computing for each i,j the coefficient (AB)_{ij} that expresses how much z_{j} contributes to (AB**z**)_{i}.

We can compute these coefficients as follows. Start with the fact that

(B**z**)_{k} = ∑_{j}B_{kj}z_{j}.

Then

(AB**z**)_{i} = ∑_{k}A_{ik}(B**z**)_{k} = ∑_{k}A_{ik}∑_{j}B_{kj}z_{j} = ∑_{j}(∑_{k}A_{ik}B_{kj})z_{j}.

From this we can immediately read off that

(AB)_{ij} = ∑_{k}A_{ik}B_{kj}.

In visual terms, to get the element in the i-th row and j-th column of AB, we take the i-th row of A and the j-th column of B, line them up, multiply matching elements, and sum the result. This operation of taking the sum of pairwise products of vector elements is known as a **dot product** and is of central importance in linear algebra.
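
As a sketch, here is the rule (AB)_{ij} = ∑_{k}A_{ik}B_{kj} translated directly into loops and checked against numpy's built-in matrix product (the matrices are arbitrary examples):

```python
import numpy as np

def matmul(A, B):
    """Multiply matrices by the row-times-column rule: (AB)_ij = sum_k A_ik * B_kj."""
    assert len(A[0]) == len(B), "inner dimensions must agree"
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))                            # [[19, 22], [43, 50]]
print(np.array_equal(np.array(A) @ np.array(B),
                     np.array(matmul(A, B))))  # True
```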

## Vectors as matrices

We previously defined, for a linear transformation A from an n-dimensional to an m-dimensional space and a vector **z** = ∑_{j}z_{j}**x**_{j}, the product A**z** as ∑_{i} (∑_{j} A_{ij}z_{j}) **y**_{i}, where the **x**_{j} and **y**_{i} are the basis vectors of the domain and codomain of A. If we drop the basis vectors and represent **z** as a matrix in the form of an n×1 **column vector** z with z_{j1} = z_{j}, then Az is an m×1 column vector with coordinates (Az)_{i1} = ∑_{j} A_{ij}z_{j}—precisely the coordinates of A**z**. So by representing vectors as matrices in column-vector form we can represent the application of a linear transformation as just another special case of matrix multiplication.

## The transpose of a matrix

Given an n×m matrix A, define its **transpose** A^{T} (sometimes written as A') as the m×n matrix with coefficients (A^{T})_{ij} = A_{ji}. In other words, in A^{T} we reverse the indices of A, effectively flipping the matrix diagonally.

This is an easy operation to define, but what does it mean? Given an n×1 column vector x, x^{T} is a 1×n **row vector**. Since it has n columns, we can multiply it using ordinary matrix multiplication by an n×1 column vector y, say, giving (x^{T}y)_{11} = ∑_{i} x_{i1}y_{i1}. This is the **dot product** operation mentioned above, and is often abbreviated as x⋅y. Its interpretation is that taking the transpose of x converts it from a vector to a scalar-valued linear function on vectors. The set of all such scalar-valued linear functions on vectors is itself a vector space, called the **dual space** V^{*} of the original ("primal") vector space V. For finite-dimensional spaces the dual space is isomorphic to its primal space, since (x^{T})^{T} = x for all x.

What about bigger matrices? If we take some n×m matrix A representing a linear transformation A:U→V and compute its m×n transpose A^{T}, we get a linear transformation from an m-dimensional space to an n-dimensional space. To understand what this matrix does, it is helpful to first look at the transpose of a product: ((AB)^{T})_{ik} = (AB)_{ki} = ∑_{j} A_{kj}B_{ji} = ∑_{j} (B^{T})_{ij} (A^{T})_{jk} = (B^{T}A^{T})_{ik}. Or stated more concisely: (AB)^{T} = B^{T}A^{T}. So now what does A^{T} do to some vector y? We have A^{T}y = (y^{T}A)^{T}. If we think of y^{T} as an element of a dual space, y^{T}A is also an element of the same dual space (since whenever we can multiply y^{T}x we can also multiply (y^{T}A)x = y^{T}(Ax)). So A^{T} acts like a linear transformation from V→U defined by taking a detour through the dual space via A^{T}y = (y^{T}A)^{T}.
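
Both the transpose-of-a-product identity and the dot-product formula are easy to spot-check numerically; a quick sketch with random matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((3, 4))   # a map from a 4-dimensional to a 3-dimensional space
B = rng.random((4, 2))

print(np.allclose((A @ B).T, B.T @ A.T))  # True: (AB)^T = B^T A^T

x, y = rng.random(5), rng.random(5)
print(np.isclose(x @ y, sum(x[i] * y[i] for i in range(5))))  # True: dot product
```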

## The inverse of a matrix

Some matrices are invertible; some (including all the ones that aren't square) are not. To try to invert a matrix, we start with the pair of matrices A, I (where I is the **identity matrix** defined by I_{ii} = 1 and I_{ij} = 0 when i≠j), and multiply both sides of the pair from the left by a sequence of transformation matrices B_{1}, B_{2}, ... B_{k} until B_{k}B_{k-1}⋯B_{1}A = I. At this point the right-hand matrix will be B_{k}B_{k-1}⋯B_{1} = A^{-1}. (We could just keep track of all the B_{i}, but it's easier to keep track of their product.)

How do we pick the B_{i}? These will be matrices that (a) multiply some row by a scalar, (b) add a multiple of one row to another row, or (c) swap two rows. We'll use the first kind to make all the diagonal entries equal one, and the second kind to get zeroes in all the off-diagonal entries. The third kind will be saved for emergencies, like getting a zero on the diagonal.

That the operations (a), (b), and (c) correspond to multiplying by a matrix is provable but tedious.^{1} Given these operations, we can turn any invertible matrix A into I by working from the top down, rescaling each row i using a type (a) operation to make A_{ii} = 1, then using a type (b) operation to subtract A_{ji} times row i from each row j > i to zero out A_{ji}, then finally repeating the same process starting at the bottom to zero out all the entries above the diagonal. The only way this can fail is if we hit some A_{ii} = 0, which we can swap with a nonzero A_{ji} if one exists (using a type (c) operation). If all the rows from i on down have a zero in the i column, then the original matrix A is not invertible. This entire process is known as Gauss-Jordan_elimination.

This procedure can be used to solve systems of linear equations: if A**x** = **b**, we can compute **x** by first computing A^{-1} and then multiplying **x** = A^{-1}A**x** = A^{-1}**b**. If we are not interested in A^{-1} for its own sake, we can simplify things by substituting **b** for I during the Gauss-Jordan elimination procedure; at the end, it will be transformed to **x**.

### Example

Original A is on the left, I on the right. We'll work in ℤ_{5} just to make things more entertaining, and to avoid having to write down a lot of fractions. Recall that in ℤ_{5}, 1^{-1} = 1, 2^{-1} = 3, 3^{-1} = 2, and 4^{-1} = 4.

Initial matrices:

    2 0 1    1 0 0
    1 0 1    0 1 0
    3 1 2    0 0 1

Multiply top row by 3:

    1 0 3    3 0 0
    1 0 1    0 1 0
    3 1 2    0 0 1

Subtract top row from middle row and 3*top row from bottom row:

    1 0 3    3 0 0
    0 0 3    2 1 0
    0 1 3    1 0 1

Swap middle and bottom rows:

    1 0 3    3 0 0
    0 1 3    1 0 1
    0 0 3    2 1 0

Multiply bottom row by 2:

    1 0 3    3 0 0
    0 1 3    1 0 1
    0 0 1    4 2 0

Subtract 3*bottom row from top and middle rows:

    1 0 0    1 4 0
    0 1 0    4 4 1
    0 0 1    4 2 0

and we're done. (It's probably worth multiplying the original A by the alleged A^{-1} to make sure that we didn't make a mistake.)
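
Here is a sketch of the whole procedure as code: Gauss-Jordan elimination over ℤ_{p}, applying each row operation to both halves of the pair. The helper name `inverse_mod_p` is made up, and the final line reproduces the worked example above:

```python
def inverse_mod_p(A, p):
    """Invert a square matrix over Z_p by Gauss-Jordan elimination.

    Keeps the pair (A, I) and applies the same row operations to both;
    when the left half becomes I, the right half is A^{-1}.
    Raises ValueError if A is not invertible.
    """
    n = len(A)
    A = [row[:] for row in A]                       # work on a copy
    I = [[int(i == j) for j in range(n)] for i in range(n)]
    for i in range(n):
        # Type (c): if the pivot is zero, swap in a lower row with a nonzero entry.
        if A[i][i] % p == 0:
            for j in range(i + 1, n):
                if A[j][i] % p != 0:
                    A[i], A[j] = A[j], A[i]
                    I[i], I[j] = I[j], I[i]
                    break
            else:
                raise ValueError("matrix is not invertible")
        # Type (a): rescale row i so the pivot is 1 (Fermat: a^(p-2) = a^-1 mod p).
        inv = pow(A[i][i], p - 2, p)
        A[i] = [x * inv % p for x in A[i]]
        I[i] = [x * inv % p for x in I[i]]
        # Type (b): subtract multiples of row i to zero out column i elsewhere.
        for j in range(n):
            if j != i and A[j][i] % p != 0:
                c = A[j][i]
                A[j] = [(x - c * y) % p for x, y in zip(A[j], A[i])]
                I[j] = [(x - c * y) % p for x, y in zip(I[j], I[i])]
    return I

A = [[2, 0, 1], [1, 0, 1], [3, 1, 2]]
print(inverse_mod_p(A, 5))   # [[1, 4, 0], [4, 4, 1], [4, 2, 0]]
```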

# Orthogonality

Two vectors x and y in a real vector space are called **orthogonal** if x⋅y = x^{T}y = 0. This is a symmetric relation: y⋅x = y^{T}x = (x^{T}y)^{T} = (x⋅y)^{T} = x⋅y, since y⋅x and x⋅y are both scalars (disguised as 1×1 matrices). In Euclidean spaces like our familiar ℝ^{3}, two vectors are orthogonal precisely when they are perpendicular: e.g. (0,0,1)⋅(1,1,0) = 0⋅1+0⋅1+1⋅0 = 0, so a vector sticking directly out of the page (if that's what our third coordinate represents) is orthogonal to a vector running diagonally across the page, just as we would expect from the orthogonal = perpendicular rule. In other spaces, defining orthogonality in this way is more confusing. For example, in (ℤ_{2})^{3} the vectors (1,1,1) and (0,1,1) are orthogonal, since 1⋅0+1⋅1+1⋅1 = 0+1+1 = 0, but it's not clear how we would draw these to make them look like they have a right angle between them; and the vectors with the same-looking coordinates in ℝ^{3} are not orthogonal, since their dot-product is 2.

An important property of orthogonality is that it is preserved by scalar multiplication and vector addition. This follows from the facts that (a**x**)⋅**y** = a(**x**⋅**y**) and **x**⋅(**y**+**z**) = **x**⋅**y** + **x**⋅**z**, which are simply restatements of the fact that x^{T} is a linear transformation from the vector space to its underlying field. A consequence of this fact is that the set of vectors that are orthogonal to a particular vector **x** form a subspace.

## Projections onto lines

Suppose that we can write a vector **z** = a**x** + **y**, where **y** is orthogonal to **x**. Then a**x** is called the **projection** of **z** onto the line (1-dimensional subspace) generated by **x**. The question arises: does such a projection always exist, and if it does, how do we find a?

The answer is that we take the dot-product of **z** with **x**. This gives **z**⋅**x** = a**x**⋅**x** + **y**⋅**x** = a**x**⋅**x** (since **y**⋅**x** = 0). Solving for a gives a = **z**⋅**x**/**x**⋅**x**, and it is immediate from this solution that a is in fact the unique scalar such that **z** = a**x** + **y** with **y**⋅**x** = 0.

Note that this only works if **x**⋅**x** is nonzero. For vector spaces over the rationals or the reals this is always the case when **x** itself is nonzero. Here **x**⋅**x** is just the square of the length of **x** as obtained by the Pythagorean theorem. But it may fail to be the case in spaces over fields with finite characteristic; for example, in ℤ_{2}^{3}, (0,1,1)⋅(0,1,1) = 0, and there is no projection of **z** = (1,0,1) onto the line generated by (0,1,1).^{2} It can also fail in vector spaces over the complex numbers if we do nothing special about the dot product: when **x** = (i, 1) ∈ ℂ^{2} then **x**⋅**x** = i^{2}+1^{2} = -1 + 1 = 0. For this reason the dot-product **x**⋅**y** in complex spaces is usually defined by first taking the **complex conjugate** of **x**: the vector obtained by substituting -i for each occurrence of i in **x**. This gives a more reasonable measure of length (now (i,1)⋅(i,1) = -i⋅i + 1 = 2) and in fact allows projections to be taken in such spaces. But the easier solution—which we will adopt—is simply to stick with real-valued vector spaces when we talk about dot products.

Here's a picture of projection in magnificent ASCII_art:

    y       z = ax + y
    ⋅------⋅
    |     /|
    |    / |
    |   /  |
    |  /   |
    | /    |
    |/     |
    ⋅------⋅-------⋅
    0      ax      x

One interpretation of the projection of **z** onto **x** is that a is the **least-squares solution** to the problem a**x** = **z**; that is, it is the solution that minimizes the sum of the squares of the errors in each coordinate (which is just **y**⋅**y**, the square of the length of **y** = **z** - a**x**). This is useful for approximating the solution to inconsistent equations. For example, given the equations 2a = 3 and 3a = 4, we can rewrite them as a(2,3) = (3,4) and obtain a = (2,3)⋅(3,4)/(2,3)⋅(2,3) = (6+12)/(4+9) = 18/13. This is not actually a solution, since 2(18/13) = 2.76... is not 3 and 3(18/13) = 4.14... is not 4, but it's pretty close and it is impossible to do better without increasing the sum of the squares of the errors.
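
A quick numerical check of this computation: the dot-product formula gives a = 18/13 ≈ 1.3846, and numpy's built-in least-squares solver agrees:

```python
import numpy as np

x = np.array([2.0, 3.0])   # coefficients of a in the equations 2a = 3, 3a = 4
z = np.array([3.0, 4.0])   # right-hand sides

a = np.dot(z, x) / np.dot(x, x)            # z.x / x.x = 18/13
print(a)                                    # 1.3846...

a_lstsq, *_ = np.linalg.lstsq(x.reshape(-1, 1), z, rcond=None)
print(np.isclose(a, a_lstsq[0]))            # True
```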

## Projections onto higher-dimensional subspaces

What if we want to project onto a subspace that's not a line? We can still do so, but it's helpful to first generate an **orthogonal basis** for the subspace. This is a basis (maximal set of independent vectors) that is orthogonal in the sense that any two basis elements **x**_{i} and **x**_{j} are orthogonal.

Suppose we have an orthogonal basis, and we want to write **z** = **y** + ∑_{i} a_{i}**x**_{i}, where **y** is orthogonal to all of the **x**_{i}. To find a particular a_{i}, observe that **y** + ∑_{j≠i} a_{j}**x**_{j} is orthogonal to **x**_{i}. So in fact **z** = a_{i}**x**_{i} + **y** + ∑_{j≠i} a_{j}**x**_{j} is just an instance of the problem of projecting onto a line. We've already solved that problem: a_{i} = **z**⋅**x**_{i}/**x**_{i}⋅**x**_{i}.

What if we don't have an orthogonal basis? Presumably we have some sort of basis, say **x**_{1}...**x**_{k}. If it's not orthogonal, we can make it orthogonal via projection: let **y**_{1} = **x**_{1}, and for each i > 1 let **y**_{i} be the solution to **x**_{i} = **y**_{i} + ∑_{j<i} a_{j}**y**_{j} where **y**_{i} is orthogonal to all the **y**_{j} for j < i. Now use the **y**_{i} as our orthogonal basis and project away. (This procedure is known as Gram-Schmidt orthogonalization.)

(In real life, projection problems like this are solved by constructing a matrix A whose columns are the **x**_{i} vectors, and solving for **a** = (A^{T}A)^{-1}A^{T}**z**. The proof that this works is similar to the 1-dimensional case, and the resulting **a** gives the least-squares solution to the equation A**a** = **z**.)
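
A sketch of both steps for real vector spaces, with a made-up helper `gram_schmidt` implementing the orthogonalization from the previous paragraph, and a final line computing **a** from the normal equations as above:

```python
import numpy as np

def gram_schmidt(xs):
    """Orthogonalize a list of independent vectors by subtracting projections."""
    ys = []
    for x in xs:
        y = x.astype(float)
        for yj in ys:
            # Subtract the projection of x onto the line generated by yj.
            y -= (np.dot(x, yj) / np.dot(yj, yj)) * yj
        ys.append(y)
    return ys

xs = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0])]
ys = gram_schmidt(xs)
print(np.isclose(np.dot(ys[0], ys[1]), 0.0))   # True: the new basis is orthogonal

# Project z onto the subspace spanned by xs via the normal equations:
z = np.array([1.0, 2.0, 3.0])
A = np.column_stack(xs)
a = np.linalg.solve(A.T @ A, A.T @ z)          # a = (A^T A)^{-1} A^T z
print(a)
```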

# Further reading

Linear algebra is vitally important in ComputerScience: it is a key tool in graphics, scientific computing, robotics, neural networks, and many other areas. If you do further work in these areas, you will quickly find that we have not covered anywhere near enough linear algebra in this course. Your best strategy for remedying this deficiency may be to take an actual linear algebra course; failing that, a very approachable introductory text is *Linear Algebra and Its Applications*, by Gilbert Strang.

Some other useful books on linear algebra:

- Golub and Van Loan, *Matrix Computations*. Picks up where Strang leaves off with practical issues in doing computation.
- Halmos, *Finite-Dimensional Vector Spaces*. Good introduction to abstract linear algebra, i.e. properties of vector spaces without jumping directly to matrices.

Matlab (which is available on the Zoo machines: type `matlab` at a shell prompt) is useful for playing around with operations on matrices. There are also various non-commercial knockoffs like Scilab or Octave that are not as comprehensive as Matlab but are adequate for most purposes. Note that with any of these tools, if you find yourselves doing lots of numerical computation, it is a good idea to talk to a numerical analyst about round-off error: the floating-point numbers inside computers are not the same as real numbers, and if you aren't careful about how you use them you can get very strange answers.

1. The tedious details: to multiply row r by a, use a matrix B with B_{ii} = 1 when i≠r, B_{rr} = a, and B_{ij} = 0 for i≠j; to add a times row r to row s, use a matrix B with B_{ii} = 1 for all i, B_{sr} = a, and B_{ij} = 0 for all other pairs ij; to swap rows r and s, use a matrix B with B_{ii} = 1 for i∉{r,s}, B_{rs} = B_{sr} = 1, and B_{ij} = 0 for all other pairs ij.

2. To check this, observe that the line generated by **x** = (0,1,1) contains only (0,0,0) and (0,1,1). But **z** - (0,0,0) = **z** is not orthogonal to **x**, and neither is **z** - (0,1,1) = (1,1,0).