Hi, oaaltone-ga:
Sorry for the delay in posting this tonight, but we had an extended
power outage due to severe weather.
Here are my notes on Part 1, intermingled with the problem statement.
I have adapted the notation for clarity in this text based medium,
using capital letters only for matrices, lowercase letters in an "upper
range" like u,v,w for vectors, and lowercase letters toward the
beginning of the alphabet like a,b,c for scalars. Subscripts are
shown with an underscore character "_", and superscripts, especially
"perp" for the orthogonal complement, with a "^" caret character.
I will immediately post answers to Part 2 as a follow-up.
regards, mathtalk-ga
Part 1.
Let A be an n by n matrix. We say that subspace E of R^n is an
A-subspace iff:
u in E implies Au in E
For convenience we shall refer to the orthogonal complement of E,
E^perp, by F, so that the direct sum E + F = R^n. Note that there are
always at least two trivial A-subspaces, {0} and R^n, assuming n > 0.
We will denote the Euclidean inner product on R^n between vectors u,v
by u*v, where vectors are identified with single column matrices.
Another notational convenience will be A' for the transpose of A. Note
that by the properties of transpose and matrix multiplication:
u*v = u'v
Au * v = u'A'v = u * A'v
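As a quick numerical sanity check of these transpose identities, here
is a tiny hypothetical example in Python/NumPy (my own illustration,
not part of the original problem):

  import numpy as np

  A = np.array([[1.0, 2.0],
                [3.0, 4.0]])                   # an arbitrary 2 by 2 matrix
  u = np.array([1.0, -1.0])
  v = np.array([2.0, 5.0])

  print(np.dot(u, v))                          # u*v computed as u'v
  print(np.dot(A @ u, v), np.dot(u, A.T @ v))  # Au*v and u*A'v agree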
1. Show that if E is an A-subspace of R^n, then F, the orthogonal
complement of E in R^n, is an A'-subspace.
Proof: Let u belong to F. We are asked to show this implies A'u
belongs to F. For this it suffices to show that A'u is orthogonal to
each vector in E.
Let v belong to E. Now:
A'u * v = u * A''v = u * Av = 0
since Av is in E and u belongs to the orthogonal complement F. Thus
A'u is again in F, which proves that F is an A'-subspace.
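To make step 1 concrete, here is a small hypothetical check in
Python/NumPy: the matrix below maps E = span{e_1} into itself (it is
not symmetric), and its transpose indeed maps F = span{e_2} back into
F:

  import numpy as np

  A = np.array([[2.0, 7.0],
                [0.0, 3.0]])       # E = span{e_1} is an A-subspace
  u = np.array([0.0, 1.0])         # a vector in F = E^perp
  w = A.T @ u                      # A'u
  print(np.dot(w, np.array([1.0, 0.0])))   # 0.0: A'u is orthogonal to E

Note incidentally that this same A sends u to (7,3)', which is not in
F, so here F is an A'-subspace without being an A-subspace.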
2. Let {b_1,...,b_r} be a basis for A-subspace E.
Let {b_{r+1},...,b_n} be a basis for F, the orthogonal complement of
E. By a standard construction the union of these is a basis for R^n.
Then the matrix B = [b_1 | b_2 | ... | b_n] is a "change of basis"
matrix which translates from "B" coordinates to standard coordinates.
Assuming that A is symmetric, A = A', show that the similarity
transformation B^-1 A B produces a block-diagonal result:
[ C  0 ]
[ 0  D ]
where C is r by r and D is (n-r) by (n-r).
Proof: Let {e_1,e_2,...,e_n} be the standard basis of unit vectors in
R^n. One way to elucidate the structure of M = B^-1 A B is to "probe"
the columns of M by forming matrix-vector products M e_k, which yields
the k'th column of M.
In particular we will show that:
(i) If 1 <= k <= r, then M e_k = (c_{1,k},...,c_{r,k},0,...0)' for
some real constants c_{1,k},...,c_{r,k}.
(ii) If r+1 <= k <= n, then M e_k = (0,...,0,d_{1,k},...,d_{n-r,k})' for
some real constants d_{1,k},...,d_{n-r,k}.
From these two demonstrations it follows that M = B^-1 A B has the
block-diagonal structure claimed for it, with C = [c_{i,j}], D =
[d_{i,j}].
Now B e_k = b_k by virtue of our construction of B above, and
similarly:
B^-1 b_k = e_k
Let's tackle case (i), in which k <= r. Thus b_k is in A-subspace E,
so that A b_k is again in E. Therefore it can be uniquely expressed
as a linear combination of the first r basis elements b_1,...,b_r:
A b_k = c_{1,k} b_1 + ... + c_{r,k} b_r
Thus M e_k = B^-1 A B e_k = B^-1 A b_k, and:
M e_k = B^-1 (c_{1,k} b_1 + ... + c_{r,k} b_r)
= c_{1,k} B^-1 b_1 + ... + c_{r,k} B^-1 b_r
= c_{1,k} e_1 + ... + c_{r,k} e_r
= (c_{1,k},...,c_{r,k},0,...,0)'
just as we desired to show.
The case (ii) is precisely the same sort of calculation involving the
last (n-r) elements of the basis formed by the columns of B, once it
is acknowledged that F is an A-subspace too.
The only point that needs to be carefully examined is the fact that F
is also an A-subspace. But this follows from the symmetry of A = A'
and our work in the previous step where we showed that F is an
A'-subspace. [Recall from the example considered in the request for
clarification that if A were not symmetric, F could be an A'-subspace
but _not_ an A-subspace.]
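Here is a small hypothetical 3 by 3 illustration of step 2 in
Python/NumPy (my own example, with the columns of B chosen as a basis
of E followed by a basis of F, as described above):

  import numpy as np

  A = np.array([[2.0, 1.0, 0.0],
                [1.0, 2.0, 0.0],
                [0.0, 0.0, 5.0]])              # symmetric

  # E = span{(1,1,0), (1,1,1)} is an A-subspace; F = E^perp = span{(1,-1,0)}
  B = np.column_stack([[1.0, 1.0, 0.0],
                       [1.0, 1.0, 1.0],
                       [1.0, -1.0, 0.0]])      # columns b_1, b_2, b_3
  M = np.linalg.inv(B) @ A @ B
  print(np.round(M, 6))       # block diagonal: a 2 by 2 block C and a 1 by 1 block D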
3. Show that in 2., if B is orthogonal and A is symmetric, then C and
D are symmetric also.
Proof: The symmetry of block submatrices C,D is equivalent to the
symmetry of M overall, since:
M' = [ C'  0  ]
     [ 0   D' ]
so that M = M' if and only if both C = C' and D = D'.
Now the definition of an orthogonal matrix is that B^-1 = B', the
inverse is the transpose. So a simple computation, using the
orthogonality of B and the symmetry of A, gives that:
M = B^-1 A B = B' A B
M' = (B' A B)' = B' A' B = B' A B = M
Thus M is symmetric under the given hypotheses.
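Again a hypothetical NumPy check (the same A as in my sketch after
step 2, but now with an orthonormal adapted basis, so B is
orthogonal):

  import numpy as np

  A = np.array([[2.0, 1.0, 0.0],
                [1.0, 2.0, 0.0],
                [0.0, 0.0, 5.0]])
  s = np.sqrt(2.0)
  # first two columns are an orthonormal basis of E, the third spans F
  B = np.column_stack([[0.5, 0.5, s / 2],
                       [0.5, 0.5, -s / 2],
                       [1 / s, -1 / s, 0.0]])
  M = B.T @ A @ B
  print(np.allclose(B.T @ B, np.eye(3)))   # True: B is orthogonal
  print(np.allclose(M, M.T))               # True: M (hence C and D) is symmetric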
Underlined quotation:
We assume that if A is a symmetric n by n matrix and
E not {0} is an A-subspace, then A has a (unit)
eigenvector in E.
[Note: The insertion of this assumption at this particular point
seems a little strange. As we have seen, the symmetry of A was
required for all steps in this problem after the first. On the other
hand no symmetry assumption or assumption about existence of a unit
eigenvector is needed in step 4 following, if the notation N(A) is
properly understood as "nullspace of A".]
4. Show, using Gram-Schmidt, that N(A) has an orthonormal basis,
{u_1,...,u_s}.
Proof: Any (sub)space has a basis, so suppose that the nullspace of
A:
N(A) = { v | Av = 0 }
has basis {v_1,...,v_s}.
The Gram-Schmidt construction applies to any inner-product space. To
kick it off one simply normalizes the first vector of the (ordered)
basis:
u_1 = (1/|v_1|) v_1
where we denote the norm (length) of a vector v by |v|.
We process subsequent basis vectors v_k by subtracting off their
projections onto the previously produced vectors u_1,...,u_{k-1},
obtaining a "residual" r_k:
r_k = v_k - (v_k*u_1)u_1 - ... - (v_k*u_{k-1})u_{k-1}
Because of the linear independence of the basis vectors v_k, noting
that u_1,...,u_{k-1} are linear combinations of v_1,...,v_{k-1}, we
know r_k is not zero. Therefore we can normalize r_k to get u_k:
u_k = (1/|r_k|) r_k
Keep this up. The resulting sequence of vectors:
u_1, u_2, ... , u_s
are normalized (unit length) and mutually orthogonal by their
construction. For example, r_k above is perpendicular (orthogonal) to
each u_i, i < k, because of the projection used to define r_k, with
u_i being already orthogonal to each u_j with j not equal to i:
r_k*u_i = v_k*u_i - 0 - ... - (v_k*u_i)(u_i*u_i) - ... - 0
        = v_k*u_i - (v_k*u_i) = 0
because after normalization u_i*u_i = 1.
Therefore the Gram-Schmidt process converts the ordered basis {v_k}
into a linearly independent orthonormal sequence {u_k} of equal size,
which must then also be a spanning set and basis by the invariance of
the dimension of a vector space.
[Alternatively one could explicitly show that each v_k can be
expressed as a linear combination of the vectors u_1,...,u_k, so that
the spanning property of the vectors u_1,...,u_s is inherited from
that of the vectors v_1,...,v_s.]
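The Gram-Schmidt procedure described above is short enough to sketch
in Python/NumPy (a hypothetical helper of my own, applied here to an
arbitrary independent pair of vectors rather than to a basis of N(A)):

  import numpy as np

  def gram_schmidt(vs):
      # orthonormalize a list of linearly independent vectors, as in step 4
      us = []
      for v in vs:
          # subtract the projections onto the previously produced u_i
          r = v - sum(np.dot(v, u) * u for u in us)
          us.append(r / np.linalg.norm(r))   # r is nonzero by independence
      return us

  v1 = np.array([1.0, 1.0, 0.0])
  v2 = np.array([1.0, 2.0, 3.0])
  u1, u2 = gram_schmidt([v1, v2])
  print(np.dot(u1, u2), np.dot(u1, u1), np.dot(u2, u2))   # ~0, 1, 1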
Underlined quotation:
From now on, we assume that A is symmetric.
[Note: We must add to the statement of the next step the condition
that |u| = 1.]
5. Suppose that A has rank r and u is a _unit_ eigenvector of A with
nonzero eigenvalue a. Show that A - auu' is symmetric, and that it
has rank at most r-1. (Use a previous exercise.)
Proof: If A has rank r, then the nullity of A, the dimension of its
nullspace, must be (n-r). This is often proven by elementary row
operations and counting the number of leading ones in a reduced row
echelon form of A. We return to this observation at the end; it is
possibly what was meant by "a previous exercise".
Since A and auu' = (auu')' are both symmetric, it is clear that A -
auu' is symmetric.
We want to show that the nullspace of A - auu' is strictly larger than
the nullspace of A. To begin with let's show that the nullspace of A
is contained in the nullspace of A - auu'. For this we need a
"lemma":
If Av = 0, and Au = au with a nonzero, then u*v = 0.
Pf. of "lemma":
au*v = Au*v = u*Av = u*0 = 0 (using symmetry of A)
Then since a is nonzero, u*v = 0.
Therefore with this lemma, any vector v belonging to the nullspace of
A will again belong to the nullspace of A - auu':
Av = 0 implies
(A - auu')v = Av - a(u*v)u = 0 - 0 = 0
But also u, which is not in the nullspace of A (why?), belongs to the
nullspace of A - auu'. Here the assumption that u has unit norm comes
into play, because u'u = 1:
(A - auu')u = Au - a(u'u)u = au - au = 0
Since the dimension of the nullspace of A - auu' is therefore strictly
larger than that of A, we deduce from rank + nullity = n that the rank
of A - auu' is less than rank of A, which is r. Thus rank of A - auu'
is at most r-1. QED
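A quick hypothetical NumPy illustration of step 5: take a symmetric A
of rank 2, a unit eigenvector u with nonzero eigenvalue a, and check
the rank drop of A - auu':

  import numpy as np

  A = np.array([[2.0, 1.0, 0.0],
                [1.0, 2.0, 0.0],
                [0.0, 0.0, 0.0]])               # symmetric, rank 2

  a = 3.0
  u = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)  # unit eigenvector: Au = 3u
  A1 = A - a * np.outer(u, u)

  print(np.allclose(A1, A1.T))                              # True: still symmetric
  print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A1))  # 2 1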
[Note: The general thrust of steps 6 and 7 is to extend the result of
step 5 by induction to obtain A - SUM a_i u_i (u_i)' with rank 0, i.e.
to represent A as the sum of such symmetric rank one updates as
indicated. My presentation tries to emphasize this continuity in the
thread of ideas.]
6. Suppose that we have found (unit) orthonormal eigenvectors
u_{s+1},...,u_{s+k} with non-zero eigenvalues a_{s+1},...,a_{s+k}
respectively, so that the matrix:
A_k = A - SUM a_i u_i (u_i)' FOR i = s+1 to s+k
has rank at most r-k. Show that A_k is symmetric, that N(A_k) is an
"A_r" -subspace [sic], and that if A_k is not zero then A_k has a unit
eigenvector u_{s+k+1} orthogonal to N(A_k).
Proof: The symmetry of A_k follows easily from the same argument
offered in step 5, that A and each individual subtrahend a_i u_i
(u_i)' are symmetric.
The claim that N(A_k) is an "A_r"-subspace is a bit hard to
understand. It is of course true that N(A_k) is an "A_k"-subspace;
that merely reflects the trivial observation that every vector in
N(A_k) is mapped to zero by A_k and so is trivially back in N(A_k).
Furthermore, since the construction we have already explored in step 5
provides for the strict inclusion of N(A_k) in N(A_m) for any m > k
later in the induction proof, it follows also that N(A_k) is an
"A_r"-space for the same trivial reason.
Of greater difficulty would be showing that N(A_k) is an A-space. I
will not pursue this tangent at present, but if it turns out to be of
interest, you are welcome to request a clarification on that issue.
Finally let's suppose A_k is nonzero. Then from the symmetry of A_k,
as shown in the proof of step 2, it follows that not only is N(A_k) an
A_k-space, its orthogonal complement N(A_k)^perp is also an A_k-space.
From the fact that A_k is nonzero it follows that N(A_k) is not all
of R^n, and thus that N(A_k)^perp is more than just {0}. We are
therefore justified in applying the underlined assumption given between
steps 3 and 4 to the effect that A_k has a unit eigenvector in
N(A_k)^perp.
That we label such a unit eigenvector of A_k as u_{s+k+1} is perhaps
more significant than may first appear, for as we are engaged in an
induction proof, there is an implied obligation to show that u_{s+k+1}
is an eigenvector of the original matrix A and corresponds there to a
nonzero eigenvalue a_{s+k+1}.
But u_{s+k+1} is orthogonal to each of u_1,...,u_{s+k} by virtue of
these latter vectors belonging to N(A_k). Hence in particular:
a_i u_i (u_i)' u_{s+k+1} = 0 for each i = s+1,...,s+k
and therefore:
A u_{s+k+1} = A_k u_{s+k+1} = a_{s+k+1} u_{s+k+1}
QED
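The deflation loop implicit in step 6 can be sketched as follows (a
hypothetical Python/NumPy illustration; numpy.linalg.eigh is used
simply as a stand-in for the underlined assumption that a symmetric
A_k has a unit eigenvector):

  import numpy as np

  A = np.array([[2.0, 1.0, 0.0],
                [1.0, 2.0, 0.0],
                [0.0, 0.0, 0.0]])            # symmetric, rank 2

  A_k = A.copy()
  pairs = []
  while not np.allclose(A_k, 0):
      w, V = np.linalg.eigh(A_k)             # unit eigenvectors of the symmetric A_k
      i = np.argmax(np.abs(w))               # pick one with a nonzero eigenvalue
      a, u = w[i], V[:, i]
      pairs.append((a, u))
      A_k = A_k - a * np.outer(u, u)         # the next A_k; its rank drops by 1

  for a, u in pairs:
      print(np.allclose(A @ u, a * u))       # True: each u is an eigenvector of A itself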
7. Show that u_1,...,u_{s+k+1} are orthonormal, and that
rank(A_{k+1}) is at most r-k-1, and that each matrix u_i (u_i)' is a
projection of rank 1. Conclude that A_n = 0 [sic] and that:
A = SUM a_i u_i (u_i)' FOR i = 1 to n
where a_1 = ... = a_s = 0 and a_{s+1},...,a_n nonzero, and that if:
B = (u_1 | ... | u_n)
then B^-1 A B is a diagonal matrix with diagonal entries a_1,..., a_n.
This result is called the "spectral theorem".
Proof:
Recall that s = nullity of A = n - rank of A = n-r. We have
previously shown the construction of u_1,...,u_s as an orthonormal
basis for N(A) by the Gram-Schmidt process, and have shown the
extension of the orthogonality property by the remaining unit vectors
above in the proof of step 6 in connection with the status of
u_{s+k+1} as an eigenvector for A.
Therefore the sequence of vectors u_1,...,u_n forms an orthonormal
basis, and the matrix B obtained by concatenating these columns is an
orthogonal matrix.
Note that u_i (u_i)' is clearly of rank 1 since u_i is nonzero, so we
need only show that it is also a projection, i.e. that squaring this
matrix gives it back:
u_i (u_i)' u_i (u_i)' = u_i (u_i)'
But this is a direct application of the associativity of matrix
multiplication and the unit length of u_i:
(u_i)' u_i = 1
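For instance (a hypothetical check in NumPy):

  import numpy as np

  u = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)  # any unit vector
  P = np.outer(u, u)                            # the matrix u u'
  print(np.allclose(P @ P, P))                  # True: P is a projection
  print(np.linalg.matrix_rank(P))               # 1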
We have also shown in step 6, citing the argument used in step 5, that
the sequence of nullspaces of A = A_0, A_1, ... , A_r is strictly
increasing. Since the nullity of A is s, it follows that the nullity
of A_{k+1} is at least s + k + 1, so rank of A_{k+1} is at most n -
(s+k+1) = r - k - 1.
Indeed, since the rank of A_r is then at most 0 and also at least 0,
we have that A_r = 0 (and incidentally that all of the rank
inequalities must be tight, i.e. matched to equalities).
But if A_r = A - SUM a_i u_i (u_i)' = 0, then:
A = SUM a_i u_i (u_i)'
as desired.
It remains only to demonstrate the diagonality of B^-1 A B and the
correspondence of its diagonal elements with the specified eigenvalues
a_1,...,a_n. But the structure of B^-1 A B is constrained by steps 2
and 3 and the induction process to have a block diagonal "shape" in
which the incremental blocks after the first s zeroes are 1 by 1
blocks with nonzero entries. If you like we can recall the
computation of step 2, in which standard basis vectors e_i are used to
"test" the contents of the entries of B^-1 A B. Here B e_i = u_i has
been shown to be an eigenvector of A with:
A u_i = a_i u_i
for i = 1 to n.
We must point out that the notation developed in step 6 would make it
proper here to speak of A_r = 0 rather than A_n = 0 as step 7 has
denoted it. But there can be little doubt of the author's intention;
the notational "error" appears motivated by the related indexing of
the eigenvalues and eigenvectors of A.
QED
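To close, here is a hypothetical numerical confirmation of the
statement of the spectral theorem in Python/NumPy, with
numpy.linalg.eigh standing in for the construction carried out above
(B is orthogonal, so B^-1 A B = B' A B):

  import numpy as np

  A = np.array([[2.0, 1.0, 0.0],
                [1.0, 2.0, 0.0],
                [0.0, 0.0, 0.0]])               # symmetric, rank 2 (so s = 1)

  a, B = np.linalg.eigh(A)                      # eigenvalues a_i, orthonormal columns u_i
  print(np.round(B.T @ A @ B, 6))               # diagonal, with a_1,...,a_n on the diagonal
  print(np.allclose(A, sum(a[i] * np.outer(B[:, i], B[:, i])
                           for i in range(3)))) # True: A = SUM a_i u_i (u_i)'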
The name "spectral theorem" sounds a little fancy, perhaps. The term
spectrum has come to be identified with the eigenvalues of a linear
operator, or more generally the scalars a for which a linear operator
(A - aI) fails to be invertible. The result here for matrices is a
"dressed down" version of a more sophisticated theorem about "bounded
normal" linear operators on a (possibly infinite dimensional) Hilbert
space. A common feature is the "representation" of the linear
operator as a sum (or integral) involving projection operators.
Search Strategy
personal knowledge of material