Hi, goodwasabi-ga:
Your Question about the relationship between Krylov subspaces and
iterative solution of linear systems brings to mind:
An Old Story
============
One dark evening a man sees a friend, busily searching for something
under a street lamp. When he asks how he can help, he's told that a
watch is missing, and for several minutes both of them diligently look
for it. Finally the man asks his friend if he's sure this is where
the watch was lost. "Oh no," the friend replies, "I left it in the
park, but the light is much better here."
Certainly it is not obvious that a Krylov subspace would contain the
solution of a linear system, or even a good approximate solution.
However it is easy to search there! And under some conditions the
search turns out to be unreasonably successful.
Introduction
============
Variations (preconditioners) and extensions (biconjugate gradient,
conjugate gradient squared) of conjugate gradient methods abound.
Let's describe the most basic conjugate gradient method. It was
originally proposed for minimizing a quadratic functional:
Q(x) = 1/2 (x,Ax) - (x,b)
where the notation ( , ) indicates the dot product of vectors. For a
symmetric positive definite matrix A, this minimization turns out to
be equivalent to solving the linear system Ax = b.
For simplicity, let's assume the initial estimate of the solution is zero:
x_0 = 0
Despite appearances there's really no loss of generality. If a
nonzero initial estimate x_0 were given, it would be precisely as if
we substituted x = z + x_0 into Ax = b, forming the new system:
Az = b - Ax_0 = c
and then proceeded to search for its solution z starting with z_0 = 0.
Conjugate Gradient Method
=========================
Assign for i = 0:
p_0 := r_0 := b - Ax_0 = b
While ||r_i|| is greater than some tolerance value, let i' = i + 1 and assign:
a_i := (r_i,p_i)/(p_i,Ap_i) [NB: a_i is a scalar.]
x_i' := x_i + a_i p_i
r_i' := r_i - a_i Ap_i
d_i := (r_i',Ap_i)/(p_i,Ap_i) [NB: d_i is a scalar.]
p_i' := r_i' - d_i p_i
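To make the recurrences concrete, here is a minimal NumPy sketch of
the loop exactly as written above (the routine name cg, the tolerance,
and the random test problem are my own choices, just for illustration):

    import numpy as np

    def cg(A, b, tol=1e-10, max_iter=None):
        # Conjugate gradient with x_0 = 0, following the updates above.
        n = len(b)
        if max_iter is None:
            max_iter = n
        x = np.zeros(n)
        r = b.copy()                       # r_0 = b - A x_0 = b
        p = r.copy()                       # p_0 = r_0
        for _ in range(max_iter):
            if np.linalg.norm(r) <= tol:
                break
            Ap = A @ p
            a = (r @ p) / (p @ Ap)         # a_i (a scalar)
            x = x + a * p                  # x_i' = x_i + a_i p_i
            r_new = r - a * Ap             # r_i' = r_i - a_i A p_i
            d = (r_new @ Ap) / (p @ Ap)    # d_i (a scalar)
            p = r_new - d * p              # p_i' = r_i' - d_i p_i
            r = r_new
        return x

    # A small test with a random symmetric positive definite matrix:
    rng = np.random.default_rng(0)
    M = rng.standard_normal((5, 5))
    A = M @ M.T + 5 * np.eye(5)            # SPD by construction
    b = rng.standard_normal(5)
    print(np.allclose(A @ cg(A, b), b))    # True (up to the tolerance)

In exact arithmetic the residual would vanish after at most n steps;
in floating point one caps the iteration count and simply watches
||r_i||.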
Relationship to Krylov Subspaces
================================
Here our Krylov subspaces are of the form span{b, Ab, A^2b, ... } for
finitely many powers of A.
Each conjugate gradient step involves getting:
"residual error"
r_i' from r_i by subtracting a scalar multiple of Ap_i
"search direction"
p_i' from r_i' by subtracting a scalar multiple of p_i
Since p_0 = r_0 = b, it is straightforward to show by joint induction that:
r_i and p_i belong to span{b, Ab,..., A^i b}
Then, since x_i' is x_i plus a scalar multiple of p_i, and x_0 = 0
belongs trivially to every subspace, another induction shows that:
"estimated solution"
x_i' belongs to span{b, Ab,...,A^i b}
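Here's a small numerical illustration of that membership (a sketch of
my own, again with a random SPD test matrix): after three steps of the
recurrences, the iterate x_3 is reproduced exactly by its least-squares
projection onto span{b, Ab, A^2 b}.

    import numpy as np

    rng = np.random.default_rng(1)
    M = rng.standard_normal((6, 6))
    A = M @ M.T + 6 * np.eye(6)            # SPD test matrix
    b = rng.standard_normal(6)

    # Three steps of the recurrences above, starting from x_0 = 0:
    x = np.zeros(6); r = b.copy(); p = b.copy()
    for _ in range(3):
        Ap = A @ p
        a = (r @ p) / (p @ Ap)
        x = x + a * p
        r_new = r - a * Ap
        d = (r_new @ Ap) / (p @ Ap)
        p = r_new - d * p
        r = r_new

    # x_3 should lie in span{b, Ab, A^2 b}:
    K = np.column_stack([b, A @ b, A @ A @ b])
    coeffs, *_ = np.linalg.lstsq(K, x, rcond=None)
    print(np.linalg.norm(K @ coeffs - x))  # essentially zero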
The Minimization Claim
======================
Assuming the use of exact arithmetic, what can be said of x_i' at each
step is that it minimizes the quadratic functional:
Q(x) = 1/2 (x,Ax) - (x,b)
over the Krylov subspace span{b, Ab,..., A^i b}. Eventually a stage
is reached at which Q(x) attains its global minimum; since the
gradient of Q is Ax - b, at that point:
Ax = b
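A quick way to see the claim in action (again just a sketch with a
random SPD test problem of my own): take two steps of the recurrences,
then minimize Q directly over span{b, Ab} by solving the projected
2x2 system, and compare the two values of Q.

    import numpy as np

    def Q(x, A, b):
        return 0.5 * x @ (A @ x) - x @ b

    rng = np.random.default_rng(2)
    M = rng.standard_normal((5, 5))
    A = M @ M.T + 5 * np.eye(5)
    b = rng.standard_normal(5)

    # Two steps of the recurrences above, starting from x_0 = 0:
    x = np.zeros(5); r = b.copy(); p = b.copy()
    for _ in range(2):
        Ap = A @ p
        a = (r @ p) / (p @ Ap)
        x = x + a * p
        r_new = r - a * Ap
        d = (r_new @ Ap) / (p @ Ap)
        p = r_new - d * p
        r = r_new

    # Minimize Q directly over span{b, Ab}: write x = K y and solve for y.
    K = np.column_stack([b, A @ b])
    y = np.linalg.solve(K.T @ A @ K, K.T @ b)

    print(Q(x, A, b), Q(K @ y, A, b))      # the two values agree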
Minimization in the First Step
==============================
Let's verify the minimization claim for the very first step, namely:
x_1 = x_0 + a_0 p_0 = a_0 p_0
minimizes Q(x) over span{b}. Let scalar a be a free variable, and
rewrite for x = ab:
Q(x) = 1/2 (ab,A(ab)) - (ab,b)
= a^2/2 (b,Ab) - a (b,b)
This concave-up quadratic in a attains its minimum where its
derivative with respect to a vanishes:
a (b,Ab) - (b,b) = 0
a = (b,b)/(b,Ab) = (r_0,p_0)/(p_0,Ap_0)
Thus the first step of the conjugate gradient method gives the minimum
x = ab of Q(x) over the space span{b}. In this sense the conjugate
gradient method starts out just like the method of steepest descent.
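And a tiny numerical confirmation (a made-up 2x2 example of my own):
no other multiple of b does better than the step length a_0.

    import numpy as np

    A = np.array([[3.0, 1.0], [1.0, 2.0]])       # symmetric positive definite
    b = np.array([1.0, 1.0])

    Q = lambda x: 0.5 * x @ (A @ x) - x @ b

    a = (b @ b) / (b @ (A @ b))                  # the step length a_0 above
    trial = np.linspace(-2.0, 2.0, 401)          # a grid of other multiples of b
    print(all(Q(a * b) <= Q(t * b) + 1e-12 for t in trial))   # True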
Many write-ups of the theory behind the conjugate gradient method will
explain why the choices of scalars a_i and d_i guarantee the continued
minimization of Q(x) over the "expanding" Krylov subspaces. In a
nutshell what is wanted is for the residual r_i' (aka the negative
gradient of Q at x_i') to be orthogonal to all the previous search
directions p_0,...,p_i.
The essential point is that what makes the conjugate gradient method
more successful than the method of steepest descent, with which it
agrees at the first step, is the choice of search directions which are
not orthogonal in the usual inner product, but A-orthogonal:
(p_i',Ap_k) = 0 for k = 0 to i
a condition on the search directions p_i that's also referred to as
being "A-conjugate".
Here are a couple of nice write-ups that go into details.
[(Postscript) An Introduction to the Conjugate Gradient Method
-- by Jonathan Richard Shewchuk (50 pages!)]
http://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.ps
[(PDF) How To Get Conjugate Gradient Method In a Geometric Way
-- by Mike Yan (only 8 pages, based on the above)]
http://www.cs.ucdavis.edu/~bai/ECS231/finalyan.pdf
When Independence Fails
=======================
Finally we come to the question of what it means if the Krylov
subspaces _stop_ expanding, i.e. if the set of vectors:
{b, Ab,..., A^i b}
becomes linearly dependent at some point.
This can indeed happen, in spite of the fact that A is symmetric and
positive definite. The first trivial illustration is when b = 0. The
second illustration is when b is not zero, but A = I.
But from the standpoint of searching for a solution to Ax = b, these
are not pathological conditions. First, if b = 0, then our initial
guess x_0 = 0 turns out to be correct! Second, if A = I, then x_1 = b
turns out to be correct!
In fact if step i is the first step at which A^i b is linearly
dependent on previous vectors in the sequence, then we will have found
a solution at that point. Suppose b is nonzero and:
A^i b = SUM c_k A^k b [FOR 0 <= k < i]
expresses A^i b as a linear combination of the previous vectors.
Because i was the minimal stage at which this occurs, the coefficient
c_0 cannot be zero. If it were, then we'd have (dropping the c_0 b
term):
A^i b = SUM c_k A^k b [FOR 1 <= k < i]
and using the invertibility of A (positive definite implies
invertible), we could cancel a power of A on both sides to get an
"earlier" linear dependence relation:
A^(i-1) b = SUM c_k A^(k-1) b [FOR 1 <= k < i]
contradicting the minimality of i.
But if c_0 is not zero, then:
c_0 b = A^i b - SUM c_k A^k b [FOR 1 <= k < i]
and since every term on the right hand side carries at least one power
of A, dividing both sides by c_0 gives a solution of Ax = b, namely:
x = (1/c_0) (A^(i-1) b - SUM c_k A^(k-1) b) [FOR 1 <= k < i]
So the end of linear independence means the solution was found.
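For a concrete toy example (my own): if b happens to be an eigenvector
of A, then Ab is already a multiple of b, the Krylov subspace stops
growing immediately, and the very first step solves the system.

    import numpy as np

    A = np.diag([2.0, 5.0, 7.0])           # symmetric positive definite
    b = np.array([3.0, 0.0, 0.0])          # an eigenvector of A (eigenvalue 2)

    # One step of the method from x_0 = 0:
    r = b.copy(); p = b.copy()
    a = (r @ p) / (p @ (A @ p))            # a_0 = (b,b)/(b,Ab) = 1/2
    x = a * p

    print(np.allclose(A @ x, b))           # True: {b, Ab} is dependent and x_1 is exact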
Here's a more abstract way to think about this.
Again focus on the case where b is nonzero, and consider the sequence
of Krylov subspaces:
S_1 = span{b}
S_2 = span{b, Ab}
S_3 = span{b, Ab, A^2 b}, etc.
We have S_1 subset S_2 subset S_3 ...
and also A(S_1) subset S_2; A(S_2) subset S_3; etc.
If the underlying sequence stops producing linearly independent
vectors at stage i, then S_i' = S_i (the subspaces stop growing).
Consequently S_i is A-invariant, or A(S_i) subset S_i.
Since A is invertible and S_i is finite dimensional, A(S_i) = S_i by
virtue of equal dimensions.
But then, because b belongs to S_i, we must also have the solution
A^-1 b in S_i.
The end of linear independence is also something of a happy accident
for eigenvalue problems. As above we obtain an A-invariant Krylov
subspace S, potentially much smaller than the full domain of A. It is
well known that a symmetric matrix A has a complete basis of
eigenvectors. The search for eigenvalue/eigenvector pairs of A can
then be divided up into searching over S and over the orthogonal
complement of S:
S' = { u | (u,v) = 0 for all v in S }
which turns out to be A-invariant as well.
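Here's a tiny example of that pigeonholing (again a made-up
illustration of my own): take a diagonal A and a b with no component
along one of the eigenvectors. The Krylov subspace S is then a proper
A-invariant subspace, and the restriction of A to S only exposes the
eigenvalues whose eigenvectors lie in S.

    import numpy as np

    A = np.diag([1.0, 2.0, 3.0])
    b = np.array([1.0, 1.0, 0.0])          # no component along the eigenvector for eigenvalue 3

    # S = span{b, Ab} is already A-invariant (A^2 b depends linearly on b and Ab).
    K = np.column_stack([b, A @ b])
    Q_basis, _ = np.linalg.qr(K)           # orthonormal basis of S
    T = Q_basis.T @ A @ Q_basis            # restriction of A to S in that basis

    print(np.linalg.eigvalsh(T))           # approximately [1., 2.]; the eigenvalue 3 is invisible from S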
Of course this pigeonholing of eigenvalue/eigenvector pairs can turn
out to be something of a mixed blessing. As in the old story with
which we began, one can search diligently but unproductively if the
particular eigenvalue sought doesn't fall into the Krylov subspace
being constructed.
Rounding errors of floating point arithmetic can at times mitigate
this difficulty, by blurring the boundaries of the invariant
subspaces.
Still, it's always best to have a theoretical justification for
expecting our searches to have a reasonable likelihood of success.
regards, mathtalk-ga