Q: linear and nonlinear problem solving, math ( Answered 5 out of 5 stars,   2 Comments )
Subject: linear and nonlinear problem solving, math
Category: Science > Math
Asked by: goodwasabi-ga
List Price: $10.00
Posted: 30 Mar 2004 09:38 PST
Expires: 29 Apr 2004 10:38 PDT
Question ID: 322431
I would like an introduction to Krylov spaces. What is a Krylov space
and what are its properties?

Clarification of Question by goodwasabi-ga on 30 Mar 2004 10:40 PST
I know how to generate a Krylov space, but I don't understand why it
would be the natural space in which to find a solution of an Ax=b type
problem.  Every text I have found on the internet defines the space as
a span of powers of A applied to x, but they don't explain any general
properties of the space, like how the various vectors x, Ax, A^2x, ...
relate to each other.  How can x, Ax, A^2x, etc. be linearly independent?

Clarification of Question by goodwasabi-ga on 30 Mar 2004 23:50 PST
I refer to the comment by mathtalk-ga

My main interest is in fact the conjugate gradient method.  What I
have noticed is that most researchers, when dealing with these types
of problems, consider working in this type of space (Krylov) as a
given.  I understand what a Krylov space is and, on the other side, how
the conjugate gradient method works, but I don't see the clear
relation between the two.

Request for Question Clarification by mathtalk-ga on 31 Mar 2004 08:46 PST
Hi, goodwasabi-ga:

I think I can tie these concepts together for you.  For many
applications one has a matrix or linear operator that acts in a high
dimensional space, where an exact linear solver can be prohibitively
expensive (in terms of compute time).  Iterative methods of
approximate solution are then attractive.  [Many nonlinear algorithms
require iterative solution of linear subproblems, so there's a natural
tendency here also to allow for good approximate solutions for the
linear subproblems so long as the overall convergence of the nonlinear
method is not badly affected.]

The conjugate gradient method for positive definite matrix A has a
curious attraction of being both an exact method and an iterative
method.  Instead of factoring the matrix A or (worse) inverting it,
one solves Ax = b through a series of steps whose most expensive work
is a matrix*vector multiplication.  The "approximate solution" at any
given stage can be understood as the projection (in the A-inner
product) of the actual solution onto an expanding sequence of Krylov
subspaces.  In n dimensions one has (assuming exact arithmetic) the
exact solution because the Krylov subspaces have expanded at that
point to fill the domain.

Is this sort of information what you are looking for?  I'd be happy to
give a more technical explanation as an Answer, but if this is
sufficient you are under no obligation.

regards, mathtalk-ga

Clarification of Question by goodwasabi-ga on 01 Apr 2004 00:57 PST
Ciao mathtalk-ga,

first of all, thank you very much for your two comments.  I would like
to have your technical answer.  If you could step through the CG method
(even just briefly), just to explain when and how Krylov is involved
(also referring to what you were saying about the fact that it's
useful for the k-th term of the expansion to be linearly dependent on
the others).

Subject: Re: linear and nonlinear problem solving, math
Answered By: mathtalk-ga on 03 Apr 2004 08:59 PST
Rated:5 out of 5 stars
Hi, goodwasabi-ga:

Your Question about the relationship between Krylov subspaces and
iterative solution of linear systems brings to mind:

An Old Story

One dark evening a man sees a friend, busily searching for something
under a street lamp.  When he asks how he can help, he's told that a
watch is missing, and for several minutes both of them diligently look
for it.  Finally the man asks his friend if he's sure this is where
the watch was lost.  "Oh no," the friend replies, "I left it in the
park, but the light is much better here."

Certainly it is not obvious that a Krylov subspace would have the
solution of a linear system, or even a good estimated solution. 
However it is easy to search there!  And under some conditions the
search turns out to be unreasonably successful.


Variations (preconditioners) and extensions (biconjugate gradient,
conjugate gradient squared) of conjugate gradient methods abound.

Let's describe the most basic conjugate gradient method.  Originally
proposed for minimizing a quadratic functional:

Q(x) = 1/2 (x,Ax) - (x,b)

where the notation ( , ) indicates the dot product of vectors, this
method's minimization turns out to be equivalent for a symmetric
positive definite matrix A to solving the linear system Ax = b.
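This equivalence is easy to check numerically.  Below is a minimal
sketch (not part of the original answer; the 2x2 matrix and the exact
solution are made up for illustration) confirming that the solution of
Ax = b minimizes Q:

```python
# For a small symmetric positive definite A, the solution of Ax = b
# minimizes Q(x) = 1/2 (x, Ax) - (x, b); the gradient of Q is Ax - b.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def Q(A, b, x):
    return 0.5 * dot(x, matvec(A, x)) - dot(x, b)

A = [[4.0, 1.0], [1.0, 3.0]]        # symmetric positive definite
b = [1.0, 2.0]
x_star = [1.0 / 11.0, 7.0 / 11.0]   # exact solution of Ax = b

# Q at the solution is no larger than at nearby perturbed points.
for dx in ([0.1, 0.0], [0.0, 0.1], [-0.1, 0.1]):
    x = [xi + di for xi, di in zip(x_star, dx)]
    assert Q(A, b, x_star) <= Q(A, b, x)
```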

For simplicity, let's assume the initial estimate of the solution is zero:

x_0 = 0

Despite appearances there's really no loss of generality.  If a
nonzero initial estimate x_0 were given, it would be precisely as if
we took x = z + x_0:

Az = b - Ax_0 = c

and then proceeded to search for a solution z to that new system
starting with z_0 = 0.

Conjugate Gradient Method

Assign for i = 0:

p_0 := r_0 := b - Ax_0 = b

While ||r_i|| is greater than some tolerance value, let i' = i + 1 and assign:

a_i := (r_i,p_i)/(p_i,Ap_i)  [NB: a_i is a scalar.]

x_i' := x_i + a_i p_i

r_i' := r_i - a_i Ap_i

d_i := (r_i',Ap_i)/(p_i,Ap_i) [NB: d_i is a scalar.]

p_i' := r_i' - d_i p_i
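
The loop above translates almost line-for-line into code.  Here is a
sketch in Python (not part of the original answer; plain lists stand in
for vectors, and the A-orthogonalization coefficient
d_i = (r_i',Ap_i)/(p_i,Ap_i) is used for the search direction update):

```python
# A direct transcription of the conjugate gradient steps; a sketch,
# not a production solver.  Matrices are lists of rows, vectors lists.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def axpy(a, x, y):
    # returns y + a*x componentwise
    return [yi + a * xi for xi, yi in zip(x, y)]

def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    n = len(b)
    x = [0.0] * n                         # x_0 = 0
    r = list(b)                           # r_0 = b - A x_0 = b
    p = list(r)                           # p_0 = r_0
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:       # stop once ||r_i|| is small
            break
        Ap = matvec(A, p)
        a = dot(r, p) / dot(p, Ap)        # a_i
        x = axpy(a, p, x)                 # x_{i+1} = x_i + a_i p_i
        r_new = axpy(-a, Ap, r)           # r_{i+1} = r_i - a_i A p_i
        d = dot(r_new, Ap) / dot(p, Ap)   # d_i
        p = axpy(-d, p, r_new)            # p_{i+1} = r_{i+1} - d_i p_i
        r = r_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
# In exact arithmetic this 2x2 system is solved in at most 2 steps.
```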

Relationship to Krylov Subspaces

Here our Krylov subspaces are of the form span{b, Ab, A^2b, ... } for
finitely many powers of A.

Each conjugate gradient step involves getting:

"residual error"
    r_i' from r_i by subtracting a scalar multiple of Ap_i

"search direction"
    p_i' from r_i' by subtracting a scalar multiple of p_i

Since p_0 = r_0 = b, it is straightforward to show by joint induction that:

    r_i and p_i belong to span{b, Ab,..., A^i b}

Then, since x_i' is x_i plus a scalar multiple of p_i, and x_0 belongs
to span{ }, another induction shows that:

"estimated solution"
    x_i' belongs to span{b, Ab,...,A^i b}

The Minimization Claim

Assuming the use of exact arithmetic, what can be said of x_i' at each
step is that it minimizes the quadratic functional:

Q(x) = 1/2 (x,Ax) - (x,b)

over the Krylov subspace span{b, Ab,..., A^i b}.  The stage is
eventually reached when Q(x) attains its global minimum, and then:

Ax = b

Minimization in the First Step

Let's verify the minimization claim for the very first step, namely:

x_1 = x_0 + a_0 p_0 = a_0 p_0

minimizes Q(x) over span{b}.  Let scalar a be a free variable, and
rewrite for x = ab:

Q(x) = 1/2 (ab,A(ab)) - (ab,b)

     = a^2/2 (b,Ab) - a (b,b)

Since (b,Ab) > 0, this concave-up quadratic function of a attains its
minimum where its derivative vanishes:

a (b,Ab) - (b,b) = 0

a = (b,b)/(b,Ab) = (r_0,p_0)/(p_0,Ap_0)

Thus the first step of the conjugate gradient method gives the minimum
x = ab of Q(x) over the space span{b}.  In this sense the conjugate
gradient method starts out just like the method of steepest descent.
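
The first-step claim is easy to confirm numerically.  A small sketch
(not part of the original answer; the 2x2 SPD system is made up):
a = (b,b)/(b,Ab) should give the smallest value of Q(ab) among other
multiples of b.

```python
# Check that a = (b,b)/(b,Ab) minimizes Q(ab) along the line span{b}.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

A = [[4.0, 1.0], [1.0, 3.0]]   # symmetric positive definite
b = [1.0, 2.0]

def Q_along_b(a):
    # Q evaluated at x = a*b
    x = [a * bi for bi in b]
    return 0.5 * dot(x, matvec(A, x)) - dot(x, b)

a_star = dot(b, b) / dot(b, matvec(A, b))

# No other sampled multiple of b does better than a_star.
for a in (a_star - 0.05, a_star + 0.05, 0.0, 1.0):
    assert Q_along_b(a_star) <= Q_along_b(a)
```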

Many write-ups of the theory behind the conjugate gradient method will
explain why the choices of scalars a_i and d_i guarantee the continued
minimization of Q(x) over the "expanding" Krylov subspaces.  In a
nutshell what is wanted is for the residual r_i' (aka the negative
gradient of Q at x_i') to be orthogonal to all the previous search
directions p_0,...,p_i.

The essential point to explore is that what makes the conjugate
gradient method successful, more so than the method of steepest
descent which it agrees with at the first step, is the choice of
search directions which are not orthogonal in the usual inner product,
but A-orthogonal.  For example:

(p_i',Ap_k) = 0  for k = 0 to i

a condition on the search directions p_i that's also referred to as
being "A-conjugate".
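
The A-conjugacy of the search directions can be seen directly in a
small experiment (not part of the original answer; the 3x3 SPD system
is made up for illustration): collect the p_i produced by two CG steps
and check that all pairs are A-orthogonal.

```python
# Verify (p_i, A p_j) = 0 for i != j on a small SPD system.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]  # SPD
b = [1.0, 2.0, 3.0]

x = [0.0] * 3
r = list(b)
p = list(r)
directions = [list(p)]          # p_0
for _ in range(2):              # produce p_1 and p_2
    Ap = matvec(A, p)
    a = dot(r, p) / dot(p, Ap)
    x = [xi + a * pi for xi, pi in zip(x, p)]
    r = [ri - a * api for ri, api in zip(r, Ap)]
    d = dot(r, Ap) / dot(p, Ap)
    p = [ri - d * pi for ri, pi in zip(r, p)]
    directions.append(list(p))

# Every pair of distinct search directions is A-orthogonal.
for i in range(len(directions)):
    for j in range(i + 1, len(directions)):
        Apj = matvec(A, directions[j])
        assert abs(dot(directions[i], Apj)) < 1e-9
```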

Here are a couple of nice write-ups that go into details.

[(Postscript) An Introduction to the Conjugate Gradient Method
 -- by Jonathan Richard Shewchuk (50 pages!)]

[(PDF) How To Get Conjugate Gradient Method In a Geometric Way 
 -- by Mike Yan (only 8 pages, based on the above)]

When Independence Fails

Finally we come to the question of what it means if the Krylov
subspaces _stop_ expanding, i.e. if the set of vectors:

{b, Ab,..., A^i b}

becomes linearly dependent at some point.

This can indeed happen, in spite of the fact that A is symmetric and
positive definite.  The first trivial illustration is when b = 0.  The
second illustration is when b is not zero, but A = I.

But from the standpoint of searching for a solution to Ax = b, these
are not pathological conditions.  First, if b = 0, then our initial
guess x_0 = 0 turns out to be correct!  Second, if A = I, then x_1 = b
turns out to be correct!

In fact if step i is the first step at which A^i b is linearly
dependent on previous vectors in the sequence, then we will have found
a solution at that point.  Suppose b is nonzero and:

A^i b = SUM c_k A^k b [FOR 0 <= k < i]

expresses A^i b as a linear combination of the previous vectors. 
Because i was the minimal stage at which this occurs, the coefficient
c_0 cannot be zero.  If it were, then we'd have (dropping the c_0 b
term):
A^i b = SUM c_k A^k b [FOR 1 <= k < i]

and using the invertibility of A (positive definite implies
invertible), we could cancel a power of A on both sides to get an
"earlier" linear dependence relation.

But if c_0 is not zero, then:

c_0 b = A^i b - SUM c_k A^k b [FOR 1 <= k < i]

and by dividing both sides by c_0, we get a solution of the form Ax =
b, since a power of A can be factored out of each term on the right
hand side.
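
This early-termination phenomenon is easy to see in practice.  A small
sketch (not part of the original answer; the diagonal matrix is made up
for illustration): take A with only two distinct eigenvalues acting on
b, so that A^2 b is already a linear combination of b and Ab.  CG then
finds the exact solution in just two steps, before the dimension (3) is
reached.

```python
# With eigenvalues {2, 2, 5}, the Krylov sequence for b = [1,1,1]
# becomes linearly dependent at stage 2, and two CG steps suffice.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

A = [[2.0, 0.0, 0.0],
     [0.0, 2.0, 0.0],
     [0.0, 0.0, 5.0]]          # eigenvalues 2, 2, 5
b = [1.0, 1.0, 1.0]

x = [0.0] * 3
r = list(b)
p = list(r)
for _ in range(2):             # only two steps
    Ap = matvec(A, p)
    a = dot(r, p) / dot(p, Ap)
    x = [xi + a * pi for xi, pi in zip(x, p)]
    r = [ri - a * api for ri, api in zip(r, Ap)]
    d = dot(r, Ap) / dot(p, Ap)
    p = [ri - d * pi for ri, pi in zip(r, p)]

# The residual b - Ax is already (numerically) zero.
residual = [bi - axi for bi, axi in zip(b, matvec(A, x))]
assert max(abs(c) for c in residual) < 1e-12
```

The exact solution here is [1/2, 1/2, 1/5]; the number of CG steps
needed matches the number of distinct eigenvalues represented in b, not
the dimension of the space.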

So the end of linear independence means the solution was found. 
Here's a more abstract way to think about this.

Again focus on the case where b is nonzero and the sequence of Krylov subspaces:

S_1 = span{b}

S_2 = span{b, Ab}

S_3 = span{b, Ab, A^2 b}, etc.

We have S_1 subset S_2 subset S_3 ...

and also A(S_1) subset S_2; A(S_2) subset S_3; etc.

If the underlying sequence stops producing linearly independent
vectors at stage i, then S_i' = S_i (recall i' = i + 1).

Consequently S_i is A-invariant, or A(S_i) subset S_i.

Since A is invertible and S_i is finite dimensional, A(S_i) = S_i by
virtue of equal dimensions.

But then because b belongs to S_i, we must also have A^-1 b in S_i.

The end of linear independence is also something of a happy accident
for eigenvalue problems.  As above we obtain an A-invariant Krylov
subspace S, potentially much smaller than the full domain of A.  It is
well known that a symmetric matrix A has a complete basis of
eigenvectors.  The search for eigenvalue/eigenvector pairs of A can
then be divided up into searching over S and over the orthogonal
complement of S:

S' = { u | (u,v) = 0 for all v in S }

which turns out to be A-invariant as well.

Of course this pigeonholing of eigenvalue/eigenvector pairs can turn
out to be something of a mixed blessing.  As in the old story with
which we began, one can search diligently but unproductively if the
particular eigenvalue sought doesn't fall into the Krylov subspace
being constructed.

Rounding errors of floating point arithmetic can at times mitigate
this difficulty, by blurring the boundaries of the invariant subspaces.

Still, it's always best to have a theoretical justification for
expecting our searches to have a reasonable likelihood of success.

regards, mathtalk-ga
goodwasabi-ga rated this answer:5 out of 5 stars
Excellent answer from a great researcher. Very clear and to the point. Bravo.

Subject: Re: linear and nonlinear problem solving, math
From: s3com-ga on 30 Mar 2004 10:04 PST

Given a vector x and a matrix A, the Krylov space of A is the subspace
spanned by x, A*x, A^2*x, ....

Read here in detail about Krylov spaces and Krylov space methods:

regards, s3com
Subject: Re: linear and nonlinear problem solving, math
From: mathtalk-ga on 30 Mar 2004 22:17 PST
Krylov subspaces are named after the Russian mathematician A. N.
Krylov, whose name should be pronounced like "kri-LOFF" according to
the short biographical note here:

As you seem to be aware, a Krylov subspace is one of the form:

span { v, Av, A^2 v, ... }

for some finite number of terms.  It is sometimes quite helpful if
after a certain number of "power" iterations, the kth term A^k v is
_not_ linearly independent from the preceding terms.   That is, if:

A^k v = SUM a_i A^i v FOR i = 0 to k-1

then the subspace of the domain (say R^n where A is an n by n matrix)
spanned by these first k terms is _invariant_ under multiplication by
A.  Consequently it may be possible, in searching for eigenvalues or
other kinds of linear algebra solutions, to restrict attention to the
"reduced order" subspace of dimension k (or possibly less) rather than
tackling the entire n-dimensional setting.

A wide variety of numerical linear algebra algorithms can be developed
in connection with this.  I notice the subject line says "linear and
nonlinear problem solving", so perhaps there's a specific application
in mind.  If not, one quite interesting family of algorithms is the
"conjugate gradient methods".  The linear side of solving systems of
equations is often a stepping stone to solving nonlinear problems by
minimization (for example, making a residual error small).

best wishes, mathtalk-ga
