Q: linear and nonlinear problem solving, math (Answered, 2 Comments)
 Question
 Subject: linear and nonlinear problem solving, math Category: Science > Math Asked by: goodwasabi-ga List Price: $10.00 Posted: 30 Mar 2004 09:38 PST Expires: 29 Apr 2004 10:38 PDT Question ID: 322431
 ```I would like an introduction to Krylov spaces. What is a Krylov space and what are its properties?```

 Clarification of Question by goodwasabi-ga on 30 Mar 2004 10:40 PST
 ```I know how to generate a Krylov space, but I don't understand why it would be the natural space in which to find a solution of an Ax = b type problem. Every text I have found on the internet defines the space as a span generated by A and x, but they don't explain any general properties of the space, like how the various vectors x, Ax, A^2x, ... relate to each other. How can x, Ax, A^2x, etc. be linearly independent?```

 Clarification of Question by goodwasabi-ga on 30 Mar 2004 23:50 PST
 ```I refer to the comment by mathtalk-ga. My main interest is in fact the conjugate gradient method. What I have noticed is that most researchers, when dealing with these types of problems, consider working in this type of space (Krylov) as a given. I understand what a Krylov space is and, on the other side, how the conjugate gradient method works, but I don't see the clear relation between the two.```

 Request for Question Clarification by mathtalk-ga on 31 Mar 2004 08:46 PST
 ```Hi, goodwasabi-ga: I think I can tie these concepts together for you. For many applications one has a matrix or linear operator that acts in a high dimensional space, where an exact linear solver can be prohibitively expensive (in terms of compute time). Iterative methods of approximate solution are then attractive. [Many nonlinear algorithms require iterative solution of linear subproblems, so there's a natural tendency here also to allow for good approximate solutions of the linear subproblems, so long as the overall convergence of the nonlinear method is not badly affected.]

The conjugate gradient method for a positive definite matrix A has the curious attraction of being both an exact method and an iterative method. Instead of factoring the matrix A or (worse) inverting it, one solves Ax = b through a series of steps whose most expensive work is a matrix*vector multiplication. The "approximate solution" at any given stage can be understood as a projection (with respect to the A-inner product) of the actual solution onto an expanding sequence of Krylov subspaces. In n dimensions one has (assuming exact arithmetic) the exact solution, because the Krylov subspaces have by that point expanded to fill the domain.

Is this sort of information what you are looking for? I'd be happy to give a more technical explanation as an Answer, but if this is sufficient you are under no obligation.

regards, mathtalk-ga```

 Clarification of Question by goodwasabi-ga on 01 Apr 2004 00:57 PST
 ```Ciao mathtalk-ga, first of all I thank you very much for your two comments. I would like to have your technical answer. If you could step through the CG method (even just briefly), just to explain when and how Krylov is involved (also referring to what you were saying about the fact that it's useful for the k-th term of the expansion to be linearly dependent on the others). Regards, goodwasabi-ga```
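As a quick numerical aside on the linear-independence question raised above, here is a minimal sketch in Python/numpy (a generic symmetric positive definite A and a random b; the construction is illustrative, not from the thread): for such a "generic" pair, the Krylov vectors b, Ab, A^2 b, ... stay linearly independent until they fill the whole space.

```python
# Minimal check: for a generic SPD matrix A and random b, the Krylov
# vectors b, Ab, A^2 b, ..., A^(n-1) b are linearly independent.
import numpy as np

rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)   # symmetric positive definite
b = rng.standard_normal(n)

# Krylov matrix K = [b, Ab, A^2 b, ..., A^(n-1) b], one vector per column.
K = np.column_stack([np.linalg.matrix_power(A, k) @ b for k in range(n)])

# Full rank means the n Krylov vectors span all of R^n.
print(np.linalg.matrix_rank(K))   # expected: 6
```

Generically the rank grows by one with each new power of A; dependence sets in early only for special b, which is exactly the "happy accident" discussed in the Answer below.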
 Subject: Re: linear and nonlinear problem solving, math Answered By: mathtalk-ga on 03 Apr 2004 08:59 PST
 ```Hi, goodwasabi-ga:

Your Question about the relationship between Krylov subspaces and iterative solution of linear systems brings to mind:

An Old Story
============

One dark evening a man sees a friend busily searching for something under a street lamp. When he asks how he can help, he's told that a watch is missing, and for several minutes both of them diligently look for it. Finally the man asks his friend if he's sure this is where the watch was lost. "Oh no," the friend replies, "I left it in the park, but the light is much better here."

Certainly it is not obvious that a Krylov subspace would contain the solution of a linear system, or even a good estimated solution. However it is easy to search there! And under some conditions the search turns out to be unreasonably successful.

Introduction
============

Variations (preconditioners) and extensions (biconjugate gradient, conjugate gradient squared) of conjugate gradient methods abound. Let's describe the most basic conjugate gradient method. Originally proposed for minimizing the quadratic functional:

Q(x) = 1/2 (x,Ax) - (x,b)

where the notation ( , ) indicates the dot product of vectors, this minimization turns out, for a symmetric positive definite matrix A, to be equivalent to solving the linear system Ax = b.

For simplicity, let's assume the initial estimate of the solution is zero:

x_0 = 0

Despite appearances there's really no loss of generality. If a nonzero initial estimate x_0 were given, it would be precisely as if we took x = z + x_0:

Az = b - Ax_0 = c

and then proceeded to search for a solution z of that new system, starting with z_0 = 0.

Conjugate Gradient Method
=========================

Assign for i = 0:

p_0 := r_0 := b - Ax_0 = b

While ||r_i|| is greater than some tolerance value, let i' = i + 1 and assign:

a_i := (r_i,p_i)/(p_i,Ap_i)    [NB: a_i is a scalar.]
x_i' := x_i + a_i p_i
r_i' := r_i - a_i Ap_i
d_i := (r_i',Ap_i)/(p_i,Ap_i)  [NB: d_i is a scalar.]
p_i' := r_i' - d_i p_i

Relationship to Krylov Subspaces
================================

Here our Krylov subspaces are of the form span{b, Ab, A^2 b, ... } for finitely many powers of A. Each conjugate gradient step involves getting:

"residual error" r_i' from r_i by subtracting a scalar multiple of Ap_i

"search direction" p_i' from r_i' by subtracting a scalar multiple of p_i

Since p_0 = r_0 = b, it is straightforward to show by joint induction that:

r_i and p_i belong to span{b, Ab,..., A^i b}

Then, since x_i' is x_i plus a scalar multiple of p_i, and x_0 = 0 belongs to the trivial subspace {0}, another induction shows that:

"estimated solution" x_i' belongs to span{b, Ab,..., A^i b}

The Minimization Claim
======================

Assuming the use of exact arithmetic, what can be said of x_i' at each step is that it minimizes the quadratic functional:

Q(x) = 1/2 (x,Ax) - (x,b)

over the Krylov subspace span{b, Ab,..., A^i b}. The stage is eventually reached when Q(x) attains its global minimum, and then:

Ax = b

Minimization in the First Step
==============================

Let's verify the minimization claim for the very first step, namely that:

x_1 = x_0 + a_0 p_0 = a_0 p_0

minimizes Q(x) over span{b}. Let the scalar a be a free variable, and rewrite for x = ab:

Q(x) = 1/2 (ab,A(ab)) - (ab,b) = a^2/2 (b,Ab) - a (b,b)

This concave-up quadratic function of a attains its minimum where its derivative vanishes:

a (b,Ab) - (b,b) = 0

a = (b,b)/(b,Ab) = (r_0,p_0)/(p_0,Ap_0)

Thus the first step of the conjugate gradient method gives the minimum x = ab of Q(x) over the space span{b}.```
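A minimal transcription of the loop above into Python/numpy may make the bookkeeping concrete. Variable names mirror the pseudocode; this is a sketch assuming A is symmetric positive definite and x_0 = 0, with no preconditioning or other safeguards, not a production solver.

```python
# Sketch of the conjugate gradient loop exactly as written above.
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    n = len(b)
    max_iter = max_iter or n
    x = np.zeros(n)          # x_0 = 0, so r_0 = b - A x_0 = b
    r = b.copy()
    p = r.copy()             # p_0 := r_0
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol:
            break
        Ap = A @ p
        a = (r @ p) / (p @ Ap)      # a_i := (r_i,p_i)/(p_i,Ap_i)
        x = x + a * p               # x_i' := x_i + a_i p_i
        r = r - a * Ap              # r_i' := r_i - a_i Ap_i
        d = (r @ Ap) / (p @ Ap)     # d_i := (r_i',Ap_i)/(p_i,Ap_i)
        p = r - d * p               # p_i' := r_i' - d_i p_i
    return x

# Quick check against a dense direct solve.
rng = np.random.default_rng(1)
M = rng.standard_normal((8, 8))
A = M @ M.T + 8 * np.eye(8)         # symmetric positive definite
b = rng.standard_normal(8)
print(np.allclose(conjugate_gradient(A, b), np.linalg.solve(A, b)))  # True
```

In exact arithmetic the loop terminates in at most n steps, matching the Krylov-subspace argument above: once the subspaces fill the domain, the minimizer of Q is the exact solution.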
 ```In this sense the conjugate gradient method starts out just like the method of steepest descent. Many write-ups of the theory behind the conjugate gradient method will explain why the choices of scalars a_i and d_i guarantee the continued minimization of Q(x) over the "expanding" Krylov subspaces. In a nutshell, what is wanted is for the residual r_i' (aka the negative gradient of Q at x_i') to be orthogonal to all the previous search directions p_0,...,p_i.

The essential point is that what makes the conjugate gradient method more successful than the method of steepest descent, with which it agrees at the first step, is the choice of search directions which are not orthogonal in the usual inner product, but A-orthogonal:

(p_i',Ap_k) = 0 for k = 0 to i

a condition on the search directions p_i that's also referred to as being "A-conjugate". Here are a couple of nice write-ups that go into details.

[(Postscript) An Introduction to the Conjugate Gradient Method -- by Jonathan Richard Shewchuk (50 pages!)]
http://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.ps

[(PDF) How To Get Conjugate Gradient Method In a Geometric Way -- by Mike Yan (only 8 pages, based on the above)]
http://www.cs.ucdavis.edu/~bai/ECS231/finalyan.pdf

When Independence Fails
=======================

Finally we come to the question of what it means if the Krylov subspaces _stop_ expanding, i.e. if the set of vectors:

{b, Ab,..., A^i b}

becomes linearly dependent at some point. This can indeed happen, in spite of the fact that A is symmetric and positive definite. The first trivial illustration is when b = 0. The second illustration is when b is not zero, but A = I. But from the standpoint of searching for a solution to Ax = b, these are not pathological conditions. First, if b = 0, then our initial guess x_0 = 0 turns out to be correct! Second, if A = I, then x_1 = b turns out to be correct!

In fact, if step i is the first step at which A^i b is linearly dependent on the previous vectors in the sequence, then we will have found a solution at that point. Suppose b is nonzero and:

A^i b = SUM c_k A^k b [FOR 0 <= k < i]

expresses A^i b as a linear combination of the previous vectors. Because i was the minimal stage at which this occurs, the coefficient c_0 cannot be zero. If it were, then we'd have (dropping the c_0 b term):

A^i b = SUM c_k A^k b [FOR 1 <= k < i]

and using the invertibility of A (positive definite implies invertible), we could cancel a power of A on both sides to get an "earlier" linear dependence relation. But if c_0 is not zero, then:

c_0 b = A^i b - SUM c_k A^k b [FOR 1 <= k < i]

and by dividing both sides by c_0 and factoring one power of A out of the right hand side (every term there contains at least one power of A), we exhibit a vector x with Ax = b. So the end of linear independence means the solution was found.

Here's a more abstract way to think about this. Again focus on the case where b is nonzero, and consider the sequence of Krylov subspaces:

S_1 = span{b}
S_2 = span{b, Ab}
S_3 = span{b, Ab, A^2 b}, etc.

We have S_1 subset S_2 subset S_3 ... and also A(S_1) subset S_2; A(S_2) subset S_3; etc. If the underlying sequence stops producing linearly independent vectors, then S_i = S_i' for some i (recall i' = i + 1). Consequently S_i is A-invariant, i.e. A(S_i) subset S_i. Since A is invertible and S_i is finite dimensional, A(S_i) = S_i by virtue of equal dimensions. But then, because b belongs to S_i, we must also have A^-1 b in S_i.

The end of linear independence is also something of a happy accident for eigenvalue problems.```
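The constructive part of this argument is easy to act out numerically. A minimal sketch, assuming we manufacture A with known eigenvectors and deliberately place b in a k-dimensional invariant subspace (the construction and all names are illustrative): recover the dependence coefficients c_k, then read off the solution exactly as in the proof above.

```python
# Sketch: when A^k b is linearly dependent on b, Ab, ..., A^(k-1) b,
# the dependence coefficients hand us the solution of Ax = b.
import numpy as np

rng = np.random.default_rng(2)
n, k = 10, 3
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthonormal eigenvectors
lam = np.arange(1.0, n + 1.0)                     # distinct positive eigenvalues
A = (Q * lam) @ Q.T                               # symmetric positive definite
b = Q[:, :k].sum(axis=1)                          # b in a k-dim invariant subspace

# Krylov vectors b, Ab, ..., A^(k-1) b; A^k b is linearly dependent on them.
K = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(k)])
c, *_ = np.linalg.lstsq(K, np.linalg.matrix_power(A, k) @ b, rcond=None)

# c_0 b = A^k b - SUM_{j>=1} c_j A^j b; divide by c_0 and factor one A
# out of the right hand side to exhibit x with Ax = b:
x = (np.linalg.matrix_power(A, k - 1) @ b
     - sum(c[j] * np.linalg.matrix_power(A, j - 1) @ b for j in range(1, k))) / c[0]
print(np.allclose(A @ x, b))   # True
```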
 ```As above, we obtain an A-invariant Krylov subspace S, potentially much smaller than the full domain of A. It is well known that a symmetric matrix A has a complete basis of eigenvectors. The search for eigenvalue/eigenvector pairs of A can then be divided up into searching over S and over the orthogonal complement of S:

S' = { u | (u,v) = 0 for all v in S }

which turns out to be A-invariant as well.

Of course this pigeonholing of eigenvalue/eigenvector pairs can turn out to be something of a mixed blessing. As in the old story with which we began, one can search diligently but unproductively if the particular eigenvalue sought doesn't fall into the Krylov subspace being constructed. Rounding errors of floating point arithmetic can at times mitigate this difficulty, by blurring the boundaries of the invariant subspaces. Still, it's always best to have a theoretical justification for expecting our searches to have a reasonable likelihood of success.

regards, mathtalk-ga```
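A minimal sketch of this pigeonholing, assuming A is built with known eigenvalues and b touches only the first k eigenvectors (the construction is illustrative): projecting A onto the Krylov subspace built from b recovers exactly those k eigenvalues, and no others.

```python
# Sketch: eigenvalues visible from an invariant Krylov subspace S are
# only those whose eigenvectors have a component in b.
import numpy as np

rng = np.random.default_rng(3)
n, k = 8, 3
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthonormal eigenvectors
lam = np.arange(1.0, n + 1.0)                     # eigenvalues 1, 2, ..., 8
A = (Q * lam) @ Q.T
b = Q[:, :k].sum(axis=1)                          # b touches only eigenvalues 1, 2, 3

# Orthonormal basis V of the (invariant) Krylov subspace span{b, Ab, A^2 b}.
V, _ = np.linalg.qr(np.column_stack([np.linalg.matrix_power(A, j) @ b
                                     for j in range(k)]))

# Rayleigh-Ritz: eigenvalues of the k x k projected matrix V^T A V.
print(np.sort(np.linalg.eigvalsh(V.T @ A @ V)))   # approx [1. 2. 3.]
```

Eigenvalues 4 through 8 live in the orthogonal complement S' and are invisible from this b, which is exactly the "searching under the street lamp" risk described above.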
 goodwasabi-ga rated this answer: `Excellent answer from a great researcher. Very clear and to the point. Bravo.`

 Comments

 ```Hi,

Given a vector x and a matrix A, the Krylov space of A is the subspace spanned by x, A*x, A^2*x, ....

Read more about Krylov spaces and Krylov space methods here:

http://www.ccr.buffalo.edu/class-notes/hpc2-00/odes/node6.html#SECTION00051000000000000000
http://sigda.org/Archives/ProceedingArchives/Date/papers/1999/date99/pdffiles/10b_3.pdf
http://meyer.math.ncsu.edu/Meyer/PS_Files/Krylov.pdf

regards, s3com```
 ```Krylov subspaces are named after the Russian mathematician A. N. Krylov, whose name should be pronounced like "kri-LOFF" according to the short biographical note here:

http://www.math.uu.nl/people/vorst/kryl.html

As you seem to be aware, a Krylov subspace is one of the form:

span { v, Av, A^2 v, ... }

for some finite number of terms. It is sometimes quite helpful if, after a certain number of "power" iterations, the k-th term A^k v is _not_ linearly independent of the preceding terms. That is, if:

A^k v = SUM a_i A^i v FOR i = 0 to k-1

then the subspace of the domain (say R^n, where A is an n by n matrix) spanned by these first k terms is _invariant_ under multiplication by A. Consequently it may be possible, in searching for eigenvalues or other kinds of linear algebra solutions, to restrict attention to the "reduced order" subspace of dimension k (or possibly less) rather than tackling the entire n-dimensional setting. A wide variety of numerical linear algebra algorithms can be developed in connection with this.

I notice the subject line says "linear and nonlinear problem solving", so perhaps there's a specific application in mind. If not, one quite interesting family of algorithms is the "conjugate gradient methods". The linear side of solving systems of equations is often a stepping stone to solving nonlinear problems by minimization (for example, making a residual error small).

best wishes, mathtalk-ga```
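A minimal sketch of the "reduced order" idea from the comment above, assuming v = b spans a k-dimensional invariant subspace (the construction and all names are illustrative): an n-dimensional solve collapses to a k-dimensional one via a Galerkin projection onto the Krylov subspace.

```python
# Sketch: when the Krylov subspace built from b is A-invariant, solving
# the small projected system solves the full n x n system exactly.
import numpy as np

rng = np.random.default_rng(4)
n, k = 12, 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthonormal eigenvectors
lam = np.arange(1.0, n + 1.0)                     # distinct positive eigenvalues
A = (Q * lam) @ Q.T                               # symmetric positive definite
b = Q[:, :k] @ rng.standard_normal(k)             # b in a k-dim invariant subspace

# Orthonormal basis V for span{b, Ab, ..., A^(k-1) b}.
V, _ = np.linalg.qr(np.column_stack([np.linalg.matrix_power(A, j) @ b
                                     for j in range(k)]))

# Solve the k x k projected system instead of the n x n one.
y = np.linalg.solve(V.T @ A @ V, V.T @ b)
x = V @ y
print(np.allclose(A @ x, b))                      # True: the reduced solve is exact here
```

The exactness here depends on the invariance of the subspace; for a generic b the same projection gives the best approximation in the sense discussed in the Answer, improving as the subspace expands.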