View Question
 Dear MathTalk, Firstly, I refer to a question posted at: http://answers.google.com/answers/threadview?id=531867 You suggested that you had a simple proof of the continuity of the spectral radius with respect to t, in particular at 0 (I'm not interested in differentiability). I assume you show that the eigenvalues (roots of det(M_t - \lambda I) = 0 are continuous. Would you mind posting this? Also, I have a question along similar lines. Let M be a square matrix of size n. Let V be another square matrix of the same size n, where all the rows of V are given by the same row vector v of size n. How would one go about choosing vector v so that one might minimise the spectral radius of the matrix given by (M - V)? Many thanks, Dooogle Clarification of Question by dooogle-ga on 31 Jul 2005 06:41 PDT Dear MathTalk, If minimisation proves tricky, then, at the very least, I'd like a strategy for ensuring that the spectral radius of (M-V) is less than the spectral radius of M. Thanks again, Dooogle.
 Hi, dooogle-ga: Introduction ============ The continuous dependence of the roots (zeros) of a monic polynomial on its coefficients is a well-known result, often used without reference to a proof. Once this theorem is in hand, the continuity of the spectral radius of a matrix (in terms of its entries) is easily shown. An "elementary" (but perhaps not "simple") proof is given here: [The roots of a polynomial depend continuously on its coefficients by Raul Naulin and Carlos Pabst; Revista Colombiana de Matemáticas v.28(1) 1994] http://www.scm.org.co/revistas.php?modulo=MasInformacion&ver=635 (see links at bottom of page for PDF or DVI formats) Ostrowski (Acta Math. v.72, 1940) proved a more quantitative result, namely that: If P(z),Q(z) are (complex) monic polynomials of equal degree n, with roots x_1,...,x_n and y_1,...,y_n respectively, then (for some suitable ordering of the y_j's) for all j: | x_j - y_j | <= 4n * T * D^(1/n) where T >= 1 bounds the k'th roots of P & Q coefficients of z^(n-k), and where D is the 2-norm of the difference of P & Q, ie. the square root of the sum of squares of absolute values of the differences between corresponding coefficients. Away from multiple roots this estimate can be sharpened, as is shown in this paper: [How the Roots of a Polynomial Vary with its Coefficients: A Local Quantitative Result by Bernard Beauzamy; Canad. Math. Bull. v. 42(1) 1999] http://www.journals.cms.math.ca/cgi-bin/vault/public/view/beauzamy7281/body/PDF/beauzamy7281.pdf?file=beauzamy7281 * * * * * * * * * * * * * * * * * * * Detailed Sketch of Proof ======================== We give an outline of the proof due to Naulin and Pabst, linked above. A complex (monic) polynomial of degree n has (by the Fund. Thm. of Algebra) exactly n roots in the complex field, counted according to multiplicity: P(z) = z^n - c_1*z^(n-1) + ... + (-1)^n c_n = (z - r_1)*(z - r_2)*...*(z - r_n) We would like to define a 1-1 mapping between the n-tuples of coefficients and the n-tuples of roots, but a mechanical difficulty arises in that the order of the roots is not apriori specified, at least not in the obvious sense in which the order of the coefficients is. To manage this difficulty, Naulin and Pabst impose the lexicographic ordering on the complex numbers: a + bi < c + di iff a < c OR (a = c and b < d) They then restrict the n-tuples of complex roots so that: (r_1,r_2,...,r_n) in C_n implies r_1 <= r_2 <= ... <= r_n Every such n-tuple is the ordered roots of some (complex) monic polynomial. So we have a one-to-one correspondance between C_n, the possible ordered n-tuples of roots, and C^n, the complex n-tuples, considered as "non-leading" coefficients of monic polynomials of degree n. In fact the mapping from roots to coefficients is pretty explicit, taking into account the alternating signs on the c_i's we introduced above: c_1 = r_1 + ... + r_n c_2 = r_1*r_2 + r_1*r_3 + ... + r_{n-1}*r_n ... c_n = r_1 * r_2 * ... * r_n The expressions of the right-hand sides above are the elementary symmetric polynomials, and collectively this correspondance is known as the Vičte map (after Francois Vičte, 1540-1603, a pioneer in the theory of algebraic equations). Once they define a metric for C_n: n dist((r_1,...,r_n),(s_1,...,s_n)) = SQRT( SUM |r_i - s_i|^2 ) i=1 it can be seen that C_n is a complete metric space, a property inherited as a closed subspace of C^n, where the same metric is a Euclidean norm. For convenience we will refer to this norm as: ||a|| = ||(a_1,...,a_n)|| taking care that our context allows this for "vector" a without ambiguity. The continuity of the Vičte map T:C_n -> C^n given by the above formulas is then clear. The big task is then to show that the inverse of this bijective mapping, which they call S, is also continuous. Toward this end they produce a Lemma: dist(S((a_1,...,a_n)),(0,...,0)) <= 2n * max{1,||a||} whose proof, though easy, we shall not repeat here. They now have all the ingredients needed to show S:C^n -> C_n is continuous. For if not, then take (a_1,...,a_n) to be a point of discontinuity, so that for some delta > 0 and some sequence of "vector" n-tuples {c(k): k=1 to oo}, c(k) = (c_{k,1},...,c_{k,n}): +oo LIMIT c(k) = a k=1 and dist(S(c_k),S(a)) >= delta for all k. The Lemma above establishes that the image sequence {S(c(k)): k=1 to oo} is bounded, so that if necessary we can choose a subsequence of the c(k)'s such that the images S(c(k)) converge in the complete metric space C_n, say: +oo LIMIT S(c(k)) = r k=1 But the continuity of the Vičte map T then implies: +oo +oo +oo T(r) = T( LIMIT S(c(k) ) = LIMIT T(S(c(k)) = LIMIT c(k) = a k=1 k=1 k=1 and hence that r = S(a). By a standard argument from the convergence of subsequence c(k), this shows that for k sufficiently large: dist(S(c(k)),S(a)) < delta contradicting the original choice of sequence {c(k)} as never getting that close, and establishing the theorem (continuity of T). * * * * * * * * * * * * * * * * * * * Continuity of R(r) = max{|r_i|} on C_n ====================================== It may be helpful, both for its own relevance and as an illustration of how the topology defined on C_n "works", to explicitly prove that R(r) = max{|r_i|} is a continuous function from C_n to R as we've defined this metric topology on C_n. Let r = (r_1,...,r_n) and s = (s_1,...,s_n) be properly ordered n-tuples in C_n, and say: dist(r,s) = ||r - s|| > 0 Now for any component entries r_i and s_i, we have by the triangle inequality: |r_i| <= |s_i| + |r_i - s_i| <= |s_i| + ||r - s|| <= R(s) + dist(r,s) Thus taking maximum with respect to i, we get: R(r) <= R(s) + dist(r,s) and R(s) <= R(r) + dist(r,s) by a symmetric argument. So |R(r) - R(s)| <= dist(r,s), and it follows the function R is continuous on C_n. Note that despite knowing that the entries of r are lexicographically ordered in C, we lack any information about which component entry r_i yields the maximum absolute value R(r). Fortunately as the "norm" metric takes all components into account, this gap in our knowledge is no obstacle. * * * * * * * * * * * * * * * * * * * Continuity of the Spectral Radius of a Matrix ============================================= Finally we only need to observe that as the coefficients of the characteristic polynomial are a continuous function of a matrix M's entries, applying first T to the non-leading coefficients and then R will produce the spectral radius of M as a continuous function of those same entries. * * * * * * * * * * * * * * * * * * * Spectral Radius of a Special "Rank 1" Update to M ================================================= I'm not clear if any of the properties of M or even M_t (such as nonnegativity) from the other Question linked above were intended to carry over to the second part of this Question. Note that those properties were not required to Answer the first part. I will try to make clear what of those properties I think are useful now. The second part concerns the spectral radius of a matrix M-V where V has the form: (1,...,1)' * v for an arbitrary row vector v. The product of a column and a row vector is a matrix of rank 1, and the numerical linear algebra term for M-V is therefore a "rank 1 update" of M. As v can be any scalar multiple of an arbitrary vector of length 1, the probability is quite high that unless the spectral radius of M is already zero (M nilpotent), that there exists a scalar multiple of a randomly chosen "search direction" that will reduce the spectral radius somewhat. So let's focus first on what search directions we might like to avoid. For a "positive" matrix M_t such as constructed in the other Question, at least for t > 0, we are guaranteed to have a unique positive eigenvalue q equal to the spectral radius and of multiplicity 1. Let u' be a unit length "positive" column vector which is a "right" eigenvector for this eigenvalue. Then: (M-V)u' = Mu' - (1,...,1)'*vu' = qu' - (vu')(1,...,1)' Now if v were chosen orthogonal to u, then vu' = 0 and we can see from this calculation that no such rank 1 update of M would reduce the spectral radius (although taking the scalar multiple in v sufficiently large would doubtless increase the spectral radius beyond some point). If choosing a search direction v orthogonal to u is not helpful, we should perhaps next consider taking v = tu where t is a scalar multiple (though this is not the only possibility): (M-V)u' = Mu' - (1,...,1)'*vu' = qu' - (t,...,t)' This is not a completely clear cut result, because no longer is it evident or even likely that u' is a right eigenvector of the updated matrix (M-V). However since u has all entries positive and so does (t,...,t) for t > 0, it seems almost inescapable that for at least small values of t, the effect of this update is to reduce the spectral radius. Actually, since the maximum modulus eigenvalue for these matrices has multiplicity 1, we are in a situation in which (again, for small perturbations of M) the spectral radius will be differentiable with respect to the parameter t. Thus the only "bad" outcome would be that this derivative is zero, which seems to me almost untenable. If we are faced with a general matrix M, there need not be any search direction which can reduce the spectral radius. We have already pointed out the case M nilpotent (spectral radius zero) as one such possibility, but other cases where the maximum modulus eigenvalue has multiplicity greater than 1 will share this frustration. My suggestion, for additional research, is to focus on matrices that are positive and to do some numerical experiments with v = tu or (more generally) with v not orthogonal to u, to confirm if nothing else my intuition about the direction of the effect of t on the spectral radius sketched above. regards, mathtalk-ga Request for Answer Clarification by dooogle-ga on 01 Aug 2005 03:41 PDT Dear MathTalk, Many thanks. I've a had first read-through of your answer, and I'm very thankful indeed. You've been extremely helpful. Just in case you've any extra insights, thought I'd try sketch the problem area I'm considering. My concern is finding the unique solution (supposing this exists) of x = Mx, where M is any square matrix (not necessarily irreducible, and not necessarily non-negative). Now I'd like to do this in an iterative fashion: x_{t+1} = Mx_t. Now, unique solution of x = Mx does not entail that x_{t+1} = Mx_t converges. In a Markov setting, consider three states A, B, C. If A -> B and B -> A and C->C, then a unique stationary distrution exists, but we do not have necessary convergence to this stationary distribution. The spectral radius of M can give many insights into the solubuility of the equation of x = Mx in an iterative fashion, as it can into the rate of convergence if the equation is soluble. What I'd like to do is map the equation x = Mx to equation x = (M-V)x + v with a view to experiment with solubility and convergence rates. Many thanks again. Dooogle Request for Answer Clarification by dooogle-ga on 01 Aug 2005 04:20 PDT Addendum: In what appears the most fruitful line of inquiry, I'm trying to find v such that spectral radius of |(M-V)| is minimised (or at least reduced) - where | M - V | is the matrix which has as entries the absolute values of the corresponding entries in (M - V). So, as you suggested, it seems I ought really really to consider non-negative matrices. ;-) Request for Answer Clarification by dooogle-ga on 01 Aug 2005 04:22 PDT Please note: the t I use in the answer clarification is wholly independent of the t mentioned in the original question. Request for Answer Clarification by dooogle-ga on 02 Aug 2005 02:00 PDT Dear MathTalk, Wondering if you could please elaborate a little on the following? "This is not a completely clear cut result, because no longer is it evident or even likely that u' is a right eigenvector of the updated matrix (M-V). However since u has all entries positive and so does (t,...,t) for t > 0, it seems almost inescapable that for at least small values of t, the effect of this update is to reduce the spectral radius. Actually, since the maximum modulus eigenvalue for these matrices has multiplicity 1, we are in a situation in which (again, for small perturbations of M) the spectral radius will be differentiable with respect to the parameter t. Thus the only "bad" outcome would be that this derivative is zero, which seems to me almost untenable." In the first paragraph, I'm not clear on the importance of both u and (t,...,t) being positive. And, I'm not clear on why this leads to your inescapable specral radius reduction remark. In the second paragraph, I'm not clear on why, with multiplicity 1, we are in a situation where (for small perturbations of M) the spectral radius will be differentiabl with respect to parameter t. Thank you, Dooogle Clarification of Answer by mathtalk-ga on 02 Aug 2005 07:01 PDT Hi, Dooogle-ga: I'll be glad to fill out the details behind those cursory remarks. I wasn't too clear when I wrote them how closely tied your interests were to the Markov chain case. Now that I know there is still some connection there, I'll give more complete information; I'll try to give you something tonight. regards, mathtalk-ga Clarification of Answer by mathtalk-ga on 03 Aug 2005 05:57 PDT Recall that the Vičte map, which effectively gives a polynomials coefficients in terms of its roots, is defined by elementary symmetric polynomials (of the roots). Indeed, to recall our earlier notation, but to extend the map to all of C^n: c_1 = SUM r_i FOR 1 <= i <= n c_2 = SUM r_i * r_j FOR 1 <= i < j <= n ... c_n = PRODUCT r_i FOR 1 <= i <= n The k'th elementary symmetric polynomial of the roots r_i is the sum of all products of distinct k-subsets of the roots. Naturally this extension of S to all of C^n is a "smooth" (differentiable) function because of this polynomial definition. The issue at hand is the "smoothness" of the inverse. One way to discuss this is in terms of the inverse function theorem, which says that the inverse of S will be differentiable at a point where the Jacobian of S is nonzero, and indeed the "gradient" of the (functional) inverse of S is the (matrix) inverse of the "gradient" of S at that point: [Inverse Function Theorem - Explanation] http://www.ualberta.ca/dept/math/gauss/fcm/calculus/multvrbl/basic/ImplctFnctns/invrs_fnctn_explntn.htm The Jacobian of the mapping S (think change of variables in calculus) is the determinant of the matrix of first derivatives of components of S (coefficients) with respect to the underlying variables (roots), aka "gradient" of S. The function S has n components and n independent variables, so this is an nxn matrix (and it makes sense to speak of its determinant). The matrix is invertible if and only if its determinant is non zero. For this particular case of S, the Vičte map, the Jacobian turns out to have this value, given roots r_1,...,r_n: PRODUCT (r_i - r_j) FOR 1 <= i < j <= n which can be shown by a pretty straightforward induction, applying Gauss elimination to the gradient matrix (details on request). Clearly this is nonzero if and only if all the roots are distinct. In other words at a point where _all_ the roots are distinct, _all_ the "trajectories" of the roots as a function of the polynomials coefficients are smooth (differentiable), as a consequence of the inverse function theorem. One can prove more, that if a particular root is distinct from all the others, then its trajectory (as a function of the coefficients) is smooth, regardless of whether some other roots has multiplicity greater than 1. It is this sharper result to which I alluded before. An indication of why this is so can be found in the paper by Beauzamy (linked above). If the "inverse" function r_1(c_1,...,c_n), giving the "first" root as a function of the polynomial's (non-leading) coefficients, were smooth, then we would expect to have an estimate for the distance between two values x,y on the "root trajectory" in terms of the distance D between their respective coefficient arguments like: | x - y | <= C * D where C is a constant whose absolute value is determined by the derivative of the function r_1(c_1,...,c_n) with respect to those arguments. And this is precisely the kind of improved estimate that is given in that paper (for the case of one root being "isolated"). I think this discussion may be a bit too abstract to be immediately helpful, and so the other thing I'd like to do is to work through a small example. It may be that even a perturbation of a 2x2 matrix will provide useful insights. Also, recall the examples given (under Request for Clarification) on this thread: [Google Answers: Spectral Radius of a Matrix] http://answers.google.com/answers/threadview?id=531867 of some eigenvalue trajectories near points where we have a double eigenvalue. regards, mathtalk-ga Clarification of Answer by mathtalk-ga on 06 Aug 2005 11:05 PDT Just a quick note to point out some aspects of this problem that make it more amenable to numerical experiments than to a purely theoretical analysis. We have already discussed the nonlinear dependence (Vičte map) of the coefficients of a characteristic polynomial on its roots (matrix eigenvalues), and of course we are more interested in the inverse of this mapping (dependence of roots, esp. the eigenvalue of maximum modulus, on the coefficients). The complexity is increased by the nonlinear and combinatorial way in which the coefficients depend in turn on the matrix entries (via the determinant). This is not to say that the numerical experimentation cannot be guided by theoretical understanding, the foundation of which I tried to lay above. If we look at the "pencil" of matrices: M - cV = M - c*(1,...,1)'(v_1,...,v_n) with variable scalar c, then the spectral radius is (the absolute value of) the trajectory of the maximum modulus eigenvalue r(c) as a continuous function of c. We have seen that in a "nice" case such as a nonnegative matrix there will be a unique such eigenvalue, and that away from multiplicity > 1 the trajectory is not just continuous but differentiable with respect to c. Finding the maximum modulus eigenvalue, for example by a power method, is computationally tractable, and we can envision doing a numerical experiment to confirm that the effect of using (v_1,...,v_n) as the "left" eigenvector corresponding to that eigenvalue will produce a reduction in spectral radius. Of course while we are perturbing (hopefully decreasing) the maximum eigenvalue, we will at the same time be perturbing (and possibly increasing) the other eigenvalues, so we do not expect that an unlimited decrease in spectral radius is attainable (except in the special case that M is of rank one and a multiple of V, where of course we could drop the spectral radius all the way to zero). So it seems to me that experiments will help, not only to show we are thinking correctly about the general relationships, but to develop some hypothesis about how much work is needed to obtain a given reduction, and of course suggest further experiments with more sophisticated strategies. If you have in mind a particular matrix of modest dimensions, e.g. 10 by 10, that would be good to play with, I'd be happy to do some calculations. regards, mathtalk-ga