Clarification of Answer by mathtalk-ga on 15 Aug 2004 08:52 PDT
This is an experiment in writing with a Unicode text editor,
which allows us the use of Greek letters, some superscripts,
and a few other familiar mathematical symbols.
Perron-Frobenius Theorem
========================
The proof below is based on one given by James M. Ortega in
Thm. 6.1.4 of this book:
[Matrix Theory: A Second Course]
http://www.riskbook.com/titles/ortega_j_(1987).htm
It's elementary in the sense of not introducing any theory
of complex functions or fixed-point results requiring a
topological foundation, but it does depend on such introductory
principles of real analysis as bounded sequences in finite
dimensional vector spaces having convergent subsequences.
For the sake of consistency with our earlier analysis, the
proof has been adapted to use the 1-norm ||.|| as well as
row-vector-on-the-left multiplication. The changes are not
substantial however.
Recall that u = [1 1 … 1] denotes the vector of 1's. We'll
also use a slightly nonstandard notation that for row vector
x, |x| means the entry-by-entry absolute values (giving again
a row vector of the same length). Thus |u| = u, for example.
Previously we've defined when a probability transition matrix
is irreducible. The generalization of this to nonnegative
square matrices is slight but worthy of careful statement.
Defn. Nonnegative n×n matrix A is reducible iff there exist
1 ≤ i,j ≤ n s.t. for all a > 0, the row i, column j entry of
A to the power a is zero:
Aª(i,j) = 0
Otherwise (if for all i,j there exists power a > 0 such that
Aª(i,j) > 0), we say A is irreducible. Note that in general
the required power a depends on the indices i,j.
For a probability transition matrix, Aª(i,j) > 0 means there
is a positive probability of going from state i to state j in
exactly a steps, presuming a ≥ 0 is a whole number.
Note that if some row (resp. some column) of A is zero, that
same row (resp. same column) of any power Aª is again zero.
Hence such matrices are reducible.
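To make the definition concrete, here is a small numerical sketch of my own (not part of the original discussion): irreducibility can be tested by accumulating the powers A, A², …, Aⁿ and checking that every (i,j) position is positive in at least one of them. (That powers up to n suffice follows from the discussion around Lemma 1 below.)

```python
import numpy as np

def is_irreducible(A):
    """Test irreducibility of a nonnegative square matrix A from the
    definition: for every (i, j), some power A^a (a = 1, ..., n) must
    have a positive (i, j) entry."""
    n = A.shape[0]
    P = np.eye(n)
    S = np.zeros((n, n))
    for _ in range(n):
        P = P @ A      # P runs through A^1, ..., A^n
        S += P         # S(i,j) > 0 iff some accumulated power is positive there
    return bool((S > 0).all())

# A 3-cycle permutation matrix is irreducible ...
C = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])

# ... while a matrix with a zero column is reducible,
# matching the observation about zero rows/columns above.
R = np.array([[0., 1.],
              [0., 1.]])
```

Here C passes the test and R fails it, in line with the remark above about zero columns.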
The proof makes repeated use of (one half of) the following:
Lemma 1: Let A be an n×n nonnegative matrix, n > 1.
Then A is irreducible if and only if (I + A)ⁿ⁻¹ > 0.
Proof: We will only use the forward direction, so naturally
let's start with proving that. Expanding (I + A)ⁿ⁻¹ as a
binomial (because I commutes with A), we see that the terms
all involve nonnegative matrices. The leading term I takes
care of all the diagonal terms being positive, so we only
need to worry about off-diagonal entries.
Let i ≠ j give a row and column respectively of (I + A)ⁿ⁻¹.
From the definition of irreducible we know that Aª(i,j) > 0
for some a > 0. If we could show this with a < n, we'd be
done, as this would give a positive contribution to the same
entry in (I + A)ⁿ⁻¹.
The expansion of Aª(i,j) gives a sum of all products:
A(k(0),k(1))*A(k(1),k(2))*...*A(k(a-1),k(a))
such that k(0) = i and k(a) = j. First remove any diagonal
terms A(k,k) from the product, thereby shortening the "path"
from i to j by some possible amount. More generally remove
any subproduct that "revisits" a previous index, e.g.
A(k,k')A(k',k")...A(_,k)
wherever it may occur in the product. Since i ≠ j we don't
remove the entire product.
Now we have a product in which all the indices are distinct,
and i ≠ j implies there are at most n-1 factors (since a
factors introduce a+1 indices). So we've shown that for
i ≠ j it is possible to find Aª(i,j) > 0 with 0 < a < n.
That fills in the off-diagonal entries of (I + A)ⁿ⁻¹ with
positive values as outlined above.
For the sake of completeness let's also prove the reverse
implication. As we've just analyzed, a positive entry in
(I + A)ⁿ⁻¹ at off-diagonal position (i,j) means that:
Aª(i,j) > 0
for some 0 < a < n. It remains to argue that Aª(i,i) > 0
for some a > 0. But since n > 1, we can pick j different
from i. Combine a power of A that has (i,j) entry positive
with one that has (j,i) entry positive, and the product (or
sum of the two powers) gives diagonal (i,i) entry positive.
QED
Remark: I added the qualification n > 1 partly to avoid doubt
over the trivial case A = [0]. Arguably this is irreducible
in the sense that A⁰ = [1], but we did not include such sense
in our definition. Despite this it's easy to verify that the
conclusions of the Perron-Frobenius theorem apply in the 1×1
case to nonnegative A, and we'll not belabor the point. Also
we will see from examples later that Aª(i,i) > 0 may require
a = n, which relates to our filling of off-diagonal entries.
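Both the forward direction of Lemma 1 and the closing remark (that a positive diagonal entry Aª(i,i) may require a = n) can be checked numerically on an n-cycle; the following sketch, with my own choice n = 4, is illustrative only:

```python
import numpy as np

n = 4
# n-cycle permutation matrix: state i moves to state i+1 (mod n)
A = np.roll(np.eye(n), 1, axis=1)

# Forward direction of Lemma 1: A irreducible implies (I + A)^(n-1) > 0
B = np.linalg.matrix_power(np.eye(n) + A, n - 1)
all_positive = bool((B > 0).all())

# Diagonal entries: A^a(0,0) first becomes positive at a = n exactly
first_a = min(a for a in range(1, n + 1)
              if np.linalg.matrix_power(A, a)[0, 0] > 0)
```

For the 4-cycle, (I + A)³ = I + 3A + 3A² + A³ covers every shift, hence is all-positive, while the diagonal of Aª stays zero until a = 4.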
We will also require at one point the following fact:
Lemma 2: Let b(i), i = 1,…,n be complex numbers such that:
| Σ b(i) | = Σ |b(i)|     (sums over i = 1,…,n)
Then there exists a complex number c with |c| = 1 s.t.
b(i) = c*|b(i)| for all i = 1,…,n.
Proof: This says equality holds in the triangle inequality
only if all the complex numbers have the same direction c.
The converse is true but not stated here, and easy to show
by direct calculation. Our proof is by induction.
To shorten the proof a bit, let's assume without loss of
generality that all the summands b(i) are nonzero, since
for any zero summand the choice 0 = c*|0| doesn't matter.
The case n = 1 is simple. We choose c = b(1)/|b(1)|.
The case n = 2 contains all the hard work. Suppose:
| b(1) + b(2) | = |b(1)| + |b(2)|
As we assume neither of b(1) or b(2) is zero, divide through
by |b(2)| and label z = b(1)/b(2):
| z + 1 | = |z| + 1
Square both sides and use the fact that multiplying complex
numbers by their conjugates gives the absolute value squared:
(z + 1)(z̄ + 1) = z z̄ + 2|z| + 1
writing z̄ for the complex conjugate of z. After clearing
common terms from both sides we have:
z + z̄ = 2|z|
Re(z) = |z|
which implies Im(z) = 0, else |z| > |Re(z)|.
Therefore z = |z| and we can choose:
c = b(1)/|b(1)| = z b(2)/|z b(2)| = b(2)/|b(2)|
Now the induction step. Assume the proposition has been shown
for some value n ≥ 2 and consider the situation when a new term
b(n+1) is introduced. By the triangle inequality:
| b(n+1) + Σ b(i) | ≤ |b(n+1)| + | Σ b(i) |
                    ≤ |b(n+1)| + Σ |b(i)|
But our hypothesis for n+1 is that the first and last of these
expressions are equal, which forces equality in the middle:
|b(n+1)| + | Σ b(i) | = |b(n+1)| + Σ |b(i)|
| Σ b(i) | = Σ |b(i)|     (sums over i = 1,…,n)
So the induction hypothesis applies to the first n terms, and
there exists a complex number c such that:
|c| = 1 and b(i) = c|b(i)| for all i = 1,…,n
Write p = Σ |b(i)| (sum over i = 1,…,n), and also:
cp = Σ c|b(i)| = Σ b(i)
Therefore | b(n+1) + cp | = |b(n+1)| + |cp|,
and from the case n = 2 we deduce that:
c = cp/|cp| = b(n+1)/|b(n+1)|
Now we're done with the induction step.
QED
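A quick numerical check of Lemma 2 (my own sketch): when the summands all share one direction c, equality holds in the triangle inequality, and mixed directions make it strict.

```python
import cmath

# All b(i) point in the direction c = e^{i*pi/4}; the moduli vary
c = cmath.exp(1j * cmath.pi / 4)
b = [c * r for r in (1.0, 2.5, 0.5)]
equal_case = abs(abs(sum(b)) - sum(abs(z) for z in b)) < 1e-12

# Two summands in different directions: strict inequality
b2 = [1 + 0j, 1j]
strict_case = abs(sum(b2)) < sum(abs(z) for z in b2)
```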
Theorem (Perron-Frobenius)
Let A be an irreducible n×n nonnegative matrix.
Then the largest absolute value of an eigenvalue of A is itself
a simple eigenvalue of A with an associated positive eigenvector.
Also, if any row (resp. any column) of A is positive, then all
other eigenvalues of A are strictly smaller in absolute value
than the largest one.
Proof: Let S = { λ > 0 : for some nonzero x ≥ 0, xA ≥ λx }.
Since A is irreducible, no column of A is zero. Thus uA > 0,
and by taking λ to be the smallest entry of uA, we have:
uA ≥ λu
Thus λ belongs to S, so S is nonempty. Moreover for any λ in S,
there exists nonzero x ≥ 0 such that:
xA ≥ λx
Consequently λ ||x|| ≤ ||xA|| ≤ ||x|| * C where:
C = max { ||A(i,-)|| : 1 ≤ i ≤ n }
is the operator norm of multiplication by A induced by the 1-norm
on row vectors x. This proves that the set S is bounded.
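These two steps, membership of λ in S and the 1-norm bound by C, are easy to spot-check numerically; the following is my own sketch with a random nonnegative matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 5))     # nonnegative, so no zero column here
u = np.ones(5)             # the vector of 1's, as in the proof

# lam = smallest entry of uA satisfies uA >= lam*u, so lam is in S
lam = (u @ A).min()
in_S = bool((u @ A >= lam * u).all())

# Upper bound: ||xA|| <= C ||x||, with C the maximum row 1-norm of A
x = rng.random(5)
C = np.abs(A).sum(axis=1).max()
bounded = bool(np.abs(x @ A).sum() <= C * np.abs(x).sum() + 1e-12)
```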
Therefore S has a least upper bound λº. Let {λ_k} be a sequence
of values in S converging to λº with corresponding nonzero vectors
{x_k} normalized so that ||x_k|| = 1. Then a subsequence of {x_k}
exists which converges to a vector xº with ||xº|| = 1. Since:
x_k A ≥ λ_k x_k
it follows that:
xº A ≥ λº xº
We claim that equality actually holds, for the proof of which we
make our first use of Lemma 1.
Consider y = xº A - λº xº = xº (A - λºI) ≥ 0. If y = 0, we are
done with our claim.
Otherwise, if y ≠ 0, then y (I + A)ⁿ⁻¹ > 0 because any nonzero
entry in y corresponds to a positive row in (I + A)ⁿ⁻¹.
If we set w = xº (I + A)ⁿ⁻¹, then w > 0 (again because a nonzero
entry in xº meets a positive row in (I + A)ⁿ⁻¹). Thus:
wA - λºw = w (A - λºI)
         = xº (I + A)ⁿ⁻¹ (A - λºI)
         = xº (A - λºI) (I + A)ⁿ⁻¹
         = y (I + A)ⁿ⁻¹
         > 0
where use has been made of how A and (I + A)ⁿ⁻¹ commute.
This quickly leads to a contradiction of ?º being the least upper
bound of S. Let ε > 0 be the minimum ratio of respective entries
in wA - λºw to those in w > 0, so that:
wA - λºw ≥ εw
wA ≥ (λº + ε)w
This contradiction establishes the claim that y = 0, and thus
xº A = λº xº. Now ||xº|| = 1, so λº is an eigenvalue of A. It has
the maximum absolute value among eigenvalues, because any eigenvalue
λ with eigenvector x satisfies |λ| |x| = |xA| ≤ |x| A, placing |λ|
in S (when λ ≠ 0) and hence |λ| ≤ λº.
Also, xº must have at least one positive entry, and because the
corresponding row in (I + A)ⁿ⁻¹ is positive:
xº (I + A)ⁿ⁻¹ > 0
I.e. (1 + λº)ⁿ⁻¹ xº > 0, and thus xº > 0 is strictly positive.
The simplicity of the eigenvalue is a bit tedious, but bear with
me. If the geometric multiplicity of λº were more than one, then
we'd have an eigenvector z corresponding to λº linearly independent
of xº. Since λº is real (and A is real), we may take z real.
By linear independence all combinations xº + αz are eigenvectors
of A corresponding to λº for any real α. Perturbing xº with αz
using a positive or negative scalar α as necessary, we could obtain
xº + αz ≥ 0
with at least one entry equal to zero. But the above proof that
xº > 0 applies equally to xº + αz, and we have a contradiction.
This establishes that the geometric multiplicity of ?º is 1, but
we must also rule out any algebraic multiplicity greater than 1,
i.e. that ?º appears in a Jordan block of size m > 1.
If the latter were the case, then there'd be a vector v such that:
v (A - λºI)² = 0,   v (A - λºI) ≠ 0
and again the fact that A and λº are real allows us to "solve"
for v to be real as well.
Since v (A - λºI) is an eigenvector associated with λº, it must,
by what has been shown so far, be a nonzero multiple of xº.
Conversely we could write, using the reciprocal β of that scalar:
xº = βv (A - λºI) > 0
But this implies:
βv A = λº(βv) + xº
Since xº (A - λºI) = 0, replacing βv by βv + t·xº disturbs none
of the above, and taking the real scalar t > 0 large enough gives
a positive vector w = βv + t·xº > 0 with:
wA = λºw + xº > λºw
Exactly as before, the minimum ratio ε > 0 of the entries of
wA - λºw to those of w yields wA ≥ (λº + ε)w, which contradicts
the maximality of λº in S.
So eigenvalue ?º has both algebraic and geometric multiplicity
1, i.e. a simple eigenvalue.
It remains to show that no other eigenvalue has modulus (absolute
value) equal to λº. For this we need to assume more than
just the irreducibility of A, as can be illustrated by taking A
to be any cyclic permutation matrix, so that Aⁿ = I and the
minimal polynomial is xⁿ - 1. All the eigenvalues (characteristic
roots) fall on the unit circle, so λº = 1 is not strictly greater
in absolute value than the other eigenvalues.
For our purpose we'll analyze the case that A has at least one
column of all positive entries. The dual case of A with a row
of all positive entries then follows as previously noted in our
Comments by considering the equal "spectrum" (eigenvalues) of
transpose A'. [See Remark below for how it is usually phrased
in connection with Markov chains, which gives slightly greater
generality.]
Suppose for the sake of contradiction that a different eigenvalue
λ has |λ| = λº. Let nonzero x be an associated eigenvector:
xA = λx
Now λº|x| = |λx| = |xA| ≤ |x| A by the triangle inequality once
more. Repeat the argument leading from xº A ≥ λºxº to xº A = λºxº
but applied to |x|, and we have that λº|x| = |x|A and so |x|
is a multiple of xº, say |x| = τxº where τ > 0.
We further claim that |xA| = |x| A, since by the above:
|xA| = |λx| = λº|x| = |x|A
Given that, say, the k-th column of A is positive, this means:
| Σ x(i)*a(i,k) | = Σ |x(i)|*a(i,k)
where the indicated sums are over i = 1,…,n.
By Lemma 2, this equality implies there exists complex c with
|c| = 1 such that:
x(i)*a(i,k) = c*|x(i)|*a(i,k)
Then because each a(i,k) > 0, x = c|x| = cτxº. Since x is a
multiple of xº, we have shown eigenvalue λ = λº, a contradiction.
QED
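The theorem's conclusions can be watched in action with a plain power iteration. The matrix below is my own example (irreducible, nonnegative, with a positive first column, and row-stochastic so that λº = 1); the iteration uses the row-vector-on-the-left convention of the proof:

```python
import numpy as np

# Irreducible nonnegative matrix; first column positive; rows sum to 1
A = np.array([[0.2, 0.8, 0.0],
              [0.5, 0.0, 0.5],
              [0.3, 0.7, 0.0]])

# Power iteration with row vector on the left, 1-norm normalization
x = np.ones(3) / 3.0
for _ in range(200):
    x = x @ A
    x = x / np.abs(x).sum()

lam = np.abs(x @ A).sum()              # estimate of the Perron root
perron_positive = bool((x > 0).all())  # positive eigenvector
spectral_radius = max(abs(e) for e in np.linalg.eigvals(A))
```

Because the first column is positive, the remaining eigenvalues are strictly smaller in modulus, which is exactly what makes the iteration converge.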
Remark: In the context of Markov chains, one often asks
that the probability transition matrix A is both irreducible
and _not_ "periodic". Periodic means that for the chance
of going from state i back to state i to be positive, the
number of steps needed is always divisible by some integer
t > 1. This condition is closely related to the positive
row or column condition that we've used in our statement of
the Perron-Frobenius theorem. It can be shown that if A is
irreducible and not periodic, then Aª > 0 for some power a
of A. Hence our assumption of a positive row or column is
then applicable to the power of A, with the result that all
conclusions remain valid. Likewise one can show inductively
that if any row (or column) of irreducible nonnegative n×n
matrix A is positive, then Aⁿ > 0. Ortega gives this as
Exercise 6.1.10, calling nonnegative matrix A "primitive"
if Aª > 0 for some power a of A. The difficulty we see if
A is a cyclic permutation matrix is precisely that there is
no common power a for which all the entries of Aª > 0, and
indeed we reach Aⁿ = I only to start over. In these terms
we could state the half of Lemma 1 we used repeatedly as
A irreducible implies (I+A) primitive.
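In these terms, the failure of primitivity for a cyclic permutation matrix, and the primitivity of I + A, can both be verified directly (a numerical sketch of my own):

```python
import numpy as np

n = 3
A = np.roll(np.eye(n), 1, axis=1)      # cyclic permutation: irreducible

# No power A^a is all-positive: each A^a is again a permutation matrix
never_all_positive = all(
    not (np.linalg.matrix_power(A, a) > 0).all()
    for a in range(1, 2 * n + 1))

# A^n = I, so the powers simply start over
cycles_back = bool((np.linalg.matrix_power(A, n) == np.eye(n)).all())

# But I + A is primitive: (I + A)^(n-1) is already all-positive
primitive = bool((np.linalg.matrix_power(np.eye(n) + A, n - 1) > 0).all())
```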
Acceleration methods
====================
We already had occasion to mention this paper:
[The Second Eigenvalue of the Google Matrix]
http://www.stanford.edu/~taherh/papers/secondeigenvalue.pdf
in which Taher H. Haveliwala and Sepandar D. Kamvar give an
explicit bound |λ| ≤ c, where c is the parameter appearing in
the PageRank matrix definition, on the "second" (and smaller)
eigenvalue(s) of A.
Knowing a good estimate for the linear rate of convergence
of the power method permits Aitken extrapolation. The authors
of the above paper are joined by two more Stanford Univ.
colleagues in exploring variations on this algorithm here:
[Extrapolation Methods for Accelerating PageRank Computations]
http://www2003.org/cdrom/papers/refereed/p270/kamvar-270-xhtml/
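As a minimal illustration of the idea (my own sketch, not the quadratic-extrapolation algorithm of the paper), Aitken's Δ² formula can be applied componentwise to three successive iterates; for a purely linearly converging scalar sequence s_k = s + r^k it recovers the limit s exactly:

```python
import numpy as np

def aitken(x0, x1, x2):
    """Componentwise Aitken delta-squared extrapolation of three
    successive iterates, guarding against tiny denominators."""
    d1, d2 = x1 - x0, x2 - x1
    denom = d2 - d1
    out = x2.copy()
    safe = np.abs(denom) > 1e-15
    out[safe] = x2[safe] - d2[safe] ** 2 / denom[safe]
    return out

# Linearly converging model sequence s_k = s + r^k with s = 2, r = 0.5
s, r = 2.0, 0.5
seq = [np.array([s + r ** k]) for k in range(3)]
accelerated = aitken(*seq)
```

In the PageRank setting the same formula would be applied to successive power-method iterates, with the bound |λ| ≤ c above indicating the linear rate being extrapolated away.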
A different approach to accelerating convergence has been
proposed by:
[Updating PageRank with Iterative Aggregation]
(by Amy N. Langville and Carl D. Meyer)
http://www.www2004.org/proceedings/docs/2p392.pdf
"ABSTRACT We present an algorithm for updating the PageRank
vector [1]. Due to the scale of the web, Google only updates
its famous PageRank vector on a monthly basis. However, the Web
changes much more frequently. Drastically speeding the PageRank
computation can lead to fresher, more accurate rankings of the
webpages retrieved by search engines. It can also make the goal
of real-time personalized rankings within reach. On two small
subsets of the web, our algorithm updates PageRank using just
25% and 14%, respectively, of the time required by the original
PageRank algorithm. Our algorithm uses iterative aggregation
techniques [7, 8] to focus on the slow-converging states of the
Markov chain. The most exciting feature of this algorithm is
that it can be joined with other PageRank acceleration methods,
such as the dangling node lumpability algorithm [6], quadratic
extrapolation [4], and adaptive PageRank [3], to realize even
greater speedups (potentially a factor of 60 or more speedup
when all algorithms are combined)."
Some analysis of the convergence properties of the IAD method
is provided here:
[Convergence Analysis of an Improved Pagerank Algorithm]
(by Ilse C.F. Ipsen and Steve Kirkland)
http://www4.ncsu.edu/~ipsen/ps/tr04-02.pdf
"ABSTRACT The iterative aggregation/disaggregation (IAD) method
is an improvement of the PageRank algorithm used by the search
engine Google to compute stationary probabilities of very large
Markov chains.
"In this paper the convergence, in exact arithmetic, of the IAD
method is analyzed. The IAD method is expressed as the power
method preconditioned by an incomplete LU factorization. This
leads to a simple derivation of the asymptotic convergence rate
of the IAD method.
"It is shown that the power method applied to the Google matrix
always converges, and that the convergence rate of the IAD method
is at least as good as that of the power method. Furthermore, by
exploiting the hyperlink structure of the web it can be shown
that the convergence rate of the IAD method applied to the Google
matrix can be made strictly faster than that of the power method."
regards, mathtalk-ga