Hi, upstartaudio-ga:
Calculus is a difficult class to master, either from the teacher's or
the student's perspective. Its utility pushes it of necessity to the
beginning of an undergraduate curriculum, but thorough understanding
of its rigorous foundations is ordinarily deferred to later.
The generalized binomial theorem may be deduced as a special case of
Taylor series, for powers x^r where r is other than a nonnegative
integer.
So a key development is the definition of such a function x^r for
general exponents, satisfactory enough to use to deduce its
derivatives. Because the power rule:
d(x^r)/dx = r * x^(r-1)
states the first derivative of x^r in simple terms of a constant times
a similar function, establishing the form of the first derivative
yields by induction the forms of all higher derivatives. The big
distinction between nonnegative integer r and other exponents is that
in the former case, the higher derivatives must eventually descend to
zero, so that a Taylor series expansion (regardless of the choice of
"center") will have finitely many terms if and only if the exponent r
is a nonnegative integer.
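
As a small illustration of the point above, here is a sketch in Python that applies the power rule repeatedly and records only the leading constants r, r(r-1), r(r-1)(r-2), ... of the successive derivatives. The function name derivative_coefficients is made up for this example; exact fractions are used so that "equal to zero" is not blurred by rounding.

```python
from fractions import Fraction

def derivative_coefficients(r, count):
    """Leading constants of the first `count` derivatives of x^r,
    obtained by repeated use of d(x^r)/dx = r * x^(r-1)."""
    coeffs = []
    c = Fraction(1)
    exponent = Fraction(r)
    for _ in range(count):
        c *= exponent      # differentiate once: multiply by the current exponent
        exponent -= 1      # and lower the exponent by one
        coeffs.append(c)
    return coeffs

# Nonnegative integer exponent: the constants reach zero and stay there.
int_case = derivative_coefficients(3, 6)

# Non-integer exponent: the exponent never passes through zero,
# so no constant in the list vanishes.
frac_case = derivative_coefficients(Fraction(1, 2), 6)

print(int_case)
print(frac_case)
```

For r = 3 the list becomes zero from the fourth derivative on, which is why the Taylor expansion is a finite sum; for r = 1/2 every entry is nonzero.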
Here's an outline of the steps we will follow:
1. Recap of properties of function x^r and its inverse for integer r
2. Differentiability of the r'th root function (inverse of x^r)
3. Well-definedness of x^r when r = p/q is a ratio of nonzero integers
4. Limits of real exponents as Cauchy sequences of rational exponents
regards,
mathtalk-ga

Clarification of Answer by mathtalk-ga on 15 Jan 2005 22:37 PST
1. Previously functions f(x) = x^r and its inverse g(x) were shown
for integer exponents r to have the following properties on the domain
of positive real x:
(i) The function f(x) = x^r is positive, and:
strictly monotone increasing if r > 0,
constantly 1 if r = 0, and
strictly monotone decreasing if r < 0.
(ii) The function f(x) = x^r is continuous, and provided
r nonzero, g(x) is continuous (on the same domain).
(iii) The function f(x) = x^r is differentiable, and:
f'(x) = r * x^(r-1).
This third property is precisely the power rule we seek to establish
for all r, but so far we've only proved it for integers. Recall that
we took an approach of proving the rule for positive r, then deducing
the rule for negative r. The same technique can be applied when r is
not an integer, leaving us free to concentrate on results for positive
r (where the functions x^r will again be monotone increasing).
The immediate open question is the differentiability of g(x), to which
we next turn.
2. A simple version of the Inverse Function Theorem suffices for our needs.
* * * * * * * * * * * * * * * * * * * * * *
Thm. (Inverse Function Theorem) Suppose f:[a,b]->[c,d] is a real
function with a positive derivative f' at each point of [a,b]. Then
its inverse g:[c,d]->[a,b] exists and has a positive derivative g' as
well (at each point of [c,d]), and the following is true:
g'(f(x)) = 1/f'(x)
for all x in [a,b]. Correspondingly:
g'(y) = 1/f'(g(y))
for all y in [c,d].
* * * * * * * * * * * * * * * * * * * * * *
Before we launch into the proof, let's see the implication of this for
the function f(x) = x^r. For integer r > 0, f:[a,b]->[a^r,b^r] has a
positive derivative at any point. Therefore its inverse does, and:
g'(f(x)) = 1/f'(x)
= 1/(r * x^(r-1))
and if we take x to be g(y), the r'th root of y, then:
g'(y) = 1/(r * (g(y))^(r-1))
If we were to allow ourselves to write g(y) = y^(1/r), that notation
plus the laws of exponents would lead to (1/r)*( y^((1/r)-1) ).
However this formal acknowledgement of the power rule for y^(1/r) is a
bit premature. For the moment we are content to claim only that our g
is differentiable and has a positive derivative.
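
The formula g'(y) = 1/(r * (g(y))^(r-1)) can be sanity-checked numerically. The sketch below is only an illustration, not part of the proof: the helper names (rth_root, inverse_derivative, central_difference) are invented here, and the r'th root is found by bisection so that no fractional power is assumed in advance.

```python
def rth_root(y, r):
    """Inverse g of f(x) = x^r on the positive reals, by bisection.
    Relies only on f being continuous and strictly increasing."""
    lo, hi = 0.0, max(1.0, y)
    for _ in range(200):               # far more halvings than a double needs
        mid = (lo + hi) / 2
        if mid ** r < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def inverse_derivative(y, r):
    """g'(y) = 1 / f'(g(y)) with f'(x) = r * x^(r-1)."""
    x = rth_root(y, r)
    return 1.0 / (r * x ** (r - 1))

def central_difference(func, y, h=1e-5):
    """Symmetric difference quotient approximating func'(y)."""
    return (func(y + h) - func(y - h)) / (2 * h)

y, r = 8.0, 3                          # cube root of 8 is 2
predicted = inverse_derivative(y, r)   # 1 / (3 * 2^2) = 1/12
measured = central_difference(lambda t: rth_root(t, r), y)
print(predicted, measured)
```

The two printed values should agree to many digits, which is what the Theorem predicts for the cube root at y = 8.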
* * * * * * * * * * * * * * * * * * * * * *
Proof of Theorem:
Since g(f(x)) = x, if we knew that g was differentiable, the desired
consequence could be obtained by applying the chain rule. But since we
want to prove g is differentiable, we should drill down to the definition
of g' as a limit:
                g(y+h) - g(y)
g'(y) = limit   -------------
        h -> 0        h
We can analyze this limit by introducing a sequence {h_i} of real
numbers that tend to 0, and produce a corresponding sequence {x_i} by
defining:
x_i = g(y + h_i)
for any fixed y in the domain. The endpoints y = c,d require slightly
special handling, as the derivatives, etc. at these points involve
one-sided limits. But these are adequately handled by taking the
sequence {h_i} to approach zero from above for y = c, and from below
for y = d, so that {x_i} approaches a from above (resp. b from below).
Apart from this detail the arguments for the one-sided derivatives at
the endpoints are the same as the two-sided limit argument we are about
to detail for the interior point cases.
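Write x = g(y), so that f(x) = y. Since y + h_i = f(x_i) by definition
of x_i, we have h_i = f(x_i) - f(x). Moreover g, as the inverse of a
continuous strictly increasing function on an interval, is itself
continuous, so x_i tends to x as h_i tends to 0. We can conclude that
the limit can be rewritten:
          g(y+h_i) - g(y)                 x_i - x
 limit    ---------------  =   limit    -------------
h_i -> 0        h_i           x_i -> x  f(x_i) - f(x)
which we recognize to be the reciprocal of the limit defining f'(x):
                  f(x_i) - f(x)
f'(x) =  limit    -------------
        x_i -> x     x_i - x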
Thus, provided y = f(x), we know the latter positive limit guarantees
the limit exists for its positive reciprocal, and the two limits are
reciprocals:
g'(y) = 1 / f'(x) = 1/f'(g(y)).
QED
* * * * * * * * * * * * * * * * * * * * * *
This proves the Theorem and in turn suffices to establish the
differentiability of g, the inverse of f(x) = x^r, at least for
positive integers r (so that both f and g are increasing functions).
It is therefore certain, by the Chain Rule for instance, that if we
were to define x^r for some ratio r = p/q of two positive integers as
the composition:
f_p( g_q(x) )
where f_p(x) = x^p and g_q(x) is the inverse of f_q(x) = x^q, then the
composition would be differentiable. In fact, bearing in mind that we
assume p,q > 0, the composition would be a differentiable,
monotone-increasing continuous function that is continuously (even
differentiably) invertible, with monotone-increasing inverse:
f_q( g_p(x) )
Before we allow ourselves the luxury of restating these as a form of
the Power Rule, it is important to justify the notations:
f_p( g_q(x) ) = x^(p/q)
f_q( g_p(x) ) = x^(q/p)
and the "rational exponent" extensions of the laws of exponents in particular.
This then is the purpose of our next discussion.
(to be continued - mathtalk-ga)

Request for Answer Clarification by upstartaudio-ga on 16 Jan 2005 16:08 PST
A final comment might be in order. We've shown the validity of the
power rule without invoking the binomial theorem, and demonstrated
that it holds for the rational and real cases as long as x >= 0.
One of my textbooks ignores, while the other covers, the fact that x^r
is not defined when x < 0, therefore the function isn't differentiable
in that case.
To prove it is undefined, we need only notice that if r = a/b, then it
is also equal to (2*a) / (2*b) or (3*a) / (3*b). Now if b is odd, then
x^r is an even root raised to a power, and if b is even, then x^r is
an odd root raised to a power. So if x is negative, one of these
cases isn't defined in the set of real numbers.
The problem doesn't seem to go away with x complex, either, since
there are infinitely many solutions if r is the limit of an irrational
number. Even if we choose one of these roots by definition, we still
have the ugly situation that the (qth root of x) quantity raised to
the power p is not the same number as the qth root of x^p.
So I don't see that we can get around restricting the domain of x to
positive numbers. But that is the subject of another question...

Clarification of Answer by mathtalk-ga on 17 Jan 2005 21:27 PST
3. We have shown that given rational r = p/q where p,q are
positive integers (so far), the function:
f_p( g_q(x) )
where f_p(x) = x^p and g_q(x) is the inverse of x^q, has many
properties (monotonicity, continuity, differentiability) in
common with the powers x^r for integer r.
But because there are multiple ways to express any rational r
as a ratio of two integers, we need to show that all possible
choices lead to the same function x^r.
The mathematical shorthand says we need to show that:
x^r = f_p( g_q(x) )
is well-defined, i.e. that the formula's apparent dependence
on the choice of p,q is only superficial.
The key to this is purely algebraic, namely showing that the
various functions f_p and g_q all commute, so that the order
in which they are composed is immaterial to the final result.
To begin with, the power functions f_p and f_q commute for
any two positive integer powers by the associative law of
multiplication and mathematical induction:
f_p( f_q(x) ) = (x^q)^p = x^(pq) = (x^p)^q = f_q( f_p(x) )
Stated another way, this gives an unfamiliar cast to a law of
exponents:
f_p o f_q = f_pq = f_qp = f_q o f_p
From this it can be deduced that the corresponding function
inverses also commute, because where inverses exist for two
functions, the inverse of their composition is the result of
composing their inverses in the opposite order. For example:
g_q( g_p( f_p( f_q(x) ) ) ) = x
when the adjacent inverses "cancel" one another, which says
that g_q( g_p(x) ) is the inverse of f_p( f_q(x) ).
It follows then that g_q and g_p must commute, because f_p
and f_q commute:
g_q o g_p = (f_p o f_q)^(-1) = (f_q o f_p)^(-1) = g_p o g_q
We further verify that f_p and g_q commute as well, so that
it really doesn't matter whether we define x^r by composing
them in one order or the other. Again we use the fact that
f_p and f_q commute (with respect to function composition):
f_q( f_p( g_q(x) ) ) = f_p( f_q( g_q(x) ) ) = f_p(x)
Therefore after applying g_q to both sides:
f_p( g_q(x) ) = g_q( f_p(x) )
which demonstrates f_p commutes with g_q.
One point that we've been a bit cavalier about is that g_q
is both a left and a right inverse for f_q. That is:
f_q( g_q(x) ) = x = g_q( f_q(x) )
Whenever a function maps its domain 1-1 and onto itself, an
inverse is two-sided. This symmetric outcome may easily be
deduced from the symmetry of the relations:
y = f_q(x) <==> x = g_q(y)
where for any x there exists y to satisfy the condition, and
conversely for any y there exists such an x.
In any case these commutativity properties establish that x^r
is well-defined, because if we take r = (cp)/(cq), for c any
positive integer:
f_cp( g_cq(x) ) = f_p( f_c( g_c( g_q(x) ) ) ) = f_p( g_q(x) )
This assures us that in the particular case p is divisible by
q, so that r is an integer, our "new" definition of x^r fully
agrees with the old definition f_r based solely on arithmetic.
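
The well-definedness claim lends itself to a quick numerical spot check. The sketch below is an illustration under stated assumptions, not a proof: f, g and rational_power are names chosen to mirror f_p, g_q and x^(p/q) above, with f computed by repeated multiplication and g by bisection so that nothing about fractional powers is taken for granted.

```python
def f(p, x):
    """f_p(x) = x^p for a positive integer p, by repeated multiplication."""
    result = 1.0
    for _ in range(p):
        result *= x
    return result

def g(q, y):
    """g_q, the inverse of f_q on the positive reals, by bisection."""
    lo, hi = 0.0, max(1.0, y)
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(q, mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def rational_power(x, p, q):
    """x^(p/q) defined as the composition f_p( g_q(x) )."""
    return f(p, g(q, x))

x = 5.0
base = rational_power(x, 2, 3)                         # r = 2/3
scaled = [rational_power(x, 2 * c, 3 * c) for c in (2, 3, 5)]  # r = (2c)/(3c)
swapped = g(3, f(2, x))                                # g_q( f_p(x) ), other order
print(base, scaled, swapped)
```

Up to rounding, every entry printed is the same number, matching both f_cp( g_cq(x) ) = f_p( g_q(x) ) and the commutation of f_p with g_q.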
The more general point of these observations is that we aren't
misled by using the exponential notation x^r with rational r
and rational s, because the familiar laws of exponents hold:
(x^r)^s = x^(rs)
(x^r)*(x^s) = x^(r+s)
(x^r)*(y^r) = (xy)^r
for any x,y > 0 and rational r,s. The proofs of all these are
purely algebraic, and for that reason I will not go into more
details.
However we will finish this section with a bit of calculus, a
derivation of the power rule for positive rational exponents,
then using the quotient rule to extend it to negative rational
exponents.
Recall we have really only dealt with r > 0 in defining:
x^r = f_p( g_q(x) )
when r = p/q and p,q > 0 are integers. The Chain Rule and the
Inverse Function Theorem then give us:
d(x^r)/dx = f'_p( g_q(x) ) * g'_q(x)
          = p * (g_q(x))^(p-1) * [1/f'_q( g_q(x) )]
          = p * x^((p-1)/q) * (1/q) * (1/x^((q-1)/q))
          = (p/q) * x^((p-q)/q)
          = (p/q) * x^((p/q)-1)
          = r * x^(r-1)
That is, we've shown the Power Rule for rational r > 0.
Now we've already used in the computation above that dividing
by x^r is equivalent to multiplying by x^(-r), so it may be
worth pointing out that the commitment to treat negative
exponents as reciprocals is implied by the second law of
exponents cited above, with s = -r:
(x^r)*(x^(-r)) = x^0 = 1
Therefore on the calculus side of things we need only apply
the quotient rule to determine the derivative of x^(-r):
d(x^(-r))/dx = d(1/(x^r))/dx
                 d(x^r)/dx
             = - ---------
                  (x^r)^2
             = -r * x^(r-1) * x^(-2r)
             = -r * x^(-r-1)
With this we've also shown the Power Rule for rational r < 0.
This is exactly the same calculation as we gave before on the
integer exponents, but as acknowledged above, some algebraic
preliminaries were necessary to assure they are sensible for
the rational exponents.
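
A numerical cross-check of the Power Rule for a few rational exponents, positive and negative, might look like the following sketch. It assumes that Python's built-in floating-point power agrees with the x^r constructed above; the sample exponents and points are arbitrary, and the helper names are invented for this example.

```python
from fractions import Fraction

def difference_quotient(r, x, h=1e-6):
    """Symmetric difference quotient of t -> t^r at t = x."""
    r = float(r)
    return ((x + h) ** r - (x - h) ** r) / (2 * h)

def power_rule(r, x):
    """The claimed derivative r * x^(r-1)."""
    r = float(r)
    return r * x ** (r - 1)

# (exponent, point) pairs; the last exponent is negative
samples = [(Fraction(2, 3), 1.7), (Fraction(7, 4), 0.4), (Fraction(-5, 2), 2.3)]
errors = [abs(difference_quotient(r, x) - power_rule(r, x)) for r, x in samples]
print(errors)
```

The discrepancies should be tiny, limited only by the step size h and floating-point rounding.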
At last we come to our final step, extending our exponents to
the general real case by taking limits of Cauchy sequences of
rational exponents. It should not be surprising that we can
show, once power functions x^a are "pinched" between monotone
power functions x^r both above and below, that x^a must also
be monotone, etc.
(to be continued)

Clarification of Answer by mathtalk-ga on 25 Jan 2005 20:21 PST
4. A "construction" of the real numbers in mathematics is
often based on Cauchy sequences of rational numbers. For
every real number there is a sequence of rational numbers
converging to it. For example, even though SQRT(2) or pi
is irrational, their decimal expansions give us (by way of
truncation) convergent sequences of rational numbers.
If the last section was top-heavy with algebra, this one is
top-heavy with analysis, i.e. with fussing over limits and
how to estimate sizes of things.
So far we've defined power functions x^r for all rational
exponents r and determined that their derivatives obey the
Power Rule:
d(x^r)/dx = r * x^(r-1)
To extend our definition to real, irrational exponents a, we
need to take the limit of x^r as r approaches a. In doing so
we will make free use of the exponent notation and the usual
"laws of exponents" for rational r, whose justification was
sketched in the previous section.
We state without proof the following:
* * * * * * * * * * * * * * * * * * *
Thm. (Completeness Property of the Real Numbers)
Let {r_i} be a Cauchy sequence of real numbers. That is,
for every epsilon > 0, there exists integer M > 0 such that
for all i,j > M, |r_i - r_j| < epsilon.
Then the sequence {r_i} converges to a real number.
* * * * * * * * * * * * * * * * * * *
The essential reason the real numbers have this property is
because we "bake in" that property with their construction.
The real numbers are the "completion" of the rationals with
respect to the usual notion of distance between two numbers,
the absolute value of the difference. So at any rate every
Cauchy sequence of rationals has a unique limit in the real
numbers, and the extension of this fact to Cauchy sequences
of real numbers is the "completeness property".
For our purposes we need to show that if {r_i} is a Cauchy
sequence of rational numbers, then for any fixed real x > 0,
{x^r_i} is a Cauchy sequence of real numbers.
It suffices to have an estimate of |x^r_i - x^r_j| in terms
of |r_i - r_j|. Intuitively, making the exponents close to
one another puts the corresponding powers of x close to one
another.
x^r_i - x^r_j = [x^(r_i - r_j) - 1] * x^r_j
Prop. 1 Let r > s be rational numbers and x > 0 be real.
Then:
i) if x > 1, then x^r > x^s
ii) if x = 1, then x^r = x^s
iii) if x < 1, then x^r < x^s
Proof: This harkens back to something we showed earlier in
the Comment. Certainly for any positive integer n, x > 1 if
and only if x^n > 1. Restated conversely, x > 1 if and only
if x^(1/n) > 1.
First we show that if x > 1, then x^(r-s) > 1. Since r > s,
we can express r - s with a common denominator as p/q where
p,q are each positive integers. Then as just recalled:
x > 1 ==> x^p > 1
==> (x^p)^(1/q) > 1
==> x^(r-s) = x^(p/q) > 1
which suffices upon multiplying both sides by x^s to show:
x > 1 ==> x^r > x^s
This proves part (i) of the Proposition. Part (ii) is trivial.
Part (iii) follows from part (i) by applying it to 1/x, since
taking reciprocals of positive numbers reverses the direction
of an inequality.
QED
The result above establishes that for fixed x > 0, the values
x^r vary monotonically with r, a nice counterpart to our earlier
treatment of monotonicity in x for fixed r. One other result is
needed:
Prop. 2 Let x > 1 be a real number. Then
i) {x^n: n = 1,2,3,...} increases without limit.
ii) {x^(1/n): n = 1,2,3,...} converges to 1
Proof:
(i) Clearly x > 1 implies:
x < x^2 < x^3 < ...
so the sequence in part (i) is strictly increasing. Therefore
it either increases without limit (tends to +oo), or it must have
as a limit a least upper bound (a fact which can be rigorously
deduced from the Completeness Property of Real Numbers), say u.
Since x > 1, u/x < u and therefore some integer n is such that:
x^n > u/x
But then x^(n+1) > u, contradicting that u was an upper bound.
Thus the sequence increases without limit (tends to +oo).
(ii) It is a little less obvious, but true, that the sequence:
x > x^(1/2) > x^(1/3) > ...
is monotone decreasing. Let m < n be two positive integers,
so that assuming x > 1 still, x^m < x^n. Now the mn'th root
function is monotone increasing so applying it to both sides
of that inequality gives:
x^(1/n) = (x^m)^(1/mn) < (x^n)^(1/mn) = x^(1/m)
so x^(1/m) > x^(1/n) when m < n, as desired.
Furthermore since x > 1, we know x^(1/n) > 1^(1/n) = 1, and
thus 1 is a lower bound on the root sequence. It remains to
show that 1 is a greatest lower bound and therefore the limit
of the monotone decreasing sequence of roots.
Suppose instead that b > 1 is also a lower bound of x^(1/n)
for all positive integers n:
b < x^(1/n)
Then b^n < x for all positive integers n. In other words b > 1, but
the sequence {b^n} has finite upper bound x, which contradicts
part (i) of this proposition. So no such lower bound b > 1
exists, and the greatest lower bound of {x^(1/n)} is 1. As the
sequence is monotonic, once the sequence is within epsilon > 0
of 1, it remains "within epsilon" of 1, so 1 is the limit to
which the sequence converges.
QED
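
Both parts of Prop. 2 are easy to watch numerically. The sketch below is only an illustration with an arbitrarily chosen base x = 1.5, and it uses Python's built-in power for the roots, which is an assumption outside the construction in the text.

```python
def powers(x, count):
    """x^1, x^2, ..., x^count"""
    return [x ** n for n in range(1, count + 1)]

def roots(x, count):
    """x^(1/1), x^(1/2), ..., x^(1/count)"""
    return [x ** (1.0 / n) for n in range(1, count + 1)]

x = 1.5
p = powers(x, 60)
r = roots(x, 60)

# Part (i): strictly increasing and already very large by n = 60.
increasing = all(a < b for a, b in zip(p, p[1:]))

# Part (ii): strictly decreasing, bounded below by 1, creeping toward 1.
decreasing = all(a > b for a, b in zip(r, r[1:]))

print(increasing, p[-1])
print(decreasing, r[-1])
```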
Before we reach for the climactic proof of the power rule for
real exponents, let's first warm up by arguing that the laws
of exponents continue to apply, and for that matter that a
real power of x is well-defined by taking a limit on rational
exponents:
Thm. (Laws of Exponents, Real Powers)
Let r > 0 be a real number, which is the limit of a sequence
of positive rational numbers {r_i}. Then for any x > 0:
f(x) =  limit   x^r_i
       i -> oo
exists and is the same for any positive rational sequence
{r_i} chosen. Moreover the laws of exponents hold for real
powers r,s and positive real bases x,y:
i) (x^r)^s = x^(rs)
ii) (x^r)*(x^s) = x^(r+s)
iii) (x^r)*(y^r) = (xy)^r
Proof: It suffices to show the limit f(x) exists in order to show
that f(x) = x^r is well-defined, independent of the choice
of rational sequence converging to r. For if two positive
rational sequences both converge to r, we can combine them,
interlacing them as odd and even entries into one sequence
whose limit must then be common to both subsequences. In
particular if r is actually a rational number, our "new"
definition must secretly agree with the old one by virtue
of considering the constant sequence r_i = r.
We claim that {x^r_i} is a Cauchy sequence of real numbers,
which is sufficient by the Completeness Property to show
convergence. The logic is:
(1) Since {r_i} converges to r, {r_i} is a Cauchy sequence.
That is, given any epsilon > 0, there exists M such that
for all i,j > M, |r_i - r_j| is always less than epsilon.
(2) In Prop. 2 (ii) above we showed that for any x > 1, the
sequence {x^(1/n)} converges to 1; applying this to 1/x covers
0 < x < 1, and the case x = 1 is trivial. So for fixed x > 0 we
can specify N such that by the monotonicity shown in Prop. 1:
rational s in (0,1/N) ==> |x^s - 1| < epsilon
for any desired epsilon > 0.
(3) Putting both facts together, for fixed x > 0, there
exists for any epsilon > 0 an integer M such that for all
i,j > M we have |r_i - r_j| less than some 1/N which
guarantees (labeling the two indices so that r_i >= r_j):
|x^r_i - x^r_j| = |x^(r_i - r_j) - 1| * x^r_j
< epsilon * C
where C is an upper bound on {x^r_i}, say x^R where R is
an upper bound on {r_i} if x > 1, or simply 1 if x <= 1.
Since epsilon can be as small as we please, this shows
the sequence {x^r_i} is Cauchy, and thus convergent.
Once we have the definition of x^r as a limit from the
rational exponent cases, the laws of exponents (i)-(iii)
follow easily. Let us show the third of these in some
detail:
(x^r)*(y^r) = (  limit   x^r_i ) * (  limit   y^r_i )
                i -> oo              i -> oo
            =  limit   (x^r_i)(y^r_i)
              i -> oo
            =  limit   (xy)^r_i
              i -> oo
            = (xy)^r
where we've used only that a product of two limits which
exist is the limit of the corresponding products, together
with the previously established law of exponents for the
rational case. Parts (i) and (ii) are similar.
QED
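
To see the definition of a real power as a limit in action, the following sketch uses the decimal truncations of SQRT(2) mentioned earlier as the rational sequence {r_i} and watches x^r_i settle down. The base x = 3 and the helper names are arbitrary choices for illustration; each x^r_i is evaluated with Python's floating-point power rather than by the root construction.

```python
from fractions import Fraction
from math import isqrt

def sqrt2_truncations(count):
    """Rationals 1.4, 1.41, 1.414, ... from truncating the decimal
    expansion of SQRT(2); isqrt keeps the digits exact."""
    return [Fraction(isqrt(2 * 10 ** (2 * i)), 10 ** i)
            for i in range(1, count + 1)]

def powers_along(x, exponents):
    """x^r_i for each rational exponent r_i in the sequence."""
    return [x ** float(r) for r in exponents]

x = 3.0
exponents = sqrt2_truncations(12)
values = powers_along(x, exponents)

# Gaps between consecutive terms: these shrink as the exponents close in,
# which is the Cauchy behaviour the proof establishes.
gaps = [abs(b - a) for a, b in zip(values, values[1:])]

print(values[-1], 3.0 ** (2 ** 0.5))
print(gaps[-1])
```

The last value should match the library's 3^SQRT(2) to roughly twelve digits, the same accuracy as the exponent.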
Thm. (Power Rule for Positive Real Exponents)
Let r > 0 be a real number. Then f(x) = x^r is a continuous,
monotone increasing function from positive real numbers to
positive real numbers with inverse g(x) = x^(1/r). Also f is
differentiable, and:
f'(x) = r * x^(r-1)
Proof: Having developed all the "machinery" above, it is now
straightforward to prove the power rule continues to hold for
positive real exponents. Of course if r is rational, we are
already done. So let's assume r is irrational.
One way to show f(x) = x^r is continuous and increasing is to
jump right into showing that it is differentiable with positive
derivative. For example the laws of exponents allow us to
reduce the question of the derivative f'(x) for general x
to that of the derivative at x = 1:
                (x + h)^r - x^r
f'(x) = limit   ---------------
        h -> 0         h
                (1 + h/x)^r - 1
      = limit   --------------- * x^(r-1)
        h -> 0        h/x
      = f'(1) * x^(r-1)
This simplification isn't essential, as the way we are about to
show f'(1) = r would really work for any argument x, but it will
make the notation and (hopefully) the presentation clearer.
Recalling the monotonicity properties of Prop. 1, it should be
evident that for rational sequences {r_i} converging to r from
above and {s_i} converging to r from below, we have:
for all i, x > 1 implies x^s_i < x^r < x^r_i
x = 1 implies x^s_i = x^r = x^r_i
x < 1 implies x^s_i > x^r > x^r_i
In other words the graph of f(x) = x^r is "pinched" between the
family of curves x^s_i and x^r_i. Since their curves are strictly
monotone increasing, the curve x^r must also be increasing at 1.
In particular since for h > 0:
(1 + h)^s_i < (1 + h)^r < (1 + h)^r_i
(1 - h)^s_i > (1 - h)^r > (1 - h)^r_i
we can "squeeze" the limits of the difference quotients:
                (1+h)^r - 1
f'(1) = limit   -----------  =  r
        h -> 0       h
because both "side" limits as i -> oo agree:
                    (1+h)^r_i - 1
 limit   (  limit   ------------- )  =   limit   r_i  =  r
i -> oo    h -> 0         h             i -> oo
                    (1+h)^s_i - 1
 limit   (  limit   ------------- )  =   limit   s_i  =  r
i -> oo    h -> 0         h             i -> oo
Therefore in general f'(x) = f'(1) * x^(r-1) = r * x^(r-1).
The demonstration that g(x) = x^(1/r) is the inverse function to
f(x) = x^r is an even more immediate application of the laws of
exponents, namely part (i) of the preceding Theorem:
g(f(x)) = (x^r)^(1/r) = x^(r * 1/r) = x^1 = x
QED
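
The "pinching" of the difference quotients at 1 can be observed directly. In this sketch r = SQRT(2) is bracketed by the rationals 1.41 and 1.42, playing the roles of one s_i and one r_i; the step h and the bracket are arbitrary choices, and Python's floating-point power stands in for the limit construction.

```python
import math

def quotient_at_one(exponent, h):
    """Difference quotient ((1+h)^exponent - 1) / h at the point 1."""
    return ((1 + h) ** exponent - 1) / h

r = math.sqrt(2)
lower, upper = 1.41, 1.42      # rationals with lower < r < upper
h = 1e-6                       # small positive step

q_lower = quotient_at_one(lower, h)
q_mid = quotient_at_one(r, h)
q_upper = quotient_at_one(upper, h)

print(q_lower, q_mid, q_upper)
```

For h > 0 the middle quotient sits between the outer two, and it is already close to r itself; tightening the bracket squeezes it further.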
We finish up by filling in the gap for negative exponents.
Corollary (Power Rule for All Real Exponents)
If we extend the definition f(x) = x^r to r < 0 by allowing
the limit of a general sequence of rational numbers r_i -> r,
then the power rule and other properties continue to hold, the
only difference worth mentioning is that when r < 0, f(x) is
monotone decreasing.
Proof: Since the limit of reciprocals of a sequence converging
to a nonzero limit is the reciprocal of that limit, the result
of defining:
f(x) =  limit   x^r_i
       i -> oo
for a sequence of negative rational numbers converging to r < 0
is the same as:
 limit   x^r_i  =   limit   1/x^(-r_i)
i -> oo            i -> oo
                =  1 /  limit   x^(-r_i)
                       i -> oo
                =  1 / x^(-r)
so as before we can take the derivative of f(x) by applying the
simplified quotient rule:
f'(x) = -(-r) * x^(-r-1) / x^(-2r)
      = r * x^(-r + 2r - 1)
      = r * x^(r-1)
QED
