Hi, cwd-ga:
In 1965 Cooley and Tukey published an algorithm for computing the
discrete Fourier transform on N = 2^n nodes using O(N log N) operations:
[Fast Fourier Transform - Cooley and Tukey]
http://en.wikipedia.org/wiki/Fast_Fourier_transform
It was later found that the idea had been used by C.F. Gauss in 1805
and independently rediscovered by several others in the intervening
160 years. Nonetheless the paper by Cooley and Tukey gave rise to
considerable interest in using these kinds of numerical computations
to increase algorithmic efficiency.
It should be mentioned that the fast Fourier transform, even when done
with real-valued input, will inherently involve complex arithmetic
in intermediate steps.
See the article above for descriptions of and links to other fast
Fourier transform algorithms, applicable in varying degree to numbers
of nodes other than a power of 2. A nice explanation with specifics
was presented by my colleague rbnn-ga in this earlier Answer:
[Q: Fast Fourier Transform]
http://answers.google.com/answers/threadview?id=92858
for how a computation for general values of n might be accomplished
using a power of 2 computation.
In any case for the sake of illustration of the utility of fast
Fourier transforms, I'll choose an application in which the number of
nodes can be chosen with some flexibility, multiplying two extended
precision integers.
The key idea here is that in passing from the original space of array
values to their discrete Fourier transform values, we have a
relationship between the convolution of two arrays on one side and
their pointwise multiplication on the other.
Now a convolution of two arrays is just what happens when two
polynomials are multiplied modulo X^N - 1, where N is the number of
nodes. Polynomial arithmetic modulo X^N - 1 causes the coefficients to
"wrap around" after power X^(N-1) and "go back" to constants, etc.
So let's assume we have two extended precision integers U and V, and
let's regard these integers as polynomials:
u(X) = SUM u_i X^i
v(X) = SUM v_i X^i
which are to be evaluated at X = radix (for whatever number base the
integers are written in, e.g. binary if desired).
Then the product of these polynomials will have coefficients which are
a convolution of their coefficients:
w(X) = SUM (SUM u_k * v_{i-k} FOR k=0,..,i) X^i
or to write this in another way:
w_i = SUM u_k * v_{i-k} FOR k=0,..,i
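In code, this convolution of coefficients is just the schoolbook polynomial product. A small sketch of my own, for illustration:

```python
def poly_multiply(u, v):
    """Coefficients of w(X) = u(X)*v(X): w_i = sum over k of u_k * v_{i-k}."""
    w = [0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for k, vk in enumerate(v):
            w[i + k] += ui * vk
    return w

# 14 * 14 viewed as polynomials in X = 10: (X + 4)^2 = X^2 + 8X + 16
product = poly_multiply([4, 1], [4, 1])  # [16, 8, 1], least significant first
```

Note the two nested loops: this direct form is the O(N^2) method that the FFT approach will improve upon.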
Now if N is chosen to be sufficiently large, say N more than twice the
maximum degree of u(X) or v(X), then there is no practical distinction
between their product w(X) and their product modulo X^N - 1 (at least
so far as the coefficients are concerned). So let N be a sufficiently
large power of two, and let:
F(u) = discrete Fourier transform of (u_i: i = 0,..,N-1)
F(v) = discrete Fourier transform of (v_i: i = 0,..,N-1)
in which the higher order coefficients are padded with zeros.
Now F(u), F(v) are themselves arrays of length N, albeit with complex
entries in the general case. And the relationship between convolution
and pointwise multiplication gives this relation:
F(w) = F(u)*F(v)
where here we mean by "*" the entry-by-entry product of two arrays.
Thus to recover the polynomial w(X) it's needed at this point to
"undo" the discrete Fourier transform. But the inverse of this
operation is closely related to itself:
F(F(w)) = N w
where here we mean on the right-hand side that each entry in array w
has been scaled by a factor of N. [In fact the Cooley-Tukey fast
Fourier transform involves n steps where N = 2^n, and on the way
"back" if exact arithmetic were required, we could extract a factor of
2 at each such step, so that the factor N disappears along the way.]
In other words undoing the discrete Fourier transform is really not
more difficult than doing it in the first place.
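To illustrate that point with a sketch of my own: the inverse transform uses the conjugate (sign-flipped) exponent plus a division by N, so one routine written with a sign parameter serves both directions at identical cost. Again this is the naive O(N^2) version, for clarity only:

```python
import cmath

def dft(a, sign=-1):
    """One routine serves both directions: sign=-1 forward, sign=+1 inverse
    (the inverse additionally divides by N)."""
    n = len(a)
    return [sum(a[k] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

w = [16, 8, 1, 0]
back = [x / len(w) for x in dft(dft(w), sign=1)]
# back recovers w (up to floating-point rounding)
```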
Having obtained the polynomial product w(X), the actual extended
precision integer product W, obtained by evaluating at X = radix, will
involve "carries" at potentially all of the coefficients. However
this additional computation, involving only additions and radix
shifts, will for practical purposes be O(N) complexity.
For further details of this see here:
[FFT based multiplication of large numbers]
http://numbers.computation.free.fr/Constants/Algorithms/fft.html
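That final carry sweep can be sketched in a few lines (my own illustration; radix 10 assumed):

```python
def propagate_carries(coeffs, radix=10):
    """Single O(N) sweep from least significant coefficient upward."""
    digits = []
    carry = 0
    for c in coeffs:
        carry, digit = divmod(c + carry, radix)
        digits.append(digit)
    while carry:  # any leftover carry spills into higher digits
        carry, digit = divmod(carry, radix)
        digits.append(digit)
    return digits  # least significant digit first

propagate_carries([16, 8, 1, 0])  # -> [6, 9, 1, 0], the digits of 196
```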
Note that the "naive" (pencil and paper) multiplication of u(X) and
v(X) requires a number of operations on the order of the product of
their degrees, since each coefficient ("digit") of one must be
multiplied by each coefficient ("digit") of the other.
With the FFT approach we have only N multiplications to perform (on
the discrete Fourier transforms, which are complex values to be sure),
and N is no more than twice the size of the maximum of these degrees
(so within a constant of being linear in their degrees).
This saving would be illusory, though, without a fast way to compute
the transforms themselves.
With that computation included the work becomes O(N log N). [Recall
that the FFT had n steps where N = 2^n.] This is a big savings
compared to the O(N^2) complexity of the naive algorithm.
Many extended precision numerical packages will include an FFT
implementation as one of several options. See here for description of
one such package:
[A Tour of NTL: NTL Implementation and Portability]
http://www.wschnei.de/numbertheorysoftware/ntl/tourimpl.html
To summarize, the fast Fourier transform allows us to multiply two N/2
"digit" integers, obtaining an exact N "digit" product, with O(N log
N) operations instead of the O(N^2) operations needed by the
pencilandpaper approach we learn in grade school.
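Pulling the pieces together, here is an end-to-end sketch of the whole method (my own illustration, not from any particular package; it uses a naive O(N^2) transform in place of a true FFT, so it shows the structure of the algorithm rather than its speed):

```python
import cmath

def dft(a, sign=-1):
    """Naive transform; a real implementation would substitute an FFT here."""
    n = len(a)
    return [sum(a[k] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def multiply(u_digits, v_digits, radix=10):
    """Multiply two digit arrays (least significant digit first)."""
    # Choose N, a power of 2 at least twice the longer operand.
    n = 1
    while n < 2 * max(len(u_digits), len(v_digits)):
        n *= 2
    # Zero-pad both operands out to N coefficients.
    u = u_digits + [0] * (n - len(u_digits))
    v = v_digits + [0] * (n - len(v_digits))
    # Transform, multiply pointwise, transform back, scale by 1/N.
    fw = [a * b for a, b in zip(dft(u), dft(v))]
    w = [round((x / n).real) for x in dft(fw, sign=1)]
    # Single O(N) carry sweep.
    digits, carry = [], 0
    for c in w:
        carry, d = divmod(c + carry, radix)
        digits.append(d)
    while carry:
        carry, d = divmod(carry, radix)
        digits.append(d)
    return digits

multiply([4, 1], [4, 1])  # -> [6, 9, 1, 0], i.e. 14 * 14 = 196
```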
Please let me know by way of a Request for Clarification if additional
information would be helpful.
regards, mathtalk-ga
Clarification of Answer by mathtalk-ga on 27 Jun 2004 23:15 PDT
The above is an example of a "direct" method as opposed to an
"iterative" one, so it's a bit easier to state the savings in
complexity (since an exact answer is obtained in some finite number of
operations). However I'd like to give some illustrations of FFT in
iterative methods as well.
First, however, let me work through a simple exercise to illustrate
the ideas above. This will also give me an opportunity to clear up a
couple of points of confusion in my presentation.
I said that F(F(w)) = N w. Actually one needs to introduce a complex
conjugation in the second "Fourier transform" to get this result.
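This point can be checked numerically (a sketch of my own): applying the identical transform twice gives N times the array with its indices reversed modulo N, while conjugating around the second transform recovers the array itself:

```python
import cmath

def dft(a):
    """Naive O(N^2) discrete Fourier transform, for checking only."""
    n = len(a)
    return [sum(a[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

w = [16, 8, 1, 0]
n = len(w)

# Same transform applied twice: indices come back reversed (mod N).
twice = [x / n for x in dft(dft(w))]  # approximately [16, 0, 1, 8]

# Conjugate around the second transform: the original array returns.
conj_back = [x.conjugate() / n for x in dft([x.conjugate() for x in dft(w)])]
```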
To be clear, let's do a multiplication of two 2-digit numbers. For
simplicity I've decided to square the number 14, which will also
illustrate how "carries" come into the picture at the end.
So we think of the (decimal number) 14 as being a polynomial, and
we'll pad it out to four coefficients (because a 2-digit number times
a 2-digit number will give at most a 4-digit product):
0X^3 + 0X^2 + 1X + 4
or simply, as the array of coefficients:
u = [4, 1, 0, 0]
putting the least significant term first in the array.
The discrete Fourier transform is a linear transformation, which means
it can be represented by a matrix multiplication (with possibly
complex values). In the case of 4 nodes, the matrix is:
[FFT - Some Simple Transforms]
http://www.eptools.com/tn/T0001/PT01.HTM#Head687
+1 +1 +1 +1
+1 -i -1 +i
+1 -1 +1 -1
+1 +i -1 -i
If we multiply this matrix times its complex conjugate, we get 4I (verify!).
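The "(verify!)" can be delegated to a few lines of Python (a sketch of my own):

```python
# The 4-node transform matrix from the link above, with i = sqrt(-1).
i = 1j
M = [[1,  1,  1,  1],
     [1, -i, -1,  i],
     [1, -1,  1, -1],
     [1,  i, -1, -i]]

def matmul(A, B):
    """Plain 4x4 complex matrix product."""
    return [[sum(A[r][k] * B[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

Mbar = [[z.conjugate() for z in row] for row in M]
P = matmul(M, Mbar)  # equals 4 times the 4x4 identity matrix
```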
Now multiply our vector u = [4, 1, 0, 0] by it:
F(u) = [5, 4-i, 3, 4+i]
Okay, so since we wanted to square 14, here we just multiply each
entry of F(u) by itself:
F(w) = [25, 15-8i, 9, 15+8i]
Now multiply F(w) by the complex conjugate of our matrix above:
4w = [64, 25 + i(15-8i) - 9 - i(15+8i), 4, 25 - i(15-8i) - 9 + i(15+8i)]
   = [64, 32, 4, 0]
w = [16, 8, 1, 0]
Okay, so the polynomial w(X) is:
0X^3 + 1X^2 + 8X + 16
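The whole matrix computation above can be replayed in a few lines of Python (my own sketch) to confirm the arithmetic:

```python
i = 1j
M = [[1,  1,  1,  1],  # the 4-node transform matrix used above
     [1, -i, -1,  i],
     [1, -1,  1, -1],
     [1,  i, -1, -i]]

def apply(A, x):
    """Matrix-vector product."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]

u = [4, 1, 0, 0]
Fu = apply(M, u)                      # [5, 4-1j, 3, 4+1j]
Fw = [z * z for z in Fu]              # [25, 15-8j, 9, 15+8j]
Mbar = [[z.conjugate() for z in row] for row in M]
w = [z / 4 for z in apply(Mbar, Fw)]  # [16, 8, 1, 0]
```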
When we "evaluate" X = 10, then (proceeding from lowest order terms to
highest), we remove any "excess" greater than or equal to 10 and
promote it to the next higher coefficient. Thus we find:
14 squared:
  1 6
  8
1
-----
1 9 6  = 196 (base ten)
That's how the carries work out, and in general it suffices to make a
single sweep from lowest order to highest order (hence the estimate of
O(N) work).
Now this illustrates the discrete Fourier transform. But what makes
the method "fast"? That would be a matter of cleverly arranging the
arithmetic (in this particular case) into two steps (because 4 = 2^2),
essentially decomposing the 4-node case into two 2-node cases.
The cleverness involves some shuffling of entries, some adding and
subtracting, and some multiplying by "roots of unity". The case at
issue here would be computed something like this:
0 0 1 4      (original coefficients)
0 1 0 4      (shuffling based on bit reversals of indexes)
1 1 4 4      (sum and difference of adjacent pairs)
-i 1 4 4     (multiplying by roots of unity)
4+i 3 4-i 5  (sum and difference of doubly spaced pairs)
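Those shuffling and pairing steps are exactly what a radix-2 Cooley-Tukey implementation performs. A compact recursive sketch of my own (in this form the even/odd split plays the role of the bit-reversal shuffle):

```python
import cmath

def fft(a, sign=-1):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], sign)  # the repeated even/odd split is the
    odd = fft(a[1::2], sign)   # "bit reversal" in disguise
    out = [0] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]  # root of unity
        out[k] = even[k] + t           # sum of the pair
        out[k + n // 2] = even[k] - t  # difference of the pair
    return out

fft([4, 1, 0, 0])  # approximately [5, 4-1j, 3, 4+1j], matching F(u) above
```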
Next I'd like to discuss the appearance of the fast Fourier transform
in the numerical solution of partial differential equations, an area
in which iterative methods will find their place.
regards, mathtalk-ga
