This is a common error in curvefitting. With 5 points you can get a
4th order polynomial to thread right through the points you have and
shoot off wildly when you interpolate or extrapolate. I'll dig up some
references for you soon. I did a lot of that kind of work a few years
ago and it happens every time. You need more data, a lower degree (2nd
or 3rd), and a good understanding of what is happening. Try plotting
out the equation you got to see what I mean. You can "weight" your
results by including some of the same points more than once to gett
better accuracy in that area. I havn't used Excel trendline yet but
whatever the software, the mathematical result will be the same. The
more distinct actual points you have, the "smarter" your estimate will
become but don't expect it to be right on top of your input data
points. Also, it is best to use a polynomial that is reasonable for
the real world case, e.g. if the trend can only go up or down, stick
to a low degree (<3). |
Clarification of Answer by
chris2002micrometer-ga
on
05 Oct 2004 11:41 PDT
searching for: polynomial curvefit
turns up this very direct tutorial:
http://www.graphpad.com/curvefit/id210.htm
This seems better to me than many others that turned up because it
explains the concepts in plain language for the otherwise competent
non-expert. The whole concept of this analysis (also called least
squares regression) is to arrive at a minimum total error. A simple
average is the same as a 0-degree polynomial. You understand that an
average number of jellybeans in a bag is very likely to be fractional
and simply suggests that the whole number above or below is what is
actually possible.
|
Clarification of Answer by
chris2002micrometer-ga
on
05 Oct 2004 12:00 PDT
http://math.fullerton.edu/mathews/n2003/LeastSqPolyMod.html
This link gets into more mathematical detail but it mentions "...wrong
if the data is radically "NOT polynomial." As an example if you take
data that is pretty much linear and fit a high degree, the calculation
and solution can run into precision problems, dividing by 0, etc.
Also, an "outlier", a point that clearly does not fit in with the
rest, can give a crazy result. If you do have a good set of data and a
good polynomial for it, the estimate is not "guaranteed" if you
extrapolate far away from your data. If you set a course of travel
sighting on two trees 20 feet apart and go two miles, you have to
expect to be off course somewhat.
|
Request for Answer Clarification by
benitoaz-ga
on
05 Oct 2004 13:36 PDT
Thanks Chris.
I still cannot get the equation to work.
If I plot the following number
Month value %
0 0
1
2
..
6 0.1%
..
12 10%
..
27 80%
28
29
30 95%
..
40 100%
it gives with a perfect fit the following equation
y = 1E-07x5 - 1E-05x4 + 0.0004x3 - 0.0029x2 + 0.0062x
if I use this equation to predict month 35 for example it gives a wrong answer..
this is really what I try to find out.. even if the result was few %
off this will be find but it gives me a value of 600%!!!
|
Clarification of Answer by
chris2002micrometer-ga
on
05 Oct 2004 16:28 PDT
Ben - That is going to be a difficult curve. Can you give me the
"correct" answer for month 35, or any others? Did you try including it
in with the input data? Can you tell me what is being modeled? I will
experiment with your data and get back.
|
Clarification of Answer by
chris2002micrometer-ga
on
05 Oct 2004 17:48 PDT
Ben - I plotted your data and your coefficients. The data has a bad
point (outlier) at month 27 and the rest could fit a poly that
concaves down after month 40. Your coefficients show a poly (almost
2nd order) concaving up from month 0. There is no correlation!
I tried to use some Excel stat functions like LINEST a while ago and
found there were documented problems with them (at MS site). As I
explained earlier, the data does need to suggest a real function to be
successfully fitted. If it is random, or wild, a poly can be made to
fit the points but any inference beyond them will be useless.
There are better packages available but they require more expertise
and wont make the assumptions the Excel wizard does.
|
Request for Answer Clarification by
benitoaz-ga
on
06 Oct 2004 06:08 PDT
At 35 it should be around 95 to 97%.
Again my plot has only the 5 points above. I used X,Y scattered graph.
I used then trendline polygon 5 to get the curve drawn and it gived an
R factor of 1...the curve is very smooth
At least between those points 0 and 40, it should give back a number
between 0 and 100%...
appreciate your input
Ben
|
Clarification of Answer by
chris2002micrometer-ga
on
06 Oct 2004 07:14 PDT
Ben - Take a lot at my plots.
http://www.micrometer.com/virtoff/curvefit2.xls
I think we have a square peg for a round hole. The coefficients you
have don't seem to relate to the original, and augmented, data points.
Can you post your worksheet for me to tinker with?
|
Request for Answer Clarification by
benitoaz-ga
on
06 Oct 2004 08:26 PDT
chris,
I will. how and where can I post my spreadsheet.
I know why you do not get the same curve
you need to create input X from 1 to 40 with increment of 1
1 0%
2
3
4
5
6 10%
7
8
9
10
11
12 50%
I just did it here for the first 12 month but expand to 40.
Do not put a value when you do not have one (month 2 to 5 for example)
Try that and let me know..
Ben
|
Clarification of Answer by
chris2002micrometer-ga
on
06 Oct 2004 09:26 PDT
Ben - The second plot was from your coefficients with x going 1 to 40.
Do you have a "home page" area where you can upload your s/s? You can
creat a free one at yahoo, msn, etc. Also your ISP probably has an
area you can use.
|
Clarification of Answer by
chris2002micrometer-ga
on
26 Oct 2004 06:43 PDT
Ben - I ran across a good free website that can help you study your data.
I ran a 3rd order poly with your data and can see why you may be having probs.
http://zunzun.com/
0 0
6 0.1
12 10
27 80
30 95
40 100
|