I have been running some experiments, which involve recording the
amount of time a process takes to finish. I then run a more complex
instance of the problem and record how long that takes to complete,
and so on. I have collected a set of results and graphed these times,
and the plot appears to show a linear increase in the time taken for
each experiment (the line is more or less straight).
My question is what statistical analysis I could perform to confirm
that the times are increasing linearly and will likely continue to do
so if I were to run more experiments. Essentially, I want to prove
mathematically that the problem is not becoming disproportionately
harder to solve (i.e., that the time taken grows no faster than
linearly) as it becomes more complex.
If what I have written is confusing, please ask for clarification.
Thanks.
|
Request for Question Clarification by pafalafa-ga on 28 Apr 2003 07:50 PDT
There are certainly tests to establish the degree of linearity for the
data points you have. But if your question is "how can I statistically
show that the linear relationship will continue indefinitely?", the
answer, I think, is "you can't".
The linear relation may only be a local phenomenon (say, a linear
segment of a larger S-shaped distribution), and I'm not aware of a
statistical method that would distinguish one from the other. Seems
to me you need to turn to the first principles underlying your
experiments, and from them, argue from theory that a linear or
non-linear result would be expected.
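For illustration, here is a minimal sketch (in Python with NumPy) of one
such test of the degree of linearity: fit both a linear and a quadratic
model to the measured points and see whether the quadratic term buys you
anything. The size and time values below are made-up placeholders, not
real measurements.

import numpy as np

# Hypothetical measurements: problem "complexity" and observed run times (seconds).
sizes = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=float)
times = np.array([1.1, 2.0, 3.2, 3.9, 5.1, 6.0, 7.2, 7.9])

lin = np.polyfit(sizes, times, 1)    # [slope, intercept]
quad = np.polyfit(sizes, times, 2)   # [a, b, c] for a*x**2 + b*x + c

# Residual sum of squares for each model.
rss_lin = np.sum((times - np.polyval(lin, sizes)) ** 2)
rss_quad = np.sum((times - np.polyval(quad, sizes)) ** 2)

print("linear fit:   ", lin, " RSS =", rss_lin)
print("quadratic fit:", quad, " RSS =", rss_quad)
# If the quadratic coefficient is tiny and the RSS barely improves, the data
# are consistent with a straight line over the range measured -- which is all
# such a test can say about behaviour beyond that range.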
|
Request for Question Clarification by mathtalk-ga on 28 Apr 2003 07:50 PDT
Hi, atease-ga:
I think you may actually have two distinct goals in mind.
Some "statistical" methods could be applied to the running time
measurements, to see how well a linear fit to these data points works.
This would be empirical evidence, not proof.
A "mathematical proof" of linear time complexity, on the other hand,
would involve algorithmic analysis.
It is sometimes the case that an implementation of an algorithm
appears to have nearly linear time complexity, but such good behavior
abruptly ceases as memory constraints are exceeded. Without the
source code (or at least pseudo-code) for your program, it is
impossible to make any judgement.
If you are intent on an empirical approach to this question, then I
suggest you fit a line to your existing data points (perhaps excluding
the smallest few runs to minimize the influence of "start-up" costs).
Then run the program for a couple of inputs an order of magnitude
larger than what you've done so far. If those running times are close
to the "predicted" values of your existing fit, then in a scientific
sense you've provided good evidence for the hypothesis of linearity.
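As a rough sketch of that procedure (Python with NumPy/SciPy; every number
below is a placeholder for your own measurements, including the larger
runs), you might do something like:

import numpy as np
from scipy import stats

# Existing runs (placeholders): problem size vs. measured time in seconds.
sizes = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=float)
times = np.array([1.1, 2.0, 3.2, 3.9, 5.1, 6.0, 7.2, 7.9])

# Exclude the smallest couple of runs to reduce the influence of start-up costs.
fit = stats.linregress(sizes[2:], times[2:])
print("slope =", fit.slope, "intercept =", fit.intercept, "R^2 =", fit.rvalue ** 2)

# New runs on much larger inputs (again placeholders), compared to the prediction.
big_sizes = np.array([500.0, 800.0])
big_times = np.array([31.0, 50.5])
predicted = fit.intercept + fit.slope * big_sizes
rel_error = np.abs(big_times - predicted) / predicted
print("predicted:", predicted, " measured:", big_times, " relative error:", rel_error)
# Small relative errors would be good (though still empirical) evidence that
# the linear trend extends beyond the original range.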
If you are interested in a mathematical analysis of the time
complexity, I can point you to some good references on the subject
and/or you can perhaps explain a bit more about what the underlying
algorithm is doing.
regards, mathtalk-ga
|
Clarification of Question by atease-ga on 28 Apr 2003 09:42 PDT
mathtalk-ga, I can see the difference you point out between empirical
evidence and a mathematical proof based on the algorithm, and I guess
that for a convincing case, evidence of both kinds is ideally required.
What exactly I'm doing: I am solving mixed-integer optimization
problems using CPLEX to find an optimal solution. I am assessing how
well CPLEX performs on several different types of problem at varying
levels of complexity. As I said, the results indicate a linear increase
for nearly all problem types. The reason I would like to confirm this
is that I am testing other heuristic methods (hill climbing, simulated
annealing, tabu search) on the same problems to see whether they
perform better relative to CPLEX (in absolute terms they will not be
able to get anywhere near CPLEX's performance). I hope this has helped.
You mentioned that you could point me to references on mathematical
analysis of time complexity; if you choose to answer the question, I
would be grateful if you could include those.
|