 
Q: monotonically increasing non-linear functions ( Answered 4 out of 5 stars,   0 Comments )
Question  
Subject: monotonically increasing non-linear functions
Category: Science > Math
Asked by: crowlogic-ga
List Price: $30.00
Posted: 19 May 2005 14:27 PDT
Expires: 18 Jun 2005 14:27 PDT
Question ID: 523457
I need a list of monotonically increasing non-linear functions which
can be fit to a 2D dataset using non-linear least squares or some
other similar method. I am not looking for an after-the-fact test;
the formula needs to guarantee monotonicity mathematically.

If this is not possible, an explanation of why and how and possible
alternatives would be great!

Clarification of Question by crowlogic-ga on 19 May 2005 14:37 PDT
To further clarify, this function is used to approximate a limit order
book price impact model. I will be fitting these curves and
predicting into the future, and I need to guarantee that my predicted
curves make sense.

Clarification of Question by crowlogic-ga on 19 May 2005 14:44 PDT
One more clarification: the monotonicity of the function only has to
be guaranteed within a predefined interval. I know polynomials have a
tendency to go wild when evaluated outside the range of the data to
which they were fit.

Request for Question Clarification by mathtalk-ga on 20 May 2005 05:48 PDT
Hi, crowlogic-ga:

It sounds like an interesting problem, but let me ask for some Clarification.

By 2D dataset, do you mean a set of (x,y) values where y is presumably
functionally dependent on x?  Or do you mean (x,y,z) values where z is
supposed to depend functionally on both x and y?  The meaning of
monotonically increasing is clear in one dimension (the former case);
perhaps we should be more precise about the requirement if you have
the second case in mind.

As a general observation, yes, polynomials tend to "go wild" when
interpolating data, so polynomial interpolation of monotone data will
frequently produce a non-monotone function.  Spline interpolation can
be made to yield a monotone result, and I can point you to references
on this.

However you seem to be interested not in interpolation but
extrapolation.  Of course there is no guarantee that a good fit curve
to past data will give a good fit in the future.  There are models
based on stochastic processes.

If you have one of these in mind, then naturally you do not expect or
want an exact fit; you hope to extract an underlying trend (about
which observations will have an expected range of variation).  In any
case the use of least squares or similar methods, as you've suggested,
is probably the right idea.

A least squares fit of polynomials doesn't tend to have the same
wildness as the interpolation method.  In fact you could do a
constrained least squares fit to guarantee monotonicity of a
polynomial least squares approximation over a prescribed (future)
range.  If this would be of interest, and the Clarification I
requested above makes sense, please Reply at your convenience.
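
[Editorial note: such a constrained fit can be sketched as follows. This is a minimal illustration in Python, assuming NumPy and SciPy are available; the data, degree, and grid are hypothetical stand-ins, not the researcher's actual setup.]

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical monotone data standing in for a price-impact curve
x = np.linspace(0.0, 1.0, 50)
y = 38.0 + 0.05 * np.sqrt(x)

deg = 3
V = np.vander(x, deg + 1, increasing=True)   # columns: 1, x, x^2, x^3

# Enforce p'(x) >= 0 on a dense grid extending past the data range
g = np.linspace(0.0, 1.2, 200)
D = np.vander(g, deg, increasing=True) * np.arange(1, deg + 1)  # rows: [1, 2g, 3g^2]

def sse(c):
    r = V @ c - y
    return r @ r

c0 = np.linalg.lstsq(V, y, rcond=None)[0]    # unconstrained start
res = minimize(sse, c0, method="SLSQP",
               constraints={"type": "ineq", "fun": lambda c: D @ c[1:]})
c = res.x                                    # coefficients of a monotone polynomial
```

The key point is that the inequality constraints apply to the derivative p'(x) = c1 + 2*c2*x + 3*c3*x^2 on the prescribed grid, so monotonicity over that range is built into the fit rather than checked afterward.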


regards, mathtalk-ga

Clarification of Question by crowlogic-ga on 20 May 2005 11:07 PDT
Sorry for the confusion, by 2d I do mean x and y values.

I am not interested in either extrapolation or interpolation. The
polynomial models I am fitting now are quite accurate, r^2 value of
0.98+.

The problem is basically this. Take a limit order book

PRICE  SHARES
38.00  100
38.01  200
38.02  300
38.03  400

Now consider the following 'price impacts', if I buy X number of
shares, what is the average price I'll pay?

BOUGHT PRICE PAID
100    38.00
300    38.00666667
600    38.01333333
1000   38.02
   
Using this method, I walk the entire book and come up with a very nice
monotone curve. When I fit a polynomial to the curve it fits very
closely from an MSE perspective, but in some cases there is slight
deviation from monotonicity.
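
[Editorial note: the walk described here can be sketched in a few lines of Python; the book and quantities are the example values from the tables above, not real market data.]

```python
# Walk the example book: (price, shares available) at each level
book = [(38.00, 100), (38.01, 200), (38.02, 300), (38.03, 400)]

def average_price(bought, book):
    """Average price paid when buying `bought` shares against the book."""
    remaining, cost = bought, 0.0
    for price, shares in book:
        take = min(remaining, shares)   # fill as much as this level allows
        cost += take * price
        remaining -= take
        if remaining == 0:
            break
    return cost / bought

for b in (100, 300, 600, 1000):
    print(b, round(average_price(b, book), 8))
```

Each (bought, average price) pair produced this way is one point on the monotone impact curve being fit.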

The full information (the complete set of volumes and prices) is much
too large and unwieldy to do anything useful with, so I fit a curve to
the data and then analyze the coefficients of the curve and do neat
things like classification, prediction, etc.

The problem with spline fits is that the coefficients could be a bit
more difficult to interpret, but I would be interested in giving it a
shot.

However, the constrained least squares fit sounds like exactly what I
am looking for! Currently I am using a Vandermonde matrix and simple
matrix solving to come up with the coefficients.
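
[Editorial note: that Vandermonde-based fit can be sketched like this with NumPy; the data points are hypothetical, and `np.linalg.lstsq` plays the role of the "simple matrix solving".]

```python
import numpy as np

# Hypothetical (bought, average price) points from walking a book
x = np.array([100.0, 300.0, 600.0, 1000.0])
y = np.array([38.0, 38.00666667, 38.01333333, 38.02])

# Vandermonde matrix for a quadratic, then a least-squares solve
V = np.vander(x, 3, increasing=True)        # columns: 1, x, x^2
coeffs, *_ = np.linalg.lstsq(V, y, rcond=None)
fitted = V @ coeffs                          # y ~ c0 + c1*x + c2*x^2
```

Nothing in this unconstrained solve guarantees monotonicity, which is exactly the gap the constrained version above is meant to close.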

Request for Question Clarification by hedgie-ga on 21 May 2005 13:38 PDT
Mathtalk
         It does not make sense to fit these points to polynomials -
IMHO, that is.
The curve represents price elasticity: decreasing demand as the price
increases. It seems natural to fit this with a sum of exponentials.
That way, as your price goes to infinity, your demand goes to zero,
which is the right limiting behaviour. I would even guess that one such
exponential may be enough...

Clarification of Question by crowlogic-ga on 21 May 2005 18:35 PDT
I've found a paper which is what I need.

http://arxiv.org/abs/physics/0305019

Abstract:
A finite sum of exponential functions may be expressed by a linear
combination of powers of the independent variable and by successive
integrals of the sum. This is proved for the general case and the
connection between the parameters in the sum and the coefficients in
the linear combination is highlighted. The fitting of exponential
functions to a given data- set is therefore reduced to a multilinear
approximation procedure. The results of this approximation do not only
provide the necessary information to compute the factors in the
exponents and the weights of the exponential terms but also they are
used to estimate the errors in the factors.

I am not a mathematician though, and it is not immediately obvious how
to turn this into code.

Request for Question Clarification by hedgie-ga on 22 May 2005 00:01 PDT
Hi Crowlogic

     I am pleased that you like the idea of using a sum of exponentials,

 A) y = sum(over i) [A.i * exp(-B.i * x)]

 rather than general polynomials. (It looks like you do, yes? :-)

 Actually, a sum of inverse powers

B) y = sum(over i) A.i * (x^-B.i)

would do as well, but I consider exponentials more natural in this application.

In both cases all B.i are > 0 and can be fixed a priori
               (so you can have e.g. A.1/x + A.2/(x*x) ...)
or the B.i can be optimized as well.

So, you seek either 1) the A.i only, for a preselected set of B.i and the (x,y) set,
       or           2) the best set of A.i and B.i for the set of points (x,y).

 Both cases are well known, and algorithms and programs do exist -
 but it may take hours to find them for a given language/OS.
 
Anyway, the question I have is: do you want me to post this as an answer?
That would be just the choice of the base function - an answer to the
original question. (I would add a few links and a discussion of the
merits of all 4 cases.)

 What would remain, after you pick a case, would be a software question
 (an application-programming question) rather than math.

 That would have to be posted as another (perhaps higher-priced) question
 and would require info on the OS, the size of the data sets, whether you
 are able to compile C or FORTRAN, and the like.

What is your choice?

Hedgie



Clarification of Question by crowlogic-ga on 22 May 2005 00:57 PDT
Thanks hedgie. 

I'd prefer the function to be of the form

y(x) = sum(i..N) a[i]*e^(-b[i]*x)

as referenced in the paper above.

I need to estimate both a and b as I am not entirely sure which will
best fit the data.

Also, I do not want an iterative method; I'd like to be able to solve
the problem in a similar manner to polynomial regression, e.g.,
construct a Vandermonde matrix and then solve for the coefficients.
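
[Editorial note: the cited paper's reduction is one route; a classic non-iterative alternative in the same "build a matrix, then solve" spirit is Prony's method, which applies when the samples are uniformly spaced. The sketch below uses synthetic, noiseless two-term data and is NOT the method of the cited paper.]

```python
import numpy as np

# Prony's method: non-iterative fit of y(x) = sum_i a[i]*exp(-b[i]*x),
# assuming uniformly spaced x. Synthetic two-term data for illustration.
p = 2                                   # number of exponential terms
dx = 0.1
x = dx * np.arange(20)
y = 2.0 * np.exp(-0.1 * x) + 5.0 * np.exp(-2.0 * x)

# 1) Linear prediction: y[k] = c[0]*y[k-1] + ... + c[p-1]*y[k-p]
A = np.column_stack([y[p - 1 - j : len(y) - 1 - j] for j in range(p)])
c, *_ = np.linalg.lstsq(A, y[p:], rcond=None)

# 2) Decay factors mu_i are roots of z^p - c[0]*z^(p-1) - ... - c[p-1]
mu = np.roots(np.concatenate(([1.0], -c))).real   # real roots expected here
b = -np.log(mu) / dx                              # decay rates b[i]

# 3) Amplitudes from a Vandermonde-type system in the mu_i
V = mu[np.newaxis, :] ** np.arange(len(y))[:, np.newaxis]
a, *_ = np.linalg.lstsq(V, y, rcond=None)
```

Prony's method is exact on noiseless data, but the root-finding step can be sensitive to noise and to ill-conditioning as the number of terms grows, echoing hedgie-ga's caution below.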

After your response I'll post another question for the implementation.
Answer  
Subject: Re: monotonically increasing non-linear functions
Answered By: hedgie-ga on 22 May 2005 11:59 PDT
Rated:4 out of 5 stars
 
crowlogic-ga
 
  So, the form is

y(x) = sum(i..N) a[i]*e^(-b[i]*x)


and the task is to estimate both a and b, as we are not sure which
 b[i] set will best fit the data.


 I assume what you said in the RFC was an invitation to post this as an answer.
 
 I promised to add something, but since you already picked a case,
 and I agree with your choice, what is there to add?

I looked at the paper you found. It is clever - maybe even too clever.
It also looks like quite a recent result. I doubt that one will find a
ready-made program implementing it, free, on the net. It looks
complex to program and may be overkill.

My recommendation would be to use the good old 'peeling method' for an
initial guess, and to follow that with the usual iterations -
at least as a first try. You may find out that you need very few terms.
Actually, that question - how many terms are 'significant' - seems to me
to be the very core issue of this interesting project.

In the peeling method you plot

log y vs x

and see that one end 'fits' a line. You disregard the other end for now.

That gives you the A and B of the dominant term in the sum.

You take that term away and proceed the same way until only 'noise' is left.
 
This method is described e.g. here under name
 DIMSUM, an acronym for DIMension of a SUM of exponentials
http://biocyb.cs.ucla.edu/dimsum/abstract.html

SEARCH TERM was: peel sum of exponentials
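
[Editorial note: a minimal sketch of the peeling idea on synthetic two-term data; the parameters and the tail/head cutoffs are hypothetical choices for this example, not a general recipe.]

```python
import numpy as np

# Synthetic data: slow term 2*exp(-0.1x) plus fast term 5*exp(-2x)
x = np.linspace(0.0, 40.0, 400)
y = 2.0 * np.exp(-0.1 * x) + 5.0 * np.exp(-2.0 * x)

# Peel 1: on the tail the fast term has died out, so log y is nearly a line
tail = x > 10.0
s1, i1 = np.polyfit(x[tail], np.log(y[tail]), 1)
b1, a1 = -s1, np.exp(i1)                 # dominant (slowest) term

# Peel 2: subtract it and fit the residual where it is still well above noise
resid = y - a1 * np.exp(-b1 * x)
head = x < 2.0
s2, i2 = np.polyfit(x[head], np.log(resid[head]), 1)
b2, a2 = -s2, np.exp(i2)
```

Repeating the subtract-and-refit step until only noise remains gives an estimate of how many terms are 'significant'.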

I would (for your application in particular) combine that with a Markov chain,
as done here:
http://www.blackwell-synergy.com/links/doi/10.1111/j.1467-985X.2004.00335.x/abs/?cookieSet=1

He decided just two terms are relevant. I wonder what the clever
approach would do with his data.

Hedgie

Request for Answer Clarification by crowlogic-ga on 22 May 2005 12:41 PDT
Actually, an iterative procedure is not usable for me. I will be
fitting many thousands of curves per second, and an iterative procedure
is much slower and not guaranteed to converge to a solution.

Clarification of Answer by hedgie-ga on 22 May 2005 22:31 PDT
Crow logic,

     As I am re-reading Kaufman's paper: 


1) You formulate eq. (4) as linear combinations of I[k](x) and x^k.
    You need some initial estimate of b[i] to evaluate I[k], don't you?

2) You determine the c[i] and d[i] defined in eq. (4) -
      by simple inversion of a 2N x 2N matrix.

 After eq. (7) it says: the B[i] are roots of the polynomial (8) with
coefficients c[i].

3) You solve for the B[i] by an ITERATIVE method, using a standard library.
       ... and iterate to improve the initial b[i] guess?

   His method looks computationally intensive and is not guaranteed to
converge if the matrix in step 2) is ill-conditioned.

  I am not convinced that it will be faster than the peeling algorithm,
  when fully automated.

    I would use a supercomputer, try both methods and compare.

Main comment concerning the original questions is this:

 In step 2) you are fitting your data to a base 
             consisting of I[k] and x^k  terms.
 
For the record, I want to state :-) that I recommended a base
consisting of exponentials only, and also recommended at least trying
the peeling algorithm. However, the issue of algorithms and
implementation goes beyond the original question, so I will stop here.
 
Thank you for an interesting problem and good luck.

Hedgie
crowlogic-ga rated this answer:4 out of 5 stars and gave an additional tip of: $5.00
Good suggestions. I'll take them into consideration. Thanks for the help!

Comments  
There are no comments at this time.
