 
Q: monotonically increasing non-linear functions ( Answered 4 out of 5 stars,   0 Comments )
Question  
Subject: monotonically increasing non-linear functions
Category: Science > Math
Asked by: crowlogic-ga
List Price: $30.00
Posted: 19 May 2005 14:27 PDT
Expires: 18 Jun 2005 14:27 PDT
Question ID: 523457
I need a list of monotonically increasing non-linear functions which
can be fit to a 2D dataset using non-linear least squares or some
other similar method. I am not looking for an after-the-fact test;
the formula needs to guarantee monotonicity mathematically.

If this is not possible, an explanation of why and how and possible
alternatives would be great!

Clarification of Question by crowlogic-ga on 19 May 2005 14:37 PDT
To further clarify, this function is used to approximate a limit order
book price impact model. I will be fitting these curves and
predicting into the future, and I need to guarantee that my predicted
curves make sense.

Clarification of Question by crowlogic-ga on 19 May 2005 14:44 PDT
One more clarification: the monotonicity of the function only has to
be guaranteed within a predefined interval. I know polynomials have a
tendency to go wild when evaluated outside the range of the data to
which they were fit.

Request for Question Clarification by mathtalk-ga on 20 May 2005 05:48 PDT
Hi, crowlogic-ga:

It sounds like an interesting problem, but let me ask for some Clarification.

By 2D dataset, do you mean a set of (x,y) values where y is presumably
functionally dependent on x?  Or do you mean (x,y,z) values where z is
supposed to depend functionally on both x and y?  The meaning of
monotonically increasing is clear in one dimension (the former case);
perhaps we should be more precise about the requirement if you have
the second case in mind.

As a general observation, yes, polynomials tend to "go wild" when
interpolating data, so polynomial interpolation of monotone data will
frequently produce a non-monotone function.  Spline interpolation can
be made to yield a monotone result, and I can point you to references
on this.

However you seem to be interested not in interpolation but
extrapolation.  Of course there is no guarantee that a good fit curve
to past data will give a good fit in the future.  There are models
based on stochastic processes.

If you have one of these in mind, then naturally you do not expect or
want an exact fit; you hope to extract an underlying trend (about
which observations will have an expected range of variation).  In any
case the use of least squares or similar methods, as you've suggested,
is probably the right idea.

A least squares fit of polynomials doesn't tend to have the same
wildness as the interpolation method.  In fact you could do a
constrained least squares fit to guarantee monotonicity of a
polynomial least squares approximation over a prescribed (future)
range.  If this would be of interest, and the Clarification I
requested above makes sense, please Reply at your convenience.
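
[Editorial note: such a constrained fit can be sketched as follows. This is a minimal illustration in Python, assuming NumPy and SciPy are available; the data, degree, and grid are hypothetical stand-ins, not the researcher's actual setup.]

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical monotone data standing in for a price-impact curve
x = np.linspace(0.0, 1.0, 50)
y = 38.0 + 0.05 * np.sqrt(x)

deg = 3
V = np.vander(x, deg + 1, increasing=True)   # columns: 1, x, x^2, x^3

# Enforce p'(x) >= 0 on a dense grid extending past the data range
g = np.linspace(0.0, 1.2, 200)
D = np.vander(g, deg, increasing=True) * np.arange(1, deg + 1)  # rows: [1, 2g, 3g^2]

def sse(c):
    r = V @ c - y
    return r @ r

c0 = np.linalg.lstsq(V, y, rcond=None)[0]    # unconstrained start
res = minimize(sse, c0, method="SLSQP",
               constraints={"type": "ineq", "fun": lambda c: D @ c[1:]})
c = res.x                                    # coefficients of a monotone polynomial
```

The key point is that the inequality constraints apply to the derivative p'(x) = c1 + 2*c2*x + 3*c3*x^2 on the prescribed grid, so monotonicity over that range is built into the fit rather than checked afterward.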


regards, mathtalk-ga

Clarification of Question by crowlogic-ga on 20 May 2005 11:07 PDT
Sorry for the confusion, by 2d I do mean x and y values.

I am not interested in either extrapolation or interpolation. The
polynomial models I am fitting now are quite accurate, r^2 value of
0.98+.

The problem is basically this. Take a limit order book

PRICE  SHARES
38.00  100
38.01  200
38.02  300
38.03  400

Now consider the following 'price impacts', if I buy X number of
shares, what is the average price I'll pay?

BOUGHT PRICE PAID
100    38.00
300    38.00666667
600    38.01333333
1000   38.02
   
Using this method, I walk the entire book and come up with a very nice
monotone curve. When I fit a polynomial to the curve it fits very
closely from an MSE perspective, but in some cases there is slight
deviation from monotonicity.
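
[Editorial note: the walk described here can be sketched in a few lines of Python; the book and quantities are the example values from the tables above, not real market data.]

```python
# Walk the example book: (price, shares available) at each level
book = [(38.00, 100), (38.01, 200), (38.02, 300), (38.03, 400)]

def average_price(bought, book):
    """Average price paid when buying `bought` shares against the book."""
    remaining, cost = bought, 0.0
    for price, shares in book:
        take = min(remaining, shares)   # fill as much as this level allows
        cost += take * price
        remaining -= take
        if remaining == 0:
            break
    return cost / bought

for b in (100, 300, 600, 1000):
    print(b, round(average_price(b, book), 8))
```

Each (bought, average price) pair produced this way is one point on the monotone impact curve being fit.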

The full information (the complete set of volumes and prices) is much
too large and unwieldy to do anything useful with, so I fit a curve to
the data and then analyze the coefficients of the curve and do neat
things like classification, prediction, etc.

The problem with spline fits is that the coefficients could be a bit
more difficult to interpret, but I would be interested in giving it a
shot.

However, the constrained least squares fit sounds like exactly what I
am looking for! Currently I am using a Vandermonde matrix and simple
matrix solving to come up with the coefficients.
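
[Editorial note: that Vandermonde-based fit can be sketched like this with NumPy; the data points are hypothetical, and `np.linalg.lstsq` plays the role of the "simple matrix solving".]

```python
import numpy as np

# Hypothetical (bought, average price) points from walking a book
x = np.array([100.0, 300.0, 600.0, 1000.0])
y = np.array([38.0, 38.00666667, 38.01333333, 38.02])

# Vandermonde matrix for a quadratic, then a least-squares solve
V = np.vander(x, 3, increasing=True)        # columns: 1, x, x^2
coeffs, *_ = np.linalg.lstsq(V, y, rcond=None)
fitted = V @ coeffs                          # y ~ c0 + c1*x + c2*x^2
```

Nothing in this unconstrained solve guarantees monotonicity, which is exactly the gap the constrained version above is meant to close.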

Request for Question Clarification by hedgie-ga on 21 May 2005 13:38 PDT
Mathtalk
         It does not make sense to fit these points to polynomials -
IMHO, that is.
The curve represents price elasticity: decreasing demand as the price
increases. It seems natural to fit this with a sum of exponentials.
That way, as your price goes to infinity, your demand goes to zero,
which is the right limiting behaviour. I would even guess that one such
exponential may be enough...

Clarification of Question by crowlogic-ga on 21 May 2005 18:35 PDT
I've found a paper which is what I need.

http://arxiv.org/abs/physics/0305019

Abstract:
A finite sum of exponential functions may be expressed by a linear
combination of powers of the independent variable and by successive
integrals of the sum. This is proved for the general case and the
connection between the parameters in the sum and the coefficients in
the linear combination is highlighted. The fitting of exponential
functions to a given data- set is therefore reduced to a multilinear
approximation procedure. The results of this approximation do not only
provide the necessary information to compute the factors in the
exponents and the weights of the exponential terms but also they are
used to estimate the errors in the factors.

I am not a mathematician though, and it is not immediately obvious how
to turn this into code.

Request for Question Clarification by hedgie-ga on 22 May 2005 00:01 PDT
Hi Crowlogic

     I am pleased that you like the idea of using a sum of exponentials,

 A) y = sum(over i) [A.i * exp(-B.i * x)]

 rather than general polynomials. (It looks like you do, yes? :-)

 Actually, a sum of inverse powers

B) y = sum(over i) A.i * (x^-B.i)

would do as well, but I consider exponentials more natural in this application.

In both cases all B.i are > 0 and can be fixed a priori
               (so you can have e.g. A.1/x + A.2/(x*x) ...)
or the B.i can be optimized as well.

So, you seek either 1) the A.i only, for a preselected set of B.i and the (x,y) set,
       or           2) the best set of A.i and B.i for the set of points (x,y).

 Both cases are well known, and algorithms and programs do exist -
 but it may take hours to find them for a given language/OS.
 
Anyway, the question I have is: do you want me to post this as an answer?
That would be just the choice of the base function - an answer to the
original question. (I would add a few links and a discussion of the
merits of all 4 cases.)

 What would remain, after you pick a case, would be a software question
 (an application-programming question) rather than math.

 That would have to be posted as another (perhaps higher-priced) question
 and would require info on the OS, the size of the data sets, whether you
 are able to compile C or FORTRAN, and the like.

What is your choice?

Hedgie



Clarification of Question by crowlogic-ga on 22 May 2005 00:57 PDT
Thanks hedgie. 

I'd prefer the function to be of the form

y(x) = sum(i..N) a[i]*e^(-b[i]*x)

as referenced in the paper above.

I need to estimate both a and b as I am not entirely sure which will
best fit the data.

Also, I do not want an iterative method; I'd like to be able to solve
the problem in a similar manner to polynomial regression, e.g.,
construct a Vandermonde matrix and then solve for the coefficients.
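
[Editorial note: the cited paper's reduction is one route; a classic non-iterative alternative in the same "build a matrix, then solve" spirit is Prony's method, which applies when the samples are uniformly spaced. The sketch below uses synthetic, noiseless two-term data and is NOT the method of the cited paper.]

```python
import numpy as np

# Prony's method: non-iterative fit of y(x) = sum_i a[i]*exp(-b[i]*x),
# assuming uniformly spaced x. Synthetic two-term data for illustration.
p = 2                                   # number of exponential terms
dx = 0.1
x = dx * np.arange(20)
y = 2.0 * np.exp(-0.1 * x) + 5.0 * np.exp(-2.0 * x)

# 1) Linear prediction: y[k] = c[0]*y[k-1] + ... + c[p-1]*y[k-p]
A = np.column_stack([y[p - 1 - j : len(y) - 1 - j] for j in range(p)])
c, *_ = np.linalg.lstsq(A, y[p:], rcond=None)

# 2) Decay factors mu_i are roots of z^p - c[0]*z^(p-1) - ... - c[p-1]
mu = np.roots(np.concatenate(([1.0], -c))).real   # real roots expected here
b = -np.log(mu) / dx                              # decay rates b[i]

# 3) Amplitudes from a Vandermonde-type system in the mu_i
V = mu[np.newaxis, :] ** np.arange(len(y))[:, np.newaxis]
a, *_ = np.linalg.lstsq(V, y, rcond=None)
```

Prony's method is exact on noiseless data, but the root-finding step can be sensitive to noise and to ill-conditioning as the number of terms grows, echoing hedgie-ga's caution below.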

After your response I'll post another question for the implementation.
Answer  
Subject: Re: monotonically increasing non-linear functions
Answered By: hedgie-ga on 22 May 2005 11:59 PDT
Rated:4 out of 5 stars
 
crowlogic-ga
 
  So, the form is

y(x) = sum(i..N) a[i]*e^(-b[i]*x)


and the task is to estimate both a and b, as we are not sure which
 b[i] set will best fit the data.


 I assume what you said in the RFC was an invitation to post this as an answer.
 
 I promised to add something, but since you already picked a case,
 and I agree with your choice, what is there to add?

I looked at the paper you found. It is clever - maybe even too clever.
It also looks like quite a recent result. I doubt that one will find a
ready-made program implementing it, free, on the net. It looks
complex to program and may be overkill.

My recommendation would be to use the good old 'peeling method' for an
initial guess, and to follow that with the usual iterations -
at least as a first try. You may find out that you need very few terms.
Actually, that question - how many terms are 'significant' - seems to me
to be the very core issue of this interesting project.

In the peeling method you plot

log y vs x

and see that one end 'fits' a line. You disregard the other end for now.

That gives you the A and B of the dominant term in the sum.

You take that term away and proceed the same way until only 'noise' is left.
 
This method is described e.g. here under name
 DIMSUM, an acronym for DIMension of a SUM of exponentials
http://biocyb.cs.ucla.edu/dimsum/abstract.html

SEARCH TERM was: peel sum of exponentials
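
[Editorial note: a minimal sketch of the peeling idea on synthetic two-term data; the parameters and the tail/head cutoffs are hypothetical choices for this example, not a general recipe.]

```python
import numpy as np

# Synthetic data: slow term 2*exp(-0.1x) plus fast term 5*exp(-2x)
x = np.linspace(0.0, 40.0, 400)
y = 2.0 * np.exp(-0.1 * x) + 5.0 * np.exp(-2.0 * x)

# Peel 1: on the tail the fast term has died out, so log y is nearly a line
tail = x > 10.0
s1, i1 = np.polyfit(x[tail], np.log(y[tail]), 1)
b1, a1 = -s1, np.exp(i1)                 # dominant (slowest) term

# Peel 2: subtract it and fit the residual where it is still well above noise
resid = y - a1 * np.exp(-b1 * x)
head = x < 2.0
s2, i2 = np.polyfit(x[head], np.log(resid[head]), 1)
b2, a2 = -s2, np.exp(i2)
```

Repeating the subtract-and-refit step until only noise remains gives an estimate of how many terms are 'significant'.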

I would (for your application in particular) combine that with a Markov chain,
as done here:
http://www.blackwell-synergy.com/links/doi/10.1111/j.1467-985X.2004.00335.x/abs/?cookieSet=1

He decided just two terms are relevant. I wonder what the clever
approach would do with his data.

Hedgie

Request for Answer Clarification by crowlogic-ga on 22 May 2005 12:41 PDT
Actually, an iterative procedure is not usable for me. I will be
fitting many thousands of curves per second, and an iterative procedure
is much slower and not guaranteed to converge to a solution.

Clarification of Answer by hedgie-ga on 22 May 2005 22:31 PDT
Crow logic,

     As I am re-reading Kaufman's paper: 


1) You formulate eq. (4) as linear combinations of I[k](x) and x^k.
    You need some initial estimate of b[i] to evaluate I[k], don't you?

2) You determine the c[i] and d[i] defined in eq. (4) -
      by simple inversion of a 2N x 2N matrix.

 After eq. (7) it says: the B[i] are roots of the polynomial (8) with
coefficients c[i].

3) You solve for the B[i] by an ITERATIVE method, using a standard library.
       ... and iterate to improve the initial b[i] guess?

   His method looks computationally intensive and is not guaranteed to
converge if the matrix in step 2) is ill-conditioned.

  I am not convinced that it will be faster than the peeling algorithm,
  when fully automated.

    I would use a supercomputer, try both methods and compare.

Main comment concerning the original questions is this:

 In step 2) you are fitting your data to a base 
             consisting of I[k] and x^k  terms.
 
For the record, I want to state :-) that I recommended a base
consisting of exponentials only, and also recommended at least trying
the peeling algorithm. However, the issue of algorithms and
implementation goes beyond the original question, so I will stop here.
 
Thank you for an interesting problem and good luck.

Hedgie
crowlogic-ga rated this answer:4 out of 5 stars and gave an additional tip of: $5.00
Good suggestions. I'll take them into consideration. Thanks for the help!

Comments  
There are no comments at this time.
