Google Answers Logo
View Question
 
Q: Regression (i.e. Least-Squares Fit) without a polynomial model? ( Answered,   0 Comments )
Question  
Subject: Regression (i.e. Least-Squares Fit) without a polynomial model?
Category: Science > Math
Asked by: johnswanstone-ga
List Price: $100.00
Posted: 31 Dec 2004 15:30 PST
Expires: 30 Jan 2005 15:30 PST
Question ID: 449830
I need to fit an arc through a set of <x,y> data points.

I want to find the best fit in terms of minimizing the euclidian
distance between the data points and the points on the arc.  This is
like traditional reqression except that I'm using an arc for the model
instead of a polynomial.

Input:  A set of <x,y> data points
Output: A center point <cx,cy>, radius r, and two angles describing
the start and end of the arc.

You provide:  A mathmatical analysis of the problem and solution and a
least-time algorithm for finding such a solution.  The analysis and
algorithm need to be written up in a report that I can understand and
it also would be nice if the algorithm were implemented in C or C++,
but that is not required.
Answer  
Subject: Re: Regression (i.e. Least-Squares Fit) without a polynomial model?
Answered By: hedgie-ga on 01 Jan 2005 02:28 PST
 
Hi john  

I suppose you mean
          'least squares' algorithm, not   "least-time algorithm"   ??

Also, when you say  'report that I can understand' , it would help if
you would indicate level  of your understanding and knowledge. From your
question, I infer that you know c and understand the linear regession
for polynomial coeficients. I also assume by 'report' you mean reasonable
description you can understand, not a formal report or paper.

Finally, you should undertand that this algorithm involves a nonlinear
regression, which is more diffcult then linear one, in terms of
computer time and existence of local minima.

If all these assumptions are correct, I can point you to a program in
c which will find the coeficients and go through any necessary RFCs
(clarifications)
so that you understand it's operation.

Before we proceed, I would like you look at definition
http://en.wikipedia.org/wiki/Maxima_and_minima
and confirm/clarify the assumptions.

I would like to ask if you know how to write
(and program in c) expression for the Error (=sum of squares) in terms
of the data-points and your arc parameters.

I also wonder if the 

two angles describing inclusion of the
              the start and end of arc
is really necesssary:

The rule is that smaller number of unknowns (3 rather then 5) is much
easier problem to solve.
In most cases, the 'right end points' are obvious once you have the
the three parameters of the 'best' circle.
 In other cases it may be expedient to find the 'best
cirle' and just see how the error changes if you extend/shorten the arc.

However, there can be cases where true minimum of all five variables is needed 
(if you can describe the application a bit, that may help as well:
 Is it one shot issue of given set of points, or something that will
be repeated over and over, with all kinds of datasets; how many
(datapoints,
runs,) and how critical it is to know that you found true (global) minimum?
).
In any case, I would like you give  some thought to the issue
of 3 vs 5 variables  and respond with RFC.

We would then proceed to finding a suitable program .
Hedgie

Request for Answer Clarification by johnswanstone-ga on 01 Jan 2005 07:17 PST
hedgie,

I see that you've marked your response as an answer to this question
but since I don't have the things I'm looking for yet I'm not sure I
agree with this classification.  You indicate that you have, or know
of, a solution to non-linear regression problems written in C and
indicate that my problem falls into such a classification, but you
don't provide any explanation or analysis in support of this
statement.  I think that a better approach would have been for you to
lock the question for the 8 hours and then try to answer it to my
satisfaction before moving to the answer state without providing the
answer I requested.

To answer your questions:

When I say 'least-time algorithm' that is exactly what I mean.  A
least-squares minimization of error would be one way to minimize the
Euclidean distance between all the data points and the arc segment.

When I say a report that I can understand, I mean a written
description of the step-by-step process taken to solve the problem and
an fully detailed explanation of each step, why it was taken and how
it fits into the overall solution.

I can confirm my understanding of the concepts explained at
http://en.wikipedia.org/wiki/Maxima_and_minima As written in my
problem description, I am looking for the best-fit i.e. global optimum
solution.

You ask if I know how to write and program in C an expression for the
Error (=sum of squares) given a candidate solution.  I would reply
that that is part of original problem and that would be your job (well
I can code up any equation that you provide).

The two-angle criteria is required.  How else can you describe an arc
given the equation for a circle?  I guess given an equation for a
circle it would be possible to find the nearest (as in Euclidian
distance) projection on the circle for each data point and then pick
the smallest arc that encloses all those projections.

What I'm doing is finding the boundary of a curved surface in an
image.  I run various image processing algorithms which include noise
reduction and contrast enhancement and then I run an edge detection. 
The input data to this problem is the output pixels of the edge
detection algorithm.

John.

Request for Answer Clarification by johnswanstone-ga on 01 Jan 2005 07:28 PST
I forgot to mention that I'm doing this analysis on several images and
that I would hope that the solution code would take no more than 1-10
seconds or so to run on a modern PC.

Clarification of Answer by hedgie-ga on 02 Jan 2005 00:48 PST
John,
 
        I have done the research but I have not posted the answer as yet.
 
	You are right that I could have post these initial questions into RFC area.
	Particularly since it created this negative initial reaction, I see
now that would be a better choice.
	I can of course withdraw the answer and I will to that if I do not
manage to repair this opening.
	
	 There are few issues to resolve before we make that decision,
	 (namely to post the answer or withdraw the  initial dialog form the answer area)
	 
        Task is LSF of a circular arc to set of data points [x.i y.i] in plane
	        LSF Least Square Fit has to minimize the residuals determined
by perpendicular offsets
		as defined here:
		http://mathworld.wolfram.com/LeastSquaresFitting.html
		
	 This general task in category "Fitting Implicit Polynomial Curves"
has few special cases
	which can be linearized, however the general case involves non-linear regression.
	Nonlinear regression involves iterations: One starts with initial
guess, and in a series
	of steps descends to the bottom of the nearby valley. That valley can
be a local minimum
	of the Objectve Function.
	
	There is no way to guarantee that global minimum  was reached, other 
then exhaustive search.
	
	In your case, fitting a  circular arc, multiple minima are rare:
	 going down from 15% to 2 % of cases as number of data points, N goes
up from 5 to 100
	 -- thats' for randomly generated points; it is more rare for typical real data.
	
	 This problem cannot be avoided: I can give you example which has
four very different solution,
	 4 circles fitting the same set of points equally badly, but all
being the best fit.
	 
	 So, if you insist on having an algorithm which guarantees global
minimum, I will withdraw
	 the answer. It cannot be done.
	   
	 We also would need to clarify this term 'least-time algorithm'. Do
you refer to a CPU
	 time? The only 'least time' I am aware of is in time-optimal control
theory and that
	 does not fit. Can you provide some reference -a web site where this
term is defined?
	 The category, LSF, is called Least SQUARES fit; what has time to do with that?
	 
	 
	 Finally:"The two-angle criteria is required"  requirement can be
analyzed as follows
	 (this is a correction of my first reaction
	  "..there can be cases where true minimum of all five variables is needed ..")
	  
	  There are in fact, no such cases:
	  
	 The quantity to be minimized, Objective Function (Error of Cost) is 
The sum of the squares of the
	  offsets and has no 'length of arc' term. Therefore, once the 'best
circle' was found by
	  3 parameter minimization It (the cost) behaves as follow: if arc is
extended beyond the
	  edge point, the cost will not change. Edge points are well defined
once the center of the circle
	  was found (as having radius (center to point line) with largest and
smallest angle).
	  Therefore these two parameters, arc end angles, should not be
included in the iterations
	  (many minimiaer routines choke when variabes do not affect the Cost).
	   
	 
	 The computational cost is 1000 to 5000 flops per data point. I do
not know if that fits
	 into you 10 s , since you did not mentioned value of N. That cost
will go up if you
	 make several runs for each data set, each with different initial
guess, which is recommended
	 to verify that global minimum was reached.

	  So, if you want the answer with these limitations and problems,
please to an RFC;
	  I will be happy to wthdraw this 'non-answer' from the answer area
if you prefer that.
	  
	  Hedgie

Clarification of Answer by hedgie-ga on 02 Jan 2005 06:17 PST
May the 'least time' in 'least time algorithm' is not a technical term,
 perhaps you just  mean that algorithm should be fast?
 - should use a small amount of CPU time ??

   I can find one which is fast and pay extra attention to selection of
 initial guess (which has big effect on CPU time) however, once again,
 I cannot promise the fastest possible one. This is an active and
evolving field and to find 'fastest way' to do a task - that is a
different - and
 a more dificult task then finding a fast way, or fastest know way.
 Kindly clarify this also.

 I apologize for couple of typos,
such as:
(many minimiaer routines choke when variabes do not affect the Cost)
should be:
(most minimizer routines choke when variables are included which do
not affect the Cost= Objective function)
 
I could retype what is not clear -as this should not be a case of: 
"what do you want? Good spelling or good math?" :-)
but, fact is that (to praphrase Viola) My math is better then spelling ..
http://www.bardweb.net/index.html

Hedgie

Request for Answer Clarification by johnswanstone-ga on 02 Jan 2005 10:08 PST
hedgie,

I don't have any objections to having you answer this question as long
as I feel that I do get a complete and correct answer.  When you
marked the question as answered, my credit card was charged the $100
fee and at that point I felt that I had not received my answer but had
been billed anyway.  You can understand why I would feel a little
upset in that situation.

I've taken a look at your reference page and I understand that there
can be no guaranteed best solution.  You mention that the in my
specific case that the number of local minima are rare.  I'm very
curious about how you know that.  Is this a fact in the sense of a
mathematical proof or more a "rule of thumb" you math guys "just
know".

In my particular case I will have 512 or so <x,y> points and their
error will be less than 10% to 15% of the radius of the true circle.

So I assume you're going to apply the process described at
http://mathworld.wolfram.com/NonlinearLeastSquaresFitting.html to my
particular problem.  I'd like to see the derivation of this process
applied to my problem.  I understand that this process will involve
inverting a matrix for each iteration and that is pretty much a solved
problem mathematically and computationally.  I'd like a little insight
into how you make your decision on which algorithm to choose for my
problem.

As this is the very first time that I've used answers.google.com I
wanted to be sure that I got a computationally feasible answer so I
specified a 'least-time' solution.  What I didn't want was a
mathematically correct answer but an impossible one to compute.
Something like, check ever possible <cx,cy> and r value and pick the
best one.  (I just love the way you math guys can throw out things
like ... expand that series to infinity, or take that limit to zero,
and things like that :-)  To be exactly specific ... I'm looking for a
solution that can converge to a solution given 512 input points in
less than 10 seconds on a modern PC.  It is not necessary to
complicate the analysis or the solution unduly to shave a few
milliseconds off the "straight forward" approach.

John.

Clarification of Answer by hedgie-ga on 02 Jan 2005 20:21 PST
John

I understand your reaction. I started with the answer and then decided
to have  few things clarified first. I should have switched from the
answer area to the RFC area. In any case, you should know that
according to the Terms of Service , "Answers to your questions", point
5,
Google grants full refunds for all reasonable requests for up to 30
days after answers are posted.
http://answers.google.com/answers/faq.html#termsofservice

The answer below is not just application of the general regression
theory. It is based  on a custom search of literature  on LSF of the
circular arcs. Such task is part of the 'computer vision' techniques
and so
the different algorithms and their speed of execution were topic of
active research since the sixties.
 When I say 'in your case' ,"multiple minima are rare", I mean, when
fitting circular arcs, not the specific data sets you have. That is
based on a recent review article, which to compare speed of different
algorithms did numerical experiments on random data. In your case,
with N~500 and points, approximating an arc, I do not expect that to
be a problem. This review article, and few others I found, including
an PhD thesis on the topic, from Brown university,  are listed in
references at the end of this answer.
 
 This is the outline of the Report:
 
  1) Intro - how do we know problem is nonlinear and has a solution(s)?
  2) Transformation of coordinates
  3) Expression for the Objective Function
  4) Selection of initial guess (algebraic method)
  5) C++ program to minimize the Objective Function
             ^^^^ DO NOT MISS THIS POINT 5 ^^^^^
  6) bibliography


                        REPORT
                ===================
  1) When we write the expression for the Objective Function -( a sum
of squares of perpendicular offsets - in our case), we  will see (in
point 3) that it is not a quadratic form. Quadratic from, which we get
e.g. in case of polynomial regression, has one minimum which can be
found by solving a set of linear  equations. In our case, the
equations which determine minimum are nonlinear. They may have no
solution or have several solutions.
     The minimization of the nonlinear function F cannot be
accomplished by a finite algorithm. Various iterative algorithms have
been applied to this end. The most successful and popular are
 (a) the Levenberg-Marquardt method.
 (b) Landau algorithm [17].
 (c) Sp¨ath algorithm [18, 19] 
 
 (Numbers in brackets are references cited in the first paper given
below, at the end of this report, in section 6).
       They all use iterations - a series of smaller and smaller
steps, which move the 'point' from the initial guess to the minimum.
The parameter space, in our case 3-dimensional space <xc, yc, R>
(center of the circle and Radius) can be divided into regions such,
that all iterations which start in one region will converge to the
same solution. The space <xc yc R> is unbounded (as R can be large)
and that may present numerical problems. Therefore, as a first step we
will select different coordinates for our space. The new coordinates
add straight lines to the solution set, and also guarantee that there
is at least one finite solution.
   
2) Transformation  
   We now adopt a parametrization used in [15, 16], in which the
equation of a circle is
 A(x2 + y2) + Bx + Cy + D = 0            (2.4)
 Note that this gives a circle when A = 0 and a line when A = 0, so it
conveniently combines both types of contours in our model.

 We define angle "theta" as follows:
 
 cos (theta)= B/ sqrt( 1 + 4 A * D) 
 
 
  Now one can perform an unconstrained minimization of  in the
three-dimensional parameter space: A D theta
  
  3) Distance P.i of point i = <x y) to the circle A D theta now is:
  
  P.i = A(x^2 + y^2) + sqrt(1 + AD) (x * cos theta + y * sin theta) +D
 
  4) Good initial guess can shorten the calculations. Methods were
developed which allow one to select good initial guess without
iterations.  Paper quoted below describes several: simple AF1,
improved  Gradient-weighted algebraic fit (GWAF), then AF2 and AF3
     For details, see the paper
     http://www.math.uab.edu/cl/cl1/cl1jsmall/node4.html
     
     This is brief outline:
     We introduce z= x*x + y * y for each data point <x.i y.i> and
calculate moments
       
     M.i.j = <r.i * r.j>   where r = x, y, z  and <> is summation over
i,j=1,2, ... N
     
     Solving a linear set of equations with coefficients M.i.j one
obtains a good initial guess for unknown A B C defined in section 2).
     
   5) At the bottom of this page,you find the  The C++ code:

The code requires 2 classes that defines the type data and circle,
they are defines in data.h and circle.h.
The fitting algorithms are defined in CircleFit.h. 
   http://www.math.uab.edu/cl/cl1/details.ps
   
  ^^^^ DO NOT MISS THIS POINT 5 ^^^^^ it can save you lot of coding.
=========================================
   
   6) Bibliography 
   
   the most relevant paper -a review and comparison of algorithms is
   Least squares fitting of circles and lines
     N. Chernov and C. Lesort
  http://www.math.uab.edu/cl/cl1/cl1jsmall/cl1jsmall.html   
     
     This paper has bibliography up To 2002:
     http://www.math.uab.edu/cl/cl1/cl1jsmall/node5.html
     
     which is references in the report above.
     
  
    http://arxiv.org/pdf/cs.CV/0301001

    reviews development up to July 2004
    
    Fig 13 shows flops required per datapoint for different methods a
b and c mentioned in section 1)
        
     See also
     
.
 

 What is an Implicit Polynomial Curve?

Implicit Polynomial (IP) curves and surfaces are mathematical models for the rep
resentation of 2D curves and 3D surfaces, respectively.

 http://www.lems.brown.edu/~tt/2Dfitting/ipcurves.html
 
 
  The 3L Algorithm for Fitting Implicit Polynomial Curves and Surfaces to Data

M. M. Blane, Z. Lei, H. Civi and D. B. Cooper
http://www.lems.brown.edu/vision/publications/journal/3L-PAMI-abst.html
http://www.lems.brown.edu/vision/publications/conference/wacv96_paper.ps.gz



http://www.maths.dundee.ac.uk/~gawatson/circarc.ps

MAPLE worksheets:
--------------------
There is a maple (high level math tool) worksheet: 
Abstract:


This worksheet demonstrates how Maple can be used to fit a circle to a
set of datapoints. If a set of datapoints are known to lie on a
circle, the average quadr
atic distance of the points from a circle should be minimized. This problem can
be written

http://www.adeptscience.co.uk/products/mathsim/maple/apps/subcategory/148/Regression/Statistics-&-Data-Analysis/372/article372.html

http://www.maplesoft.com/applications/index.aspx


http://www.adeptscience.co.uk/maplearticles/f372.html
maple


and an commercial program (with consulting)
--------------------------

http://www.nlreg.com/order.htm


The price of NLREG is only $65 ($70 if outside the USA), which is far
below the cost of comparable commercial regression programs.
http://mathforum.org/epigone/geometry-software-dynamic/crenfeezand

linearised case exists when center of circle is constarined to a line

http://eesof.tm.agilent.com/docs/iccap2002/MDLGBOOK/3CURVE_FITTING/2REGRAPP.pdf
p 8

s

         Additional bibliography:
http://csdl.computer.org/comp/trans/tp/2000/02/i0191abs.htm
 paper itself is available, but not free
The full text of IEEE Transactions on Pattern Analysis and Machine
Intelligence is available to members of the IEEE Computer Society who
have an online subscription and an web account and/or persons
affiliated with
 a subscribing organization as a student, faculty member or employee.
 
 Please do an RFC (request for clarification) as needed. 
 When all is clear (as I hope it is already), rating is appreciated.
 
 Hedgie

Request for Answer Clarification by johnswanstone-ga on 03 Jan 2005 06:14 PST
hedgie,

You said:  "5) At the bottom of this page,you find the  The C++ code:"

I can't find this code.  Can you be more explicit?

John.

Clarification of Answer by hedgie-ga on 03 Jan 2005 16:41 PST
Ops. Wrong URL. Sorry.
This is the correct URL:

 http://www.math.uab.edu/cl/cl1/

At the bottom, there is a text:
----------------------
The C++ code:


The code requieres 2 classes that defines the type data and circle,
they are defines in data.h and circle.h.

The fitting algorithms are defined in CircleFit.h.

------------------------

which points to three files of code. Namely:
http://www.math.uab.edu/cl/cl1/code/data.h
                This file contains  RANDOM DATA GENERATOR - to be
replaced by I/O call to read your data
http://www.math.uab.edu/cl/cl1/code/circle.h
http://www.math.uab.edu/cl/cl1/code/CircleFit.h

 BTW - I did not try to compile the code, nor did not examine what
libraries, if any it may need.

Hedgie
Comments  
There are no comments at this time.

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.
Search Google Answers for
Google Answers  


Google Home - Answers FAQ - Terms of Service - Privacy Policy