Google Answers Logo
View Question
 
Q: Efficient amortized time algorithm ( No Answer,   9 Comments )
Question  
Subject: Efficient amortized time algorithm
Category: Computers > Algorithms
Asked by: secret901-ga
List Price: $3.25
Posted: 04 Jul 2002 00:50 PDT
Expires: 07 Jul 2002 19:34 PDT
Question ID: 36429
I have a data file containing all the cities in the United States and
their locations in latitute and longitute.  I'm trying to develop a
Java application that takes in a latitute and longitute and returns
the nearest city.  The easiest way to implement this is to compute the
distance between the input point and every city in the database.  But
this is too inefficient, since its run-time would be O(n) for every
input entered (n is the number of items in the database).  Is there a
way to sort the cities beforehand so that its run-time would be less
than O(n) once the user begins to give input?
Since this project has no practical purpose (just something I'd like
to accomplish during the summer), I do not care how inefficient it is
in sorting the data before the user enters inputs.  The algorithm must
be more efficient that O(n) once the user enters inputs, though.
This solution should be understandable to a computer science student
who has just finished his first year.  If it involves computational
geometry, please make sure that a layman can understand it.  Since I
will be implementing this myself, I do not want referrals to programs
"out there" that already does this.

Clarification of Question by secret901-ga on 04 Jul 2002 00:55 PDT
I meant that the easiest way to implement this is to compute the
distances between the input point and every city in the database and
then return the minimum.

Clarification of Question by secret901-ga on 04 Jul 2002 12:36 PDT
IF your answer involves computational geometry, please explain it in
layman's terms.  Voroni diagrams sounds like what I'm looking for, but
I don't understand it since I have no knowledge in computational
geometry.  Please describe the algorithm, not just vaguely referencing
it.

Clarification of Question by secret901-ga on 04 Jul 2002 13:39 PDT
As hedgie and my question pointed out, this can be done in O(n). 
However, the purpose of my project is not to make a practical
algorithm, but a "theoretically efficient" algorithm.  So, although it
might be overkill, I still do want to go ahead with the project.
Note: for the purpose of this algorithm, one can assume that we're
working with a 2-dimensional plane instead of a surface of a sphere
since the cities are clustered so close together that the curvature of
the earth would not make much of a difference.
Answer  
There is no answer at this time.

Comments  
Subject: Re: Efficient amortized time algorithm
From: jeanluis-ga on 04 Jul 2002 03:08 PDT
 
Hey, you may already know this but, the winner of the 1st annual
Google Programming contest did something very similar to this. Here is
the scoop:
"News for Nerds"
http://developers.slashdot.org/developers/02/05/31/1140201.shtml?tid=156

and here is the code to his project:
http://www.ofb.net/~egnor/google.html

You might want to take a look at what he did, it may give you some
ideas, I haven't looked at it in too much detail, but it says he built
a 2D index of the coordinates...
Hope this helps
--jld
Subject: Re: Efficient amortized time algorithm
From: rhansenne-ga on 04 Jul 2002 03:24 PDT
 
Hi secret901-ga,

ozguru-ga, I'm afraid your solution won't work in all cases, as the
closest city does not necessarily need to have the closest longitude
or the closest latitude.

Here's an algorithm I used for a similar problem. It may not be the
most efficient algorithm out there, but it's definitely faster then
O(N). The exact efficiency depends on the distribution of your points,
but I'm guessing it's somewhere in the neighbourhood of O(log(N)).

The idea is to build squares around the target point, untill we find
other points (cities) within the surrounding square.

Let's say the input coordinates (longitude and latitude) are Xi and
Yi.

- Sort an index for the X coordinates of all your points (cities) -
this only needs to be done once
- In the same way, sort an index for the Y coordinates
- Find the points with an X coordinate within a given range around Xi.
Call this List-Xs.

You will have to determine this range yourself through trial and error
(benchmarking), because it depends heavily on the density and
distribution of your points. You can select these points with a very
simple SQL query.

- Similarly, find the points with an Y coordinate within a given range
around Yi. Call this List-Ys.

- Compare the points on List-XS with those on List-YS and select out
the points which appear on both. This is your list of candidates.
You only need to do a distance comparison on these points.

If no points appear on both lists, double the range and start again
with the new range, untill at least one candidate is found.

Hope this helps,

rhansenne-ga.
Subject: Re: Efficient amortized time algorithm
From: ulu-ga on 04 Jul 2002 05:03 PDT
 
Hope this gets you started in the right direction.  (Looks like other
comments were added since I started this)

The term often used for this problem is the Nearest Neighbor problem.
://www.google.com/search?hl=en&ie=ISO-8859-1&q=%22nearest+neighbor+problem%22
://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=%22nearest+neighbor%22+java

I'm hoping you've had a data structures class in which trees are
introduced.
If not, you might want to pick up the textbook for your future class.

The Algorithm Design Manual by Steven S. Skiena, Steve Skiena
"Nearest Neighbor Search." §8.6.5 
http://www.amazon.com/exec/obidos/ASIN/0387948600/ref=ase_weisstein-20/104-0851109-3518306

This lists a few links to implementations.
http://www.cs.sunysb.edu/~algorith/files/nearest-neighbor.shtml

Once common method is to use a k-d tree.  K-d refers to k (some
arbitrary number) of dimensions.  Binary trees subdivide a 1-d space. 
In your case it is just 2 dimensions (latitude and longitude).
http://www.cs.sunysb.edu/~algorith/files/kd-trees.shtml

The other is to create Voroni diagrams.
http://www.cs.sunysb.edu/~algorith/files/voronoi-diagrams.shtml

A simple method would use a 2-d array, perhaps for every 1 degree
"square" in the US.  Each element would contain every city within that
square.  You can then search an increasing neighborhood of squares
until you know you have the closest city.  (Radix sort)  The size of
the square will greatly effect performance.

Another method is to bit-interleave the coordinates (make sure they
are of the same "length").  You can then sort the entire city list by
this new dimension.  Find the point in the list.  Calculate the
distance to the "closest" city in this list (cities with the next
higher and lower bit-interleaved values).  Use that to compute a range
in the sorted list.  Do a linear search of that tiny range for the
closest city.  Probably O(log(N)).

Algorithms for Nearest Neighbor Search (html version of PowerPoint
slides)
http://216.239.51.100/search?q=cache:tuiLAaNuknkC:dimacs.rutgers.edu/Workshops/MiningTutorial/pindyk-slides.ppt+%22k-d+trees%22+nearest+neighbor&hl=en&ie=UTF-8

Since you are just using US cities (and doing it as a fun project),
you can probably do some simplifications.  You might consider it as a
flat plane, ignoring great circle routes.  Since you are looking for
closest city in a dense domain, that is reasonable.  Also, don't
forget cos(latitude)*longitude.

"This report presents an efficient algorithm based on the generalized
Voronoi diagram construction for solving the nearest-neighbor problem
in the plane."
http://pharos.cpsc.ucalgary.ca/Dienst/UI/2.0/Describe/ncstrl.ucalgary_cs/2001-680-03

Other links:
ANN: Library for Approximate Nearest Neighbor Searching (C++)
http://www.cs.umd.edu/~mount/ANN/
Nearest Neighbor Queries in Metric Spaces
http://cm.bell-labs.com/who/clarkson/nnms.html
The UB-Tree: Performance of Multidimensional Range Queries
(bit-interleave of coordinates described here)
http://216.239.39.100/search?q=cache:UqdqBL3BjqcC:mistral.in.tum.de/results/publications/TUM-I9814.pdf+interleave+coordinates&hl=en&ie=UTF-8

Perhaps when completed, you can post a link to your finished app.
Good luck!
Subject: Other related Google Answers
From: ulu-ga on 04 Jul 2002 09:34 PDT
 
You may want to check out these related Google Answers:

Formula for calculating the miles between two sets of latitude and longitude
https://answers.google.com/answers/main?cmd=threadview&id=31660

Complete list of USA towns and states 
https://answers.google.com/answers/main?cmd=threadview&id=17619
Subject: Re: Efficient amortized time algorithm
From: secret901-ga on 04 Jul 2002 09:59 PDT
 
Ozguru:
Had your answer been correct, the algorithm to implement this would be
O(log n) and not constant time as you stated.  It would still take at
least O(log n) to find the 4 cities from the ordered lists.
Subject: Re: Efficient amortized time algorithm
From: ulu-ga on 04 Jul 2002 10:32 PDT
 
Concerning the clarification.

You might look at the two other methods I described.  They are a
little simpler.  Unfortunately, the best way to detail some of the
common methods is through drawing pictures.  If I have time, I post
some more info.
Subject: Re: Efficient amortized time algorithm
From: secret901-ga on 04 Jul 2002 10:48 PDT
 
Thank you ulu for your solution.  It was very helpful...if only I can
pay you, then you'd explain more, since it seems that you have what
I'm looking for.
Subject: Re: Efficient amortized time algorithm
From: hedgie-ga on 04 Jul 2002 13:23 PDT
 
Oct-tree (or rather quad-tree) is indeed an efficient way, but perhaps
an overkill:

If you pick  your point at  (Lo,La), and run through ALL  cities  i at
(Lo_i , La_i )
 and do a check before calculation of  spherical distance (which uses
sin and cos),

like this:
             if { abs (Lo -Lo_i) > lim1 OR abs (La - La-i) > lim2 }
skip to next i

and keep adjusting  lim1 lim2 down as you are getting closer, then you
can zip through
a list if thousands of cities in a jiffy . It is O(n)  in checks, but
those are cheap and you
can do that in integers and/or binary .

But you do nor have to do it, since this is  basic function of every
GIS, which already
may have the database of cities and  efficient proximity algorithm
build in.

 Search on  open GIS proximity , e.g. 
http://www.regis.berkeley.edu/gardels/miscdocs/radkelec.pdf
 for list see e.g:
http://www.mapcruzin.com/free_gis.htm

        sorry if this kills your summer project
       but you can always spentd the summer studying GISes
       there is a lot there and more is needed  .. :-)

hedgie.
Subject: Expert Hanan Samet
From: ulu-ga on 04 Jul 2002 20:36 PDT
 
An expert in the field is Hanan Samet.  He has two classic books on
the subject.
H. Samet, Applications of Spatial Data Structures: Computer Graphics,
Image Processing, and GIS, Addison-Wesley, Reading, MA, 1990.
ISBN0-201-50300-0.
H. Samet, The Design and Analysis of Spatial Data Structures,
Addison-Wesley, Reading, MA, 1990. ISBN 0-201-50255-0.
http://www.cs.umd.edu/~hjs/

A research assistant of his, Frantisek Brabec, has created a set of
java Spatial Index Demos.  These demos will help illuminate other
descriptions you've seen of the algorithms.  Also, these demos look
like they are based on a dynamic database.  The trees can be greatly
optimized if you have static data.
http://www.cs.umd.edu/~brabec/quadtree/index.html

Also, in reference to the bit-interleave method before, that is very
similar to the other tree methods (k-d, quad,...).  You might want to
normalize the points, prior to sorting (subtract offset/(common
range)).

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.
Search Google Answers for
Google Answers  


Google Home - Answers FAQ - Terms of Service - Privacy Policy