View Question
Q: Identifying zip codes between two points ( Answered ,   8 Comments )
 Question
 Subject: Identifying zip codes between two points Category: Computers > Programming Asked by: maxout-ga List Price: \$75.00 Posted: 22 Aug 2003 17:03 PDT Expires: 21 Sep 2003 17:03 PDT Question ID: 247783
 ```I have a databaes of zip codes along with the corresponding latitude and longitude. I need to find all of the zip codes between the two inputs, including a buffer of N% or N miles of the distance between the two. For example, Dallas to Houston is 250 miles (75207 to 77050). I need to find all of the zip codes betweent he two zips as well as 12 miles (5%) on either side of the line. This is all in SQL Server 2K and t-sql. The steps I expect to see are: - Convert lat/lon to polar - Project polar coords down to cartesian coordinates - An algorithm/SQL function that takes the line between the two points and make a polygon that represents the buffer area - A SQL function that tests whether the point/zip is inside the polygon above - A SQL query that calls the function to give the results``` Request for Question Clarification by mathtalk-ga on 23 Aug 2003 20:48 PDT ```Hi, maxout-ga: The steps you outline are not quite the ones I envision to solve your problem. Let me describe my approach and if it seems satisfactory to you, I will post the details as an answer (including the T-SQL). Given any two points on a sphere, it can be rotated so that those two points are actually on the "equator". In effect we can treat a great circle passing through two points as the equator of the sphere so far as distances between points are concerned. So consider the "special case" of two points on the equator. If I've understood the region of interest to you, it consists of those points within a given latitude above or below the "equator" whose longitudes lie between that of those two given points. Note that this region is not exactly a spherical rectangle. Although our beginning and ending segments are each arcs of a great circle and bisected by one of the given points, the upper and lower segments are arcs of circles of longitude and hence not of great circles. However the area is more nearly a spherical rectangle as its width narrows. If we can agree this is the solution, at least in the special case of two points on the equator, then by providing an explicit formula that rotates all the points on the sphere rigidly so as to position any of two given points on the equator, we can map a prospective point using this formula and see if it lies in the proper relationship to the two derived points on the equator. There are additional issues I plan to discuss in regard to making the query efficient, both in terms of an execution plan and computations. regards, mathtalk-ga``` Clarification of Question by maxout-ga on 24 Aug 2003 09:28 PDT ```So if I understand correctly, you are wanting to move the two points to the equator, then perform the operation on an easer to work with spherical rectangle. Having an exact rectangle is not of great importance. As long as the amount of variance is not significant in a NY -> LA search it'll work for my problem. I do have to offer that the computations need to efficient enough to allow for real time searches. (Real time being defined as < 15-20 seconds on an average server) I don't think this will be a problem, especially given that these are simple operations, however. It has been about 15 years since I took any geometry, so forgive me if my terms are off. I'm glad others still remember/use it :)``` Clarification of Question by maxout-ga on 26 Aug 2003 21:29 PDT ```Please unlock the question if you are not going to answer. It has been locked for almost 36 hours with no answer. Thanks.``` Request for Question Clarification by mathtalk-ga on 27 Aug 2003 00:27 PDT ```Hi, maxout-ga: Please read the Comment which I posted at bottom. If you wish to provide a deadline to work to, or if you simply wish to have another Researcher attempt your question, please advise me. I will be happy to respect your wishes. regards, mathtalk-ga``` Clarification of Question by maxout-ga on 27 Aug 2003 07:34 PDT ```THanks for your detailed response mathtalk. Having waited for a bit I was a bit concerned there was either a bug in the system causing the lock not to release, or a rogue expert keeping me from getting an answer. In the mean time I've read through many of your past answers, and I appreciate it is you working on the problem as I know I will receive a correct and detailed answer. As best as I can follow, the steps outlined below seem to approach a solution. There seems to be a bit of uncertainty at the SQL point, but something you are addressing via one of the two alternatives at the end. To further clarify my problem (and hopefully simplify it): - Stored procedures and sQL functions are both fine and are the preferred way to house this code as they will perform well - The buffer area should be N miles or N%. It does not need to support both unless you feel like it. - On your SQL query below, providing the endpoints is probably redundant. You should just be able to provide the coordiantes of the area and return all the results. Also, a question. Why is it necessary to perform the orthogonal decomposition after we convert to cartesian coordinates if we are dealing in a plane? Is it because we are converting and need the handedness? (Please forgive my lack of knowledge in this area of math)``` Request for Question Clarification by mathtalk-ga on 27 Aug 2003 09:21 PDT ```Hi, maxout-ga: Thanks for clearing up your comfort with a stored procedure as an approach. The orthogonal decomposition "milks" the Cartesian coordinates for information needed to decide whether a point C falls inside the region or not: u = v + w How big ||w|| is determines how far C (corresponding to unit vector u) lies above or below the "equator" (great circle) passing through A,B. The direction of v, i.e. whether it lies between v_A and v_B or not, tells us whether C is closer to the "short" segment arc'ing from A to B, or to the "long" portion of that great circle. We only want those points such that v is "in between". It is not necessary to explicitly compute both v,w from u, as you will see in my final answer. It is possible to compute: = and to rely on ||v||^2 = 1 - ||w||^2, so that our computation with individual vectors u can be limited to a couple of inner products (once the vectors v_A and v_B have been suitably "preprocessed"). regards, mathtalk-ga``` Clarification of Question by maxout-ga on 27 Aug 2003 11:35 PDT ```Thanks, mathtalk. I look forward to seeing your final response. Out of curiousity, what is your background? Your answers have ranged from math related problems to theological analysis, and you appear to grasp the minutia of each topic well.``` Request for Question Clarification by mathtalk-ga on 28 Aug 2003 15:34 PDT ```Hi, maxout-ga: I've written and tested a SQL script that creates a table T with test case info, see my second comment. I threw in a user-defined function which computes the angular separation between points on a unit sphere from latitude and longitude coordinates. When this function is scaled up to the Earth's radius, i.e. slightly less than 4000 miles, it gives the "great circle" distance between points. Note that this function expects the coordinates to be given in degrees (heresy to mathematicians) because I suspect this is how your actual table will be laid out. I chose the test cases to illustrate some of the circumstances which might arise, but did not want to spend a great deal of time on it since presumably you have a complete dataset to play with. The paths I have in mind to illustrate are: Dallas to Houston (per your original example) Dallas to San Antonio (changing one endpoint) Los Angeles to New York (a longer distance) I ran these commands on my local SQL Server 2K, using Windows authentication. It was mildly surprising to me that the name of the function needs to be qualified by an owner name (dbo.SphereDist in this case), but I guess a "user-defined function" really does imply association with a specific user. Please try to execute the script in a "scratch" database and let me know if there are any questions or concerns. I expect to post an answer tonight. regards, mathtalk-ga``` Request for Question Clarification by mathtalk-ga on 28 Aug 2003 21:37 PDT ```Hi, maxout-ga: I finished the initial coding of a stored procedure, ZipStrip, which takes two zipcodes (for endpoints A,B) and a "half-width" distance N as arguments. Some debugging of the rather complex arithmetic inside is unfortunately needed, and given the lateness of the hour, I no longer expect to be able to post an answer tonight. regards, mathtalk-ga``` Clarification of Question by maxout-ga on 28 Aug 2003 23:06 PDT ```Not a problem. If you are able to finish by tomorrow EOB I should be able to test it, otherwise it may be Monday before I have a chance. Thanks..```
 Answer
 Subject: Re: Identifying zip codes between two points Answered By: mathtalk-ga on 02 Sep 2003 07:49 PDT Rated:
 ```Hi, maxout-ga: REMARKS ======= Having kicked around the math portion of the question as much as we already have, I'm now thinking that I'll lead off the answer with the SQL stored procedure that selects the appropriate zipcodes, and then back into the mathematical derivation by way of explaining the code. First some words about optimization. This code is written to optimize for the case when a very large number of zipcode locations are to be checked for the geographic conditions (what we've called the "latitude" and "longitude" tests). Thus the stored procedure is designed to minimize the computation that is done by querying each individual record (little more than two inner products and inequality comparisons) at the expense of some "up front" computations that are shared across all locations. Improvements in speed beyond this will depend on "winnowing" the set of records that are to be checked. For example, although my definition of the table T has a primary key on zip[code]: CREATE TABLE T ( zip char(5) not Null PRIMARY KEY, phi float not Null, theta float not Null ) which by default would produce a clustered index on this field, we might prefer to build a clustered index on phi (latitude) or theta (longitude). A clustered index uses the physical arrangement of the records to "sort" its key, so that in creating the execution plan if a query includes upper and lower bounds on (say) the latitude T.phi, only those records whose latitudes are within the given bounds would need to be checked. So the next level of performance optimization I would look at would be along these lines; I did not explore it here since I'm working only with my own table definition, and decisions about the restructuring of indexes on that table need to be made in perspective of all uses (including population) placed on it by the application. Also a few words about the mathematical background assumed in our derivation. We will of course discuss converting latitude and longitude into Cartesian coordinates, albeit in a slightly customized fashion for the purposes of our problem. We assume some familiarity with the notion of inner product or "dot product" between vectors in three dimensions, which we abbreviate: u.v = u_x*v_x + u_y*v_y + u_z*v_z where the subscripts on the right-hand side denote x,y,z components of standard Cartesian coordinates. We will also use the cross-product and explain its definition in three dimensions. STORED PROCEDURE ================ Here then is the definition of the stored procedure ZipStrip, including some commented-out "instrumentation" used to verify its validity: if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[ZipStrip]') and OBJECTPROPERTY(id, N'IsProcedure') = 1) drop procedure [dbo].[ZipStrip] GO -- create a stored procedure to find all locations by zipcode -- within N miles of the shortest path between locations A and B CREATE PROCEDURE ZipStrip( @zipA char(5), @zipB char(5), @N_mi float ) AS BEGIN -- convert coordinate degrees to radians -- TODO: add logic in case locations for A,B don't exist DECLARE @phiA float, @thetaA float, @phiB float, @thetaB float SELECT @phiA = T.phi, @thetaA = T.theta FROM T WHERE T.zip = @zipA SELECT @phiB = T.phi, @thetaB = T.theta FROM T WHERE T.zip = @zipB DECLARE @deg2rad float, @phiDiff float, @thetaDiff float SET @deg2rad = PI( )/180 SET @phiDiff = @deg2rad * (@phiA - @phiB) SET @thetaDiff = @deg2rad * (@thetaA - @thetaB) SET @phiA = @phiA * @deg2rad SET @thetaA = @thetaA * @deg2rad SET @phiB = @phiB * @deg2rad SET @thetaB = @thetaB * @deg2rad -- convert N miles to angle for "latitude" test -- NB: the only place R = 3958 is used is here DECLARE @CosNR float SET @CosNR = COS(@N_mi/3958) -- convert latitude and longitude to Cartesian 3D coordinates -- (phi,theta) --> [cos(phi)cos(theta),cos(phi)sin(theta),sin(phi)] -- Works with Northern latitudes and Western longitudes regarded -- as positive angles, placing +x-axis on the Prime Meridian and -- +z-axis running out the North Pole. While this produces a -- "left-handed" coordinate system, that's not an issue if one is -- interested only in distances, as here. DECLARE @vAx float, @vAy float, @vAz float SET @vAz = SIN(@phiA) SET @vAx = COS(@phiA) SET @vAy = @vAx * SIN(@thetaA) -- COS(@phiA)*SIN(@thetaA) SET @vAx = @vAx * COS(@thetaA) -- COS(@phiA)*COS(@thetaA) DECLARE @vBx float, @vBy float, @vBz float SET @vBz = SIN(@phiB) SET @vBx = COS(@phiB) SET @vBy = @vBx * SIN(@thetaB) -- COS(@phiB)*SIN(@thetaB) SET @vBx = @vBx * COS(@thetaB) -- COS(@phiB)*COS(@thetaB) -- find the sum V = v_A + v_B and scale it by 1/(1 + v_A . v_B) DECLARE @Vx float, @Vy float, @Vz float, @scaleV float SET @scaleV = 1.0/(1.0 + ( COS(@phiDiff) - COS(@phiA)*COS(@phiB)*SIN(@thetaDiff)*TAN(0.5*@thetaDiff) )) SET @Vx = (@vAx + @vBx) * @scaleV SET @Vy = (@vAy + @vBy) * @scaleV SET @Vz = (@vAz + @vBz) * @scaleV -- find the unit vector W perpendicular to both v_A and v_B -- take cross product of those two vectors and normalize it DECLARE @Wx float, @Wy float, @Wz float, @scaleW float SET @Wz = -COS(@phiA)*COS(@phiB)*SIN(@thetaDiff) SET @Wy = @vAz*@vBx - @vBz*@vAx SET @Wx = @vBz*@vAy - @vAz*@vBy SET @scaleW = 1.0/SQRT( @Wx*@Wx + @Wy*@Wy + @Wz*@Wz ) SET @Wx = @Wx * @scaleW SET @Wy = @Wy * @scaleW SET @Wz = @Wz * @scaleW -- We've now precomputed all the quantities depending solely -- on v_A and v_B, so we're ready to perform the main SELECT -- check V.W, should be zero -- SELECT @Vx*@Wx + @Vy*@Wy + @Vz*@Wz as VdotW -- check V.vA and V.vB, should be equal -- SELECT @Vx*@vAx + @Vy*@vAy + @Vz*@vAz as VdotvA, -- @Vx*@vBx + @Vy*@vBy + @Vz*@vBz as VdotvB -- check norm of V squared -- SELECT @Vx*@Vx + @Vy*@Vy + @Vz*@Vz as NormVSqr SELECT T.zip FROM T WHERE SQRT( 1.0 - SQUARE( SIN(T.phi*@deg2rad)*@Wz + COS(T.phi*@deg2rad)*SIN(T.theta*@deg2rad)*@Wy + COS(T.phi*@deg2rad)*COS(T.theta*@deg2rad)*@Wx ) ) BETWEEN @CosNR AND 1.0E-8 + ( SIN(T.phi*@deg2rad)*@Vz + COS(T.phi*@deg2rad)*SIN(T.theta*@deg2rad)*@Vy + COS(T.phi*@deg2rad)*COS(T.theta*@deg2rad)*@Vx ) END TEST CASES ========== Recalling the test case records which were given in a Comment below, here are a few tests of the stored procedure based on them: -- this first series tests the Dallas/Houston strip ZipStrip '75207','77050',5 go -- 75110 'Corsicana TX' -- 75207 'Dallas' -- 77050 'Houston' ZipStrip '75207','77050',50 go -- increased width, no additional records ZipStrip '75207','77050',100 go -- 76710 'Waco TX' appears now ZipStrip '75207','77050',500 go -- 78210 'San Antonio TX' at this width -- NB: 76110 'Fort Worth TX' never appears, despite -- its proximity to Dallas, because of its angle to -- the path from Dallas to Houston -- this second series tests the LA/NY strip ZipStrip '90010','10010',50 go -- 10010 'New York' -- 63401 'Hannibal MO' -- 90010 'Los Angeles' ZipStrip '90010','10010',500 go -- two more records appear at this distance -- 75207 'Dallas' -- 76110 'Fort Worth' HOW IT WORKS ============ The stored procedure ZipStrip is intended to take as input the zipcodes for two distinct locations A,B within an existing table T of such locations. A constant @deg2rad which gives the number pi/180 of radians per degree is defined for our use in converting degrees of latitude phi and longitude theta given in that table into the radian values expected by the trigonometric functions in MS SQL Server's Transact-SQL implementation. The mapping from latitude/longitude pairs we use here: [x,y,z] = [ COS(phi)*COS(theta), COS(phi)*SIN(theta), SIN(phi) ] deserves a bit of explanation. First of all the mapping is to Cartesian coordinates on a unit sphere. This effectively assumes that distances on the Earth can be modelled as if it were a perfect sphere. Actually the Earth is more of a oblate spheroid or ellipsoid, with a slightly greater radius of 3963 miles at the equator versus 3950 miles at the poles, but I've chosen to use an intermediate value of R = 3958 miles for the purpose of this computation. The upshot of the scaling is that the radius is only needed in one expression, so for the rest of the procedure we are fine with working only on a unit sphere. The one place where R is needed is in the condition for "half-width" of the strip of zipcode locations, which elsewhere we refer to as the "latitude test" because the interpretation of the path from A to B as an "equator" would make an distance from this path correspond to an angle of latitude. Precisely, a spherical distance of N miles on a sphere of radius R corresponds to an angle (as viewed from the sphere's center) of N/R radians. Therefore we will want to select points whose "unit vector" u lies within this angle of "latitude" with respect to an "equator" passing through input points A,B. The other test that we impose is the "longitude test," which asks that this same unit vector u should fall between the "lines of longitude" that pass through A,B respectively. Here we are not speaking of real lines of longitude, but rather of great circles at A,B resp. which are perpendicular there to the great circle "equator" through A and B. Computationally we need to decompose the vector u into "orthogonal" components: u = v + w such that v lies in the same plane as the great circle "equator" through A and B, and w is perpendicular to this plane. Our approach is to take the two unit vectors v_A and v_B that correspond to points A,B and construct a unit vector: v_A x v_B Here is meant a special "cross product" of three dimensional vectors which has the characteristic of being perpendicular to both "factors" v_A and v_B. We then "normalize" the resulting vector to have unit length and call this W. The relationship of u,w, and W is then given by a simple inner or "dot" product: w = (u.W) W This is the scalar multiple (u.W) of W, and it has length or "norm": ||w|| = |u.W| because ||W|| = 1. The cross-product of two vectors is only defined in three dimensions, and it can be a bit cumbersome to state. The most compact form of definition is to describe it in terms of a determinant, but here it is written out: u x v = [ u_y*v_z - u_z*v_y, u_z*v_x - u_x*v_z, u_x*v_y - u_y*v_x ] All we really need to know for our problem can be verified by direct computation, namely that: (u x v).u = 0 (u x v).v = 0 i.e. that the cross-product is orthogonal (perpendicular) to both its factors. In our procedure we apply this to get W from v_A x v_B to be perpendicular to both of them, and hence to the entire plane of the "equator" passing through A and B. Once we have w from W by the inner product relation above, the other component v is also known: v = u - w In fact we don't need to explicitly compute v at all, because the only things we really will need about v are: 1) its norm ||v||, and 2) its inner product with the "bisector" of v_A and v_B v.(v_A+v_B) The first of these values is easily obtained by the Pythagorean relationship: ||v||^2 + ||w||^2 = ||u||^2 = 1 because v,w are the perpendicular "legs" that form sum vector u (which is of unit length), and thus: ||v|| = SQRT( 1 - ||w||^2 ) The second of those values above, the inner product of v with the sum of vectors of v_A and v_B, is important because of its relationship to the "longitude test". Imagine for a moment that v is restricted to being a unit vector lying in the plane of the "equator" passing through A and B, i.e. the plane that contains both v_A and v_B. This expression: v.(v_A + v_B) will then take on positive and negative values as v "rotates" around that great circle. In fact the maximum value occurs when v is parallel to v_A + v_B, namely at the midpoint of the (shortest) path between A and B, and the expression has a strictly smaller value at the "endpoints" of this path: when v = v_A or v = v_B, v.(v_A + v_B) = 1 + (v_A . v_B) Therefore if v were a unit vector (in the plane of v_A and v_B), our test for whether v falls "between" v_A and v_B would simply be: v.(v_A + v_B) >= 1 + (v_A . v_B) where the right-hand side of this inequality is simply a constant, depending only on v_A and v_B, that we can precompute. More generally, since v need not be a unit vector, we need to scale the test according to the actual length of v: v.(v_A + v_B) >= ( 1 + (v_A . v_B) ) ||v|| A couple of tricks are used to make the test more "compact". We define: V = ( 1/ (1 + (v_A . v_B) ) (v_A + v_B) so that we have only to check: v.V >= ||v|| and second we note that by the decomposition u = v + w, w is orthogonal to V and: u.V = v.V Therefore, we have the implemented form of the longitude test: u.V >= ||v|| But this ties in easily with the other half, our "latitude test". Instead of asking: ||w|| <= SIN(N/R) which expresses the notion that the perpendicular distance ||w|| that u lies above the plane of v_A and v_B is within the allowed N miles, we can turn the relationship around to a condition on ||v||: ||v|| = SQRT( 1 - ||w||^2 ) >= COS(N/R) [note reversal in inequality direction] Now the advantage of stating the combined conditions like this: COS(N/R) <= ||v|| <= u.V is that we can use the BETWEEN syntax of a WHERE condition to insure that the expressions are only mentioned and thus only evaluated once: WHERE SQRT( 1 - ||w||^2 ) BETWEEN COS(N/R) AND u.V I need to look back over my notes to pick out some links for additional resources, but I'm posting this now for you to test. regards, mathtalk-ga```
 maxout-ga rated this answer: ```I'd sure like to crawl in Mathtalk's brain for a day. My answer turned out to be fairly involved, but he broke it down step by step not only providing the answer, but also the reasoning behind the answer. Upon testing his answer operates exactly as expected, and allows my application to run real time instead of batch. Thanks!```

 Comments
 Subject: Re: Identifying zip codes between two points From: mathtalk-ga on 27 Aug 2003 00:25 PDT
 ```Hi, maxout-ga: I've been actively working on your question during this time, and while I can appreciate that you might think this is an overly long time, please bear in mind that it involves both a mathematical problem and SQL Server issues. To post a partial answer for either part as an answer would be against the Google Answer's Terms of Service. Thus I will use this comment to give an outline of the answer. MATHEMATICAL PROBLEM ==================== Let two distinct points A,B taken from your zipcode table be specified. These provide the coordinates of latitude and longitude on the Earth's surface, which we will model as a perfect sphere of radius R miles. As can be seen from the special case in which A,B are endpoints on the equator, the set of points within N miles of the (great circle arc) shortest path from A to B has two pairs of boundaries. The first pair we will describe by a "latitude test" since in the special case the upper and lower boundaries are simply two lines of latitude, parallel to the equator. The second pair we will describe by a "longitude test since in the special case the left and right boundaries are simly two lines of longitude (passing through A,B respectively). In the general case we will require these formulas for the mathematical expressions. Here v_A and v_B represent unit vectors in three dimensions whose Cartesian coordinates are converted from the latitude/longitude measurements of points A,B respectively. We will use ||v|| to denote the norm (length) of a vector v, and to denote the inner product (dot product) of two vectors. Coordinate Conversion: Given the latitude phi and longitude theta of a point on the unit sphere, its Cartesian coordinates can be mapped by these formulas: x = cos(phi) cos(theta) y = cos(phi) sin(theta) z = sin(phi) [Note: This mapping produces a left-handed coordinate system in 3D, but it is especially convenient for locations in the United States, which all lie in the Northern latitudes and (with minor exceptions) have Western longitudes.] Orthogonal Decomposition: Let u be a unit vector. Assuming that v_A and v_B are not anti-podal (exactly on opposite sides of the sphere), then one can uniquely express: u = v + w where v is a vector in the plane spanned by {v_A,v_B}, and w is a vector that is orthogonal (perpendicular) to that plane. In general v,w will each have length less than 1, as the unit length of u is the hypotenuse of a right triangle formed with "legs" of length equal to that of v,w. ||v||^2 + ||w||^2 = 1 Think of v as the "shadow" of u cast onto the plane that contains both v_A and v_B, if the Earth were a crystal sphere and the Sun were moved directly overhead of this plane. Hmm. Okay, I'd better just stick to the math... Latitude Test: ||w|| <= sin( N/R ) Longitude Test: >= (1 + ) ||v|| Note that, as is to be expected, the first "latitude" test involves the parameter of N miles (suitably scaled by radius R), but the second "longitude" test does not. Summary: Any points u which pass these two tests correspond to locations within the desired region "near" the shortest path from A to B. SQL SERVER ISSUES ================= For the sake of discussion we assume the existence of a simplified table T in which only a primary five digit zipcode key (T.zip) and the two dependent fields of latitude and longitude are present (T.phi and T.theta, respectively). Your original question asks for an implementation based on a user-defined function and a single SQL query. Ignoring for the moment the distinction between specifying N miles versus a percentage N% (of the distance from A to B) as the region half-width, such a formulation might look like this: SELECT C.* FROM T as A, T as B, T as C WHERE A.zip = @zipA AND B.zip = @zipB AND MyTestFunction(A.phi,A.theta,B.phi,B.theta,C.phi,C.theta,@N) = 1 in which we've made use of "parameters" @zipA and @zipB to identify the endpoints by their zipcodes and @N to specify the threshold distance in miles. Here MyTestFunction( ) should be written to compute v_A from A.phi,A.theta and v_B from B.phi,B.theta, to compute u from C.phi,C.theta and determine its orthogonal decomposition, and to return 1 if both tests described above are satisfied (otherwise 0). While such an implementation is possible, its structure requires the evaluation of MyTestFunction( ) on all records in table T (as aliased here by C). I fear that the performance of this approach would be much slower than optimal. Two alternatives to this can be considered. One is to use a stored procedure, and another is to use a user-defined function that returns a table value (which could be an image of the selected rows from T). Either of these approaches would allow us to compute once for all rows those values, like the inner product , which depend only on the endpoints A,B and not on the prospective test point. I therefore request your patience as I work to supply a clear and complete explanation, both for the mathematical methods and SQL techniques which I will recommend. regards, mathtalk-ga```
 Subject: Re: Identifying zip codes between two points From: mathtalk-ga on 28 Aug 2003 15:21 PDT
 ```-- create a sample table T to hold latitude and longitude -- coordinates keyed to zipcode if exists (SELECT * FROM dbo.sysobjects WHERE id = object_id(N'[T]') AND OBJECTPROPERTY(id, N'IsUserTable') = 1) DROP TABLE [T] GO CREATE TABLE T ( zip char(5) not Null PRIMARY KEY, phi float not Null, theta float not Null ) GO INSERT INTO T VALUES ('75207', 32.7853, 96.8223) -- Dallas go INSERT INTO T VALUES ('77050', 29.9023, 95.2677) -- Houston go INSERT INTO T VALUES ('75110', 32.0983, 96.5183) -- Corsicana TX go INSERT INTO T VALUES ('76710', 31.5356, 97.1921) -- Waco TX go INSERT INTO T VALUES ('78210', 29.3962, 98.4659) -- San Antonio TX go INSERT INTO T VALUES ('76110', 32.7066, 97.3382) -- Fort Worth TX go INSERT INTO T VALUES ('90010', 34.0589, 118.3299) -- Los Angeles CA go INSERT INTO T VALUES ('10010', 40.7386, 73.9826) -- New York NY go INSERT INTO T VALUES ('63401', 39.7145, 91.4209) -- Hannibal MO go -- create a user-defined function which finds distance/angle between -- two points on the unit sphere from "geographic" coordinates -- latitude phi and longitude theta measured in degrees CREATE FUNCTION SphereDist ( @phiA float, @thetaA float, @phiB float, @thetaB float ) RETURNS float AS BEGIN -- convert all degrees to radians DECLARE @deg2rad float, @phiDiff float SET @deg2rad = PI( )/180 SET @phiDiff = @deg2rad * (@phiA - @phiB) SET @phiA = @deg2rad * @phiA SET @phiB = @deg2rad * @phiB SET @deg2rad = @deg2rad * (@thetaA - @thetaB) -- this form of the dot product between vectors A & B -- minimizes cancellation errors due to small differences -- between pairs of latitude or longitude coordinates RETURN( ACOS( COS(@phiDiff) - COS(@phiA)*COS(@phiB)*SIN(@deg2rad)*TAN(0.5*@deg2rad) ) ) END -- test the user-defined function by finding all distance pairs -- using 3958 miles as the approximate radius of the earth SELECT A.zip, B.zip, 3958*dbo.SphereDist(A.phi,A.theta,B.phi,B.theta) FROM T as A, T as B WHERE A.zip < B.zip go```
 Subject: Re: Identifying zip codes between two points From: sanjay_gandhi-ga on 30 Mar 2004 13:20 PST
 ```This is amazing. We spent 12 hours just to find the formula for calculating distance between 2 lat/long pairs!!! We have a separate problem though. We got a database of around 150 different stores and 1.6 million customers. I came up with a SQL procedure to create a table with these fields: Store, Customer, Distance, Rank. The rank is based on the closest store to a customer. So sample output will be somthing like this: Customer Store Distance Rank 1 100 2.1 1 1 101 2.9 2 1 120 6.0 3 It is currently taking 60 hours of processing time on a dual proc server. Appreciate if anyone could give any tips to speed it up. I will post it as a question to the math genius maxout-ga if no one responds!```
 Subject: Re: Identifying zip codes between two points From: mathtalk-ga on 02 Apr 2004 06:17 PST
 ```Hi, sanjay_gandhi-ga: Your description is too brief to make detailed suggestions for improvements possible. It seems unlikely that your goal is to rank all 150 stores for each of the 1.6 million customers; surely some savings in computation can be obtained by limiting the results to finding the 5 closest stores for each customer, although of course this is still a whopping big job if you need to do it for all customers at once. I believe that it would be practical to take an "on demand" approach as I did for maxout-ga, given a particular customer, to find the 5 closest stores. You may wish, for practical reasons, to limit the store search to a distance of (say) 500 miles. One element of this would be to quickly apply longitude and latitude limits so that the distance calculation is carried out, not over all 150 stores, but only over a modest subset of them that are "reasonably" close. If you are doing this as a batch, then I'd suggest that you consider an incremental strategy in which you update the results only as entries for stores or customers are added, changed, or deleted. The more details you can supply, the more likely it is that someone with an interest in assisting you could be successful. If you are interested in programming this yourself, I think you'll find much of the computation has been done here for maxout-ga. My efforts to optimize the calculations for his particular need may have obscured some of the basics, however. regards, mathtalk-ga```
 Subject: Re: Identifying zip codes between two points From: sanjay_gandhi-ga on 02 Apr 2004 10:08 PST
 ```--Your description is too brief to make detailed suggestions for --improvements possible. It seems unlikely that your goal is to rank --all 150 stores for each of the 1.6 million customers; surely some --savings in computation can be obtained by limiting the results to --finding the 5 closest stores for each customer, although of course --this is still a whopping big job if you need to do it for all --customers at once. Let me clarify my goal: Get all stores within 30 miles of a customer and rank them based on distance. If a customer has less than 5 stores within that range, then go further till 300 miles to get at least 5 closest stores. I cannot use an incremental approach because when a new store is added, it can change the rankings of any of the 1.6 million customers. So I still have to scan the entire customer database. BTW, I got the processing down to ***10 hours*** on a dual 1.2Ghz machine. Just in case anyone is curious how I did it: NOTE: Avoid using SQL cursors. That's why it took 40 hrs before! 1. First, fill up the destination table (let's say Output_Result) without the rank information. So, just get all stores within 30 miles of the 1.6 million customers. You can speed this up by using what I call "perfect square" method. Instead of using a formula to calculate distance between a customer and all 150 stores, we can limit the number of stores used in the query. Remember, we need all stores within 30 mile radius. So, imagine this 30 mile radius within a square of width=(Diameter of circle). Doing some geometric calulcations, get the upper and lower limit of lat/long first. This will limit your calculation to around 30 stores. And then use the distance formula to calulate distance beteen 2 lat/long pairs and batch insert the result set. This takes about ***6 hours processing time***. 2. Then, do the ranking using SQL Update statement. This was the toughest part. I got this off from some newsgroup. Here is the SQL statement: update Output_Result set Rank = (SELECT Count_Big(*) + 1 FROM Output_Result OR2 Where Output_Result.CustID = OR2.CustID and Output_Result.DistanceToStore > OR2.SADA_DistanceToStore ) This part takes around ***4 hours*** when the Output_Result table has around 20 million rows. Thats it!!!```
 Subject: Re: Identifying zip codes between two points From: sanjay_gandhi-ga on 02 Apr 2004 10:24 PST
 ```CORRECTION: The update sql statement should have been: update Output_Result set Rank = (SELECT Count_Big(*) + 1 FROM Output_Result OR2 Where Output_Result.CustID = OR2.CustID and Output_Result.DistanceToStore > OR2.DistanceToStore )```
 Subject: Re: Identifying zip codes between two points From: mathtalk-ga on 02 Apr 2004 10:45 PST
 ```You don't mention what indexes, if any, are defined on your Output_Results table, but that might be one area to investigate. I believe that a table like Output_Results can be most efficiently maintained incrementally. If a new store is added, you don't have to revise results for all 1.6 million customers, only those within 300 miles of the store (actually only for a subset of these, customers within 30 miles of the store and those with at most 5 store rankings). However running a batch refresh of all the rankings may not be a burden for you if this only needs to be done infrequently, and it's certainly a good idea to have this capability. Cursors can be a good thing, but one needs to be careful to use them efficiently. A cursor over the entire set of customers, for example, is not a good place to begin (because the number of stores is much smaller than the number of customers). If I were starting from scratch I'd try a cursor over the list of Stores and use your "perfect square" method to find all the candidate Customers that might be within 30 miles of each Store. Do the rankings with that smaller set of "output results", then go back and identify Customers with fewer than 5 rankings. Then work with this smaller set of Customers, which requires throwing "perfect squares" out to 300 miles for any Stores. You undoubtedly already have an idea what percentage of your Customers will wind up in the latter category (fewer than 5 stores within 30 miles), so you can judge for yourself if this divide-and-conquer scheme is promising. regards, mathtalk-ga```
 Subject: Re: Identifying zip codes between two points From: sanjay_gandhi-ga on 02 Apr 2004 14:30 PST
 ```WOW! You are quick in responding. I got 2 Foreign Key indices on CustID and StoreID. I am still not confident about updating table incrementally, since we run this update 2 times a year and its not just the store addresses that change, its the customers too. So I'll have to build the logic of tracking which addresses changed and all that. Not worth spending time on. I did try the cursor on store instead of doing a "batch" (Select...Insert Into..) statement. It didnt give any performance improvement - in fact it took an hour more to execute. I think the SQL 2000 is smart enough to optimize the batch insert. It was a good try though. Thanks mathtalk-ga. It really helps to get a second opinion```
 Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service. If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.
Search Google Answers for

 Google Home - Answers FAQ - Terms of Service - Privacy Policy