Google Answers: Comparing two landscapes

View Question

Q: Comparing two landscapes ( No Answer, 10 Comments )

Question

Subject: Comparing two landscapes
Category: Science > Math
Asked by: prettypolynomial-ga
List Price: $25.00

Posted: 14 Nov 2004 17:03 PST
Expires: 14 Dec 2004 17:03 PST
Question ID: 428963

Consider a set of n points that all lie in a plane. Call them individually p_1, p_2, ?, p_n, and for the sake of simplicity, let's assume they all fall in the unit square. Now suppose that we assign to each point a height, h(p_i), so that the points and their heights together define a type of landscape. One way to imagine this landscape is to say that it is defined by a rubber sheet stretched over sticks perpendicular to the plane, where each stick corresponds to one of the points at a particular height. We stipulate that the height of each point p_i is between zero and one, i.e., 0<=h(p_i) <=1 Now, consider two different landscapes L_1 and L_2 over the same set of n points, i.e., each landscape assigns (most likely) different heights to the points p_1, ? p_n. How should I define a metric that measures the difference between two different landscapes, based on their heights and the positions of the given points? Informally, I think that I'm looking for a measure of how well the two landscapes would "fit" if you tried to push them together. I realize there are many different possible measures, from the utterly trivial to an actual physical simulation, but I'm looking for something in between that simply makes sense. So in summary, I want a function d(L_1, L_2, p_1, ?, p_n) that provides this measure. Also, what, if anything, is changed if we further stipulate that for any landscape, the sum of all the point heights = 1, namely: For each landscape: sum[i=1 to n] (h(p_i)) = 1 (In this case, one could assume the landscapes correspond to probability distributions defined over the points.)
Request for Question Clarification by mathtalk-ga on 16 Nov 2004 18:20 PST Hi, prettypolynomial: I worked out a fairly simple formula for integrating the square of differences between two "landscapes" over each triangle (of the underlying Delaunay triangulation). I can present that as a solution, together with some remarks about the restricted case of the landscape points summing to 1, if you think it might be satisfactory. regards, mathtalk-ga
Clarification of Question by prettypolynomial-ga on 22 Nov 2004 13:47 PST Mathtalk, apologies for the delay in responding. May I ask you a question first about the solution you've formulated? I'd like to know if it addresses an issue I'm concerned about. To make this easier, let's consider for the moment a 1-dimensional version of this problem along the unit interval. Say we have a set of points along a line. We define a landscape just as above, but now a landscape is a 2-d rather than 3-d object. I'm concerned about cases like the following. Suppose we examine a small subset of this interval and have: Landscape 1: \| \| \| __\|___\|___\|___\|___ x x+epsilon Landscape 2: \| \| \| __\|___\|___\|___\|___ x x+epsilon If we perform a type of averaging by considering Delaunay triangulations rather than the points themselves, the fact that the two peaks are near each other in the interval -- but may have an arbitrary number (here, I've used two) of intervening points in our set -- seems to get lost. Does your solution address this? Can you give me a brief idea of how?
Clarification of Question by prettypolynomial-ga on 22 Nov 2004 13:50 PST It just occurred to me that I should have used at least 3 intervening points to make my concern clearer, and with that addendum, I look forward to your response.
Request for Question Clarification by mathtalk-ga on 22 Nov 2004 19:35 PST Hi, prettypolynomial-ga: I think the device of using a one-dimensional problem to discuss your concerns is a good one, but I'm not yet clear on what consideration you wish to give for "spikes" that are at nearby but distinct points of the "landscape". To back up a minute, the nature of your application is of course the most important source of criteria for what makes a good measure of distance between "landscapes". I've assumed that referring to a "landscape" is a bit metaphorical, it has some significance for how best to measure any discrepancy. In particular you've noted that the underlying points p_i for both landscapes are precisely the same, and that these form a Delauney triangulation of a region. These aspects indicate to me that, as with a real geodetic application, the intra-landmark distances are known with greater precision a priori than the relative elevations. Now let's contrast how the two proposed methods for comparing landscapes handle the sorts of "spiking" discrepancies suggested by your one-dimension example, and then consider a further alternative that may address your concern. If we use a simple pointwise sum of squares as hfshaw-ga proposed initially, then the locations of points p_i are not important for the calculation. All points are equally weighted. An integral over the areas defined by the Delauney triangulation, as I proposed, weights the contributions of points by the size of the triangles which they bound, since for piecewise linear "elements" the influence of any measured h(p_i) extends no further than that. Hence in this measure of discrepancy, the appearance of two spikes at different nearby locations in the landscapes will contribute to their computed distance, but the magnitude of the contribution is limited by the granularity of the distance between them (a distance epsilon, as you described it). I can think of applications where this would be just the result desired. For example, in comparing two (one-dimensional) NMR spectra, the horizontal coordinates (magnetic frequencies) are known with high precision and form a reproducible basis for comparing two curves. (If necessary samples may be "doped" with a compound that has a known spike somewhere in the frequency range of interest to produce a synthetic "landmark" for reference purposes.) On the other hand you may have in mind a application of a different sort, in which not only the heights but also the "spatial" coordinates p_i may have inter-sampling uncertainties. We measured a spike at "x", but really (given the registration of instruments from run to run), it might have been at x plus or minus epsilon. In that case I would think about "smoothing" the data. Instead of comparing the two landscape datasets "as is", I'd "smear out" the contributions of individual heights by replacing them with nearest neighbor averages (the distance over which the averaging is done being a function of the uncertainty in spatial coordinates). In this way, depending on their proximity, two spikes that appear quite separated in their respective landscapes would have overlapping support, and this reduces their overall contribution to the computed discrepancy. In the extreme limit all heights would be averaged out, and the comparison made by differencing the mean values of h(p_i) between the two landscapes. Something similar to this is involved in image compression, where a full palette of 24-bit colors needs to be approximated by a skillful selection of an 8-bit subset. A contrary kind of algorithm is needed for edge detection/enhancement in imaging, with the goal of identifying and contrasting more sharply some perceived discontinuities in a picture. regards, mathtalk-ga
Clarification of Question by prettypolynomial-ga on 23 Nov 2004 17:35 PST Mathtalk, thank you for your thoughtful comments. Let me briefly explain where the data comes from, and I think that will help make matters clearer. I'll stay in the 1-d framework for this. Suppose we have an instrument that periodically outputs a value in [0,1]. We collect these measurements and cluster them using k-means. (For our purposes, k is a fairly large value, i.e., k>64, so one might view this clustering as creating a codebook.) The cluster prototypes are the points referred to in the original problem statement as p_1, ...,p_n. We then create a Voronoi partitioning of the space around these prototypes. For each partitioned region, we count its number of observations with respect to the measured data. This number, when normalized over all observations, is assigned as the prototype's height. In this way, we define the (indeed) metaphorical landscape in the problem statement. The positional data therefore have no privileged status over the height data, as both sets are determined by the particular (and often somewhat arbitrary) output of our k-means clustering. (One could ask here how the points, i.e., the cluster prototypes, remain the same for different landscapes, which essentially correspond to different sets of measurements, and might seem by necessity to be clustered differently. Here, I'll beg your indulgence that for reasons beyond the scope of my question, the sets of prototypes points are in fact identical.) Thus, my question is how can I measure differences between "landscapes" created in this fashion. I'll mention that I did initially consider a spreading (or "smearing") mechanism by considering each landscape as a parameterized Gaussian mixture model and iteratively increasing the influence of each Voronoi region on its neighbors. However, I was unable to find a principled means for effecting this spreading or any theoretical justification for what I was doing, which is what lead me here.
Request for Question Clarification by mathtalk-ga on 25 Nov 2004 14:04 PST Hi, prettypolynomial-ga: I applaud your search for a "principled" justification, but given my ignorance of what underlies the identification of p_i's in the two datasets, I'm reluctant to guess at an appropriate treatment. It seems to me that the Voronoi partitioning suggests a meaningful notion of distance between points and of the areas of Delaunay triangulation, but since you've considered this so thoroughly, I'm sure your doubts about their significance are well founded. regards, mathtalk-ga
Clarification of Question by prettypolynomial-ga on 25 Nov 2004 15:42 PST Hi Mathtalk, Please do me the following the favor, as I would like to reward you for the time you've spent in discussion with me. Post the original solution you worked out as the answer, which I'm interested in seeing in any event, and I'll then mark this question as answered. I really do appreciate the effort you've exerted so far, particularly given the overly abbreviated problem statement. Best regards, prettypolynomial
Request for Question Clarification by mathtalk-ga on 28 Nov 2004 16:54 PST Hi, prettypolynomial-ga: Have you studied the Comment posted by ram_nathaniel-ga? The Earth Mover's Distance (EMD) certainly sounds like the sort of thing your application requires. I looked at the linear programming definition but haven't yet reached a clear grasp of why it matches the intuitive notion. regards, mathtalk-ga
Clarification of Question by prettypolynomial-ga on 28 Nov 2004 18:46 PST Hi Mathtalk, I've now looked through several papers on the Earth Mover's Distance (EMD), which I've found extremely helpful. In particular, I read one by Levina and Bickel which demonstrates that the EMD is equivalent in many cases to something called the Mallows Distance and gives some insight into why it works. (http://www.stat.lsa.umich.edu/~elevina/EMD.pdf) As for the linear programming definition, the only thing it doesn't seem to actually guarantee is that one-to-one peak matching is preferred over one-to-many peak matching, as in Figure 1d of the original Rubner et al. paper, although that specification is somewhat informal. I also took slight issue with the notion that people have intuitive preferences for comparing histograms, which seems somewhat farfetched outside of graduate schools. People surely have strong intuitions governing image comparisons, but assuming those preferences automatically transfer to other image representations, e.g., histograms, needs to be substantiated, although I would certainly tend to believe it.

Answer

There is no answer at this time.

Comments

Subject: Re: Comparing two landscapes
From: hfshaw-ga on 15 Nov 2004 15:25 PST

The measure you are looking for is the sum over n of the squared
differences between the z coordinates (i.e., the "elevations") at each
of the x-y coordinates, divided by the sum of the two elevations:  X^2
= sum[i = 1 to n]{(h_i(set 1) - h_i(set 2))^2/(h_i(set 1) + h_i(set
2)}

X^2 will vary from zero for a "perfect" fit, to some large positive
value for poor fits.

If the differences in elevation follow a normal distribution, the
measure X^2 will follow the "chi-square" distribution with n degrees
of freedom.  One can use tables, readily available programs, or
on-line calculators to calculate the probability that the observed
match between the two surfaces is simply due to chance.

If you impose the normalization condition that for each landscape,
sum[i=1 to n] (h(p_i)) = 1, then X^2 will follow the chi-square
distribution with (n-1) degrees of freedom.

Subject: Re: Comparing two landscapes
From: mathtalk-ga on 15 Nov 2004 15:34 PST

Stretching a "rubber sheet" over poles of varying heights is not
guaranteed to reflect the location or presence of all the poles
underneath.  Some poles could be of a height that, in the given
location, causes the rubber sheet to clear them without touching.

Perhaps a better construction would be piecewise linear triangles on a
Delaunay triangulation of the region (e.g. unit square) that bounds
the pole locations.

The problem of interpolating a surface from measured elevations is one
that has received considerable attention in the computational
geography literature.  While the piecewise linear interpolation is
fast, the "jagged" artifacts it introduces are unrealistic for the
purpose of estimating contour lines.

regards, mathtalk-ga

Subject: Re: Comparing two landscapes
From: mathtalk-ga on 15 Nov 2004 15:51 PST

The solution proposed by hfshaw-ga, a sum of squares over all
locations, may be satisfactory for some purposes.

However such an approach does not take into account the distribution of locations.

It might be that two data sets agree closely in one region where there
are a large number of data points, but poorly in regions with a small
number.

For some applications it might be desirable to "weight" the
discrepancies according to the areas affected rather than by the
numbers of points involved.  The notion of a "rubber sheet" stretched
over sticks may be intended to effect this sort of criterion.

regards, mathtalk-ga

Subject: Re: Comparing two landscapes
From: prettypolynomial-ga on 15 Nov 2004 16:10 PST

Thank you both for responding.

hwshaf, your solution could be reasonable for comparing discrete
probability distributions, but it ignores an essential aspect of the
problem -- namely, that the points (or events) have spatial locations.
 Peaks that are near each other in the plane should render their
respective landscapes as being more rather than less similar, even if
they don't perfectly overlap.  The notion of spatial proximity is
essential to the problem.

mathtalk, yes, your comment is well taken -- the "rubber sheet"
analogy was just for visualization purposes, although I was imagining
the sheet to be tacked down at the top of every pole.  (Regardless,
this problem is entirely theoretic and has no obvious physical
analogy.)  Also, it turns out the points actually do correspond to a
Delaunay triangulation of a region, but it hasn't been clear to me how
to make use of that point.  I'll take a look at the computational
geography literature to see if I can find any relevant references.  To
tell you the truth, I've been surprised in previous literature
searches that I haven't found obvious references to this problem, but
I will certainly try again.

Subject: Re: Comparing two landscapes
From: mathtalk-ga on 15 Nov 2004 16:23 PST

I suppose that, unless a "smooth" interpolation is important, then I
would just extend hfshaw-ga's suggestion of a discrete sum of squares
to an integral over the region of the square of the differences of the
two piecewise linear interpolants.

Note that the difference of the two piecewise linear interpolants is
just the interpolant of the height differences because the
"landscapes" share a common set of locations.  This means you can just
integrate the square of the "difference" interpolant over each
triangle.

If you like I can find an exact quadrature rule for this sort of
computation & post that as an "official" answer.

regards, mathtalk-ga

Subject: Re: Comparing two landscapes
From: pinkfreud-ga on 16 Nov 2004 18:27 PST

Prettypolynomial,

I can't answer your question, but I did want to compliment you on your
username, which is one of the cleverest screen names I've seen on
Google Answers. It reminded me of this oldie-but-goodie:

http://www.macs.hw.ac.uk/~pjbk/humour/polynomial.html

Subject: Re: Comparing two landscapes
From: bcrounse-ga on 17 Nov 2004 13:49 PST

One approach that is likely to be more complicated than necessary
would be to krige both surfaces, which would give you predicitions at
regularly spaced intervals, and then do a sum-of-squares measurement
(or some other evenly-weighted error measurement) over a regular
lattice.  However, if memory serves, kriging is best applied to
stationary data, so if your points and heights are sufficiently
weirdly distributed this probably wouldn't work very well.

But, I figured this was a chance to get some use of a spatial
statistics course that I took a few years ago, and maybe spark an idea
with someone else.

Subject: Re: Comparing two landscapes
From: prettypolynomial-ga on 22 Nov 2004 14:07 PST

Pinkfreud, you have correctly guessed the origin of my username.  :) 
By the way, I think your username is far more clever.

Bcrounse, thank you for the suggestion to examine the literature on
kriging.  I found it (and its origin) quite fascinating.  I'm not sure
it's appropriate here because it seems to assume a natural regularity
in the quantity being estimated.  For example, one doesn't expect gold
or oil to be randomly distributed and kriging capitalizes on this to
perform interpolation.  Unfortunately, my domain provides no such
assurances of regularity, as far as I am aware, so I don't see how to
use kriging to my advantage.  However, I'm still looking through the
body of literature you introduced me to, which certainly touches on
this area!

Subject: EMD - earth moving distance
From: ram_nathaniel-ga on 27 Nov 2004 00:19 PST

What you are looking for is a measurement called EMD (Earth Moving Distance)
This is a measurement that regards the landscape as an actual
landscape and measures how much work will it actually take (say with a
tracktor) to change from one landscape to the other.

While EMD is rather easy to calculate in one dimention, it is a bit
harder to calculate in 2D or more.

search the web for algorithms to implement it :-)

Subject: Re: Comparing two landscapes
From: prettypolynomial-ga on 28 Nov 2004 18:10 PST

Ram, acharon chaviv!

Thanks very much for the pointer!  That was an extremely helpful comment!

For anyone who's interested, the seminal reference is:
http://vision.stanford.edu/public/publication/rubner/rubnerTr98.pdf

I'm wondering if I can simplify the computation because (using their
terminology) the two sets of signatures are going to have the same
modes and the masses will be equivalent when normalized.  I'll have to
work it out and see, but thanks again for this!

P.S. Can someone explain to me what the etiquette is here when two
people help answer a question?

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.

Search Google Answers for

Google Home - Answers FAQ - Terms of Service - Privacy Policy