Google Answers Logo
View Question
 
Q: Comparing two landscapes ( No Answer,   10 Comments )
Question  
Subject: Comparing two landscapes
Category: Science > Math
Asked by: prettypolynomial-ga
List Price: $25.00
Posted: 14 Nov 2004 17:03 PST
Expires: 14 Dec 2004 17:03 PST
Question ID: 428963
Consider a set of n points that all lie in a plane.  Call them
individually p_1, p_2, ?, p_n, and for the sake of simplicity, let's
assume they all fall in the unit square.

Now suppose that we assign to each point a height, h(p_i), so that the
points and their heights together define a type of landscape.  One way
to imagine this landscape is to say that it is defined by a rubber
sheet stretched over sticks perpendicular to the plane, where each
stick corresponds to one of the points at a particular height.

We stipulate that the height of each point p_i is between zero and
one, i.e., 0<=h(p_i) <=1

Now, consider two different landscapes L_1 and L_2 over the same set
of n points, i.e., each landscape assigns (most likely) different
heights to the points p_1, ? p_n.

How should I define a metric that measures the difference between two
different landscapes, based on their heights and the positions of the
given points?  Informally, I think that I'm looking for a measure of
how well the two landscapes would "fit" if you tried to push them
together.  I realize there are many different possible measures, from
the utterly trivial to an actual physical simulation, but I'm looking
for something in between that simply makes sense.

So in summary, I want a function d(L_1, L_2, p_1, ?, p_n) that
provides this measure.

Also, what, if anything, is changed if we further stipulate that for
any landscape, the sum of all the point heights = 1, namely:

For each landscape:
sum[i=1 to n] (h(p_i)) = 1

(In this case, one could assume the landscapes correspond to
probability distributions defined over the points.)

Request for Question Clarification by mathtalk-ga on 16 Nov 2004 18:20 PST
Hi, prettypolynomial:

I worked out a fairly simple formula for integrating the square of
differences between two "landscapes" over each triangle (of the
underlying Delaunay triangulation).  I can present that as a solution,
together with some remarks about the restricted case of the landscape
points summing to 1, if you think it might be satisfactory.

regards, mathtalk-ga

Clarification of Question by prettypolynomial-ga on 22 Nov 2004 13:47 PST
Mathtalk, apologies for the delay in responding.

May I ask you a question first about the solution you've formulated? 
I'd like to know if it addresses an issue I'm concerned about.

To make this easier, let's consider for the moment a 1-dimensional
version of this problem along the unit interval.  Say we have a set of
points along a line.  We define a landscape just as above, but now a
landscape is a 2-d rather than 3-d object.

I'm concerned about cases like the following.  Suppose we examine a
small subset of this interval and have:

Landscape 1:

               |
               | 
               |
 __|___|___|___|___
x                  x+epsilon

Landscape 2:

   |
   |
   |
 __|___|___|___|___
x                 x+epsilon

If we perform a type of averaging by considering Delaunay
triangulations rather than the points themselves, the fact that the
two peaks are near each other in the interval -- but may have an
arbitrary number (here, I've used two) of intervening points in our
set -- seems to get lost.

Does your solution address this?  Can you give me a brief idea of how?

Clarification of Question by prettypolynomial-ga on 22 Nov 2004 13:50 PST
It just occurred to me that I should have used at least 3 intervening
points to make my concern clearer, and with that addendum, I look
forward to your response.

Request for Question Clarification by mathtalk-ga on 22 Nov 2004 19:35 PST
Hi, prettypolynomial-ga:

I think the device of using a one-dimensional problem to discuss your
concerns is a good one, but I'm not yet clear on what consideration
you wish to give for "spikes" that are at nearby but distinct points
of the "landscape".

To back up a minute, the nature of your application is of course the
most important source of criteria for what makes a good measure of
distance between "landscapes".  I've assumed that referring to a
"landscape" is a bit metaphorical, it has some significance for how
best to measure any discrepancy.

In particular you've noted that the underlying points p_i for both
landscapes are precisely the same, and that these form a Delauney
triangulation of a region.  These aspects indicate to me that, as with
a real geodetic application, the intra-landmark distances are known
with greater precision a priori than the relative elevations.

Now let's contrast how the two proposed methods for comparing
landscapes handle the sorts of "spiking" discrepancies suggested by
your one-dimension example, and then consider a further alternative
that may address your concern.

If we use a simple pointwise sum of squares as hfshaw-ga proposed
initially, then the locations of points p_i are not important for the
calculation.  All points are equally weighted.

An integral over the areas defined by the Delauney triangulation, as I
proposed, weights the contributions of points by the size of the
triangles which they bound, since for piecewise linear "elements" the
influence of any measured h(p_i) extends no further than that.

Hence in this measure of discrepancy, the appearance of two spikes at
different nearby locations in the landscapes will contribute to their
computed distance, but the magnitude of the contribution is limited by
the granularity of the distance between them (a distance epsilon, as
you described it).

I can think of applications where this would be just the result
desired.  For example, in comparing two (one-dimensional) NMR spectra,
the horizontal coordinates (magnetic frequencies) are known with high
precision and form a reproducible basis for comparing two curves.  (If
necessary samples may be "doped" with a compound that has a known
spike somewhere in the frequency range of interest to produce a
synthetic "landmark" for reference purposes.)

On the other hand you may have in mind a application of a different
sort, in which not only the heights but also the "spatial" coordinates
p_i may have inter-sampling uncertainties.  We measured a spike at
"x", but really (given the registration of instruments from run to
run), it might have been at x plus or minus epsilon.

In that case I would think about "smoothing" the data.  Instead of
comparing the two landscape datasets "as is", I'd "smear out" the
contributions of individual heights by replacing them with nearest
neighbor averages (the distance over which the averaging is done being
a function of the uncertainty in spatial coordinates).  In this way,
depending on their proximity, two spikes that appear quite separated
in their respective landscapes would have overlapping support, and
this reduces their overall contribution to the computed discrepancy. 
In the extreme limit all heights would be averaged out, and the
comparison made by differencing the mean values of h(p_i) between the
two landscapes.

Something similar to this is involved in image compression, where a
full palette of 24-bit colors needs to be approximated by a skillful
selection of an 8-bit subset.  A contrary kind of algorithm is needed
for edge detection/enhancement in imaging, with the goal of
identifying and contrasting more sharply some perceived
discontinuities in a picture.

regards, mathtalk-ga

Clarification of Question by prettypolynomial-ga on 23 Nov 2004 17:35 PST
Mathtalk, thank you for your thoughtful comments.  

Let me briefly explain where the data comes from, and I think that
will help make matters clearer.  I'll stay in the 1-d framework for
this.

Suppose we have an instrument that periodically outputs a value in
[0,1].  We collect these measurements and cluster them using k-means. 
(For our purposes, k is a fairly large value, i.e., k>64, so one might
view this clustering as creating a codebook.)  The cluster prototypes
are the points referred to in the original problem statement as p_1,
...,p_n.

We then create a Voronoi partitioning of the space around these
prototypes.  For each partitioned region, we count its number of
observations with respect to the measured data.  This number, when
normalized over all observations, is assigned as the prototype's
height.  In this way, we define the (indeed) metaphorical landscape in
the problem statement.

The positional data therefore have no privileged status over the
height data, as  both sets are determined by the particular (and often
somewhat arbitrary) output of our k-means clustering.

(One could ask here how the points, i.e., the cluster prototypes,
remain the same for different landscapes, which essentially correspond
to different sets of measurements, and might seem by necessity to be
clustered differently.  Here, I'll beg your indulgence that for
reasons beyond the scope of my question, the sets of prototypes points
are in fact identical.)

Thus, my question is how can I measure differences between
"landscapes" created in this fashion.  I'll mention that I did
initially consider a spreading (or "smearing") mechanism by
considering each landscape as a parameterized Gaussian mixture model
and iteratively increasing the influence of each Voronoi region on its
neighbors.  However, I was unable to find a principled means for
effecting this spreading or any theoretical justification for what I
was doing, which is what lead me here.

Request for Question Clarification by mathtalk-ga on 25 Nov 2004 14:04 PST
Hi, prettypolynomial-ga:

I applaud your search for a "principled" justification, but given my
ignorance of what underlies the identification of p_i's in the two
datasets, I'm reluctant to guess at an appropriate treatment.

It seems to me that the Voronoi partitioning suggests a meaningful
notion of distance between points and of the areas of Delaunay
triangulation, but since you've considered this so thoroughly, I'm
sure your doubts about their significance are well founded.

regards, mathtalk-ga

Clarification of Question by prettypolynomial-ga on 25 Nov 2004 15:42 PST
Hi Mathtalk,

Please do me the following the favor, as I would like to reward you
for the time you've spent in discussion with me.  Post the original
solution you worked out as the answer, which I'm interested in seeing
in any event, and I'll then mark this question as answered.

I really do appreciate the effort you've exerted so far, particularly
given the overly abbreviated problem statement.

Best regards,
prettypolynomial

Request for Question Clarification by mathtalk-ga on 28 Nov 2004 16:54 PST
Hi, prettypolynomial-ga:

Have you studied the Comment posted by ram_nathaniel-ga?  The Earth
Mover's Distance (EMD) certainly sounds like the sort of thing your
application requires.  I looked at the linear programming definition
but haven't yet reached a clear grasp of why it matches the intuitive
notion.

regards, mathtalk-ga

Clarification of Question by prettypolynomial-ga on 28 Nov 2004 18:46 PST
Hi Mathtalk,

I've now looked through several papers on the Earth Mover's Distance
(EMD), which I've found extremely helpful.  In particular, I read one
by Levina and Bickel which demonstrates that the EMD is equivalent in
many cases to something called the Mallows Distance and gives some
insight into why it works. 
(http://www.stat.lsa.umich.edu/~elevina/EMD.pdf)

As for the linear programming definition, the only thing it doesn't
seem to actually guarantee is that one-to-one peak matching is
preferred over one-to-many peak matching, as in Figure 1d of the
original Rubner et al. paper, although that specification is somewhat
informal.

I also took slight issue with the notion that people have intuitive
preferences for comparing histograms, which seems somewhat farfetched
outside of graduate schools.  People surely have strong intuitions
governing image comparisons, but assuming those preferences
automatically transfer to other image representations, e.g.,
histograms, needs to be substantiated, although I would certainly tend
to believe it.
Answer  
There is no answer at this time.

Comments  
Subject: Re: Comparing two landscapes
From: hfshaw-ga on 15 Nov 2004 15:25 PST
 
The measure you are looking for is the sum over n of the squared
differences between the z coordinates (i.e., the "elevations") at each
of the x-y coordinates, divided by the sum of the two elevations:  X^2
= sum[i = 1 to n]{(h_i(set 1) - h_i(set 2))^2/(h_i(set 1) + h_i(set
2)}

X^2 will vary from zero for a "perfect" fit, to some large positive
value for poor fits.

If the differences in elevation follow a normal distribution, the
measure X^2 will follow the "chi-square" distribution with n degrees
of freedom.  One can use tables, readily available programs, or
on-line calculators to calculate the probability that the observed
match between the two surfaces is simply due to chance.

If you impose the normalization condition that for each landscape,
sum[i=1 to n] (h(p_i)) = 1, then X^2 will follow the chi-square
distribution with (n-1) degrees of freedom.
Subject: Re: Comparing two landscapes
From: mathtalk-ga on 15 Nov 2004 15:34 PST
 
Stretching a "rubber sheet" over poles of varying heights is not
guaranteed to reflect the location or presence of all the poles
underneath.  Some poles could be of a height that, in the given
location, causes the rubber sheet to clear them without touching.

Perhaps a better construction would be piecewise linear triangles on a
Delaunay triangulation of the region (e.g. unit square) that bounds
the pole locations.

The problem of interpolating a surface from measured elevations is one
that has received considerable attention in the computational
geography literature.  While the piecewise linear interpolation is
fast, the "jagged" artifacts it introduces are unrealistic for the
purpose of estimating contour lines.

regards, mathtalk-ga
Subject: Re: Comparing two landscapes
From: mathtalk-ga on 15 Nov 2004 15:51 PST
 
The solution proposed by hfshaw-ga, a sum of squares over all
locations, may be satisfactory for some purposes.

However such an approach does not take into account the distribution of locations.

It might be that two data sets agree closely in one region where there
are a large number of data points, but poorly in regions with a small
number.

For some applications it might be desirable to "weight" the
discrepancies according to the areas affected rather than by the
numbers of points involved.  The notion of a "rubber sheet" stretched
over sticks may be intended to effect this sort of criterion.

regards, mathtalk-ga
Subject: Re: Comparing two landscapes
From: prettypolynomial-ga on 15 Nov 2004 16:10 PST
 
Thank you both for responding.

hwshaf, your solution could be reasonable for comparing discrete
probability distributions, but it ignores an essential aspect of the
problem -- namely, that the points (or events) have spatial locations.
 Peaks that are near each other in the plane should render their
respective landscapes as being more rather than less similar, even if
they don't perfectly overlap.  The notion of spatial proximity is
essential to the problem.

mathtalk, yes, your comment is well taken -- the "rubber sheet"
analogy was just for visualization purposes, although I was imagining
the sheet to be tacked down at the top of every pole.  (Regardless,
this problem is entirely theoretic and has no obvious physical
analogy.)  Also, it turns out the points actually do correspond to a
Delaunay triangulation of a region, but it hasn't been clear to me how
to make use of that point.  I'll take a look at the computational
geography literature to see if I can find any relevant references.  To
tell you the truth, I've been surprised in previous literature
searches that I haven't found obvious references to this problem, but
I will certainly try again.
Subject: Re: Comparing two landscapes
From: mathtalk-ga on 15 Nov 2004 16:23 PST
 
I suppose that, unless a "smooth" interpolation is important, then I
would just extend hfshaw-ga's suggestion of a discrete sum of squares
to an integral over the region of the square of the differences of the
two piecewise linear interpolants.

Note that the difference of the two piecewise linear interpolants is
just the interpolant of the height differences because the
"landscapes" share a common set of locations.  This means you can just
integrate the square of the "difference" interpolant over each
triangle.

If you like I can find an exact quadrature rule for this sort of
computation & post that as an "official" answer.

regards, mathtalk-ga
Subject: Re: Comparing two landscapes
From: pinkfreud-ga on 16 Nov 2004 18:27 PST
 
Prettypolynomial,

I can't answer your question, but I did want to compliment you on your
username, which is one of the cleverest screen names I've seen on
Google Answers. It reminded me of this oldie-but-goodie:

http://www.macs.hw.ac.uk/~pjbk/humour/polynomial.html
Subject: Re: Comparing two landscapes
From: bcrounse-ga on 17 Nov 2004 13:49 PST
 
One approach that is likely to be more complicated than necessary
would be to krige both surfaces, which would give you predicitions at
regularly spaced intervals, and then do a sum-of-squares measurement
(or some other evenly-weighted error measurement) over a regular
lattice.  However, if memory serves, kriging is best applied to
stationary data, so if your points and heights are sufficiently
weirdly distributed this probably wouldn't work very well.

But, I figured this was a chance to get some use of a spatial
statistics course that I took a few years ago, and maybe spark an idea
with someone else.
Subject: Re: Comparing two landscapes
From: prettypolynomial-ga on 22 Nov 2004 14:07 PST
 
Pinkfreud, you have correctly guessed the origin of my username.  :) 
By the way, I think your username is far more clever.

Bcrounse, thank you for the suggestion to examine the literature on
kriging.  I found it (and its origin) quite fascinating.  I'm not sure
it's appropriate here because it seems to assume a natural regularity
in the quantity being estimated.  For example, one doesn't expect gold
or oil to be randomly distributed and kriging capitalizes on this to
perform interpolation.  Unfortunately, my domain provides no such
assurances of regularity, as far as I am aware, so I don't see how to
use kriging to my advantage.  However, I'm still looking through the
body of literature you introduced me to, which certainly touches on
this area!
Subject: EMD - earth moving distance
From: ram_nathaniel-ga on 27 Nov 2004 00:19 PST
 
What you are looking for is a measurement called EMD (Earth Moving Distance)
This is a measurement that regards the landscape as an actual
landscape and measures how much work will it actually take (say with a
tracktor) to change from one landscape to the other.

While EMD is rather easy to calculate in one dimention, it is a bit
harder to calculate in 2D or more.

search the web for algorithms to implement it :-)
Subject: Re: Comparing two landscapes
From: prettypolynomial-ga on 28 Nov 2004 18:10 PST
 
Ram, acharon chaviv!

Thanks very much for the pointer!  That was an extremely helpful comment!

For anyone who's interested, the seminal reference is:
http://vision.stanford.edu/public/publication/rubner/rubnerTr98.pdf

I'm wondering if I can simplify the computation because (using their
terminology) the two sets of signatures are going to have the same
modes and the masses will be equivalent when normalized.  I'll have to
work it out and see, but thanks again for this!

P.S. Can someone explain to me what the etiquette is here when two
people help answer a question?

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.
Search Google Answers for
Google Answers  


Google Home - Answers FAQ - Terms of Service - Privacy Policy