Google Answers
 
Q: Probability that two samples were from same population (Answered, 10 Comments)
Question  
Subject: Probability that two samples were from same population
Category: Science > Math
Asked by: pmd-ga
List Price: $10.00
Posted: 07 Nov 2002 00:22 PST
Expires: 07 Dec 2002 00:22 PST
Question ID: 100990
In an email marketing campaign, a test posting of 28000 produced 185
orders, a response rate of 0.66%.  The 28000 names were selected at
random from a population of 298000.  A further posting to the
remaining 270000 names only produced 347 orders, or a response rate of
0.12%.

What is the probability that the two samples were truly independent
and that the poor performance of the further posting was purely due to
chance (as opposed to some explainable factor, such as the test
posting not being randomly selected but consisting of the best-performing
or most recently active names)?

Please show your workings and a brief summary of your expertise.
Answer  
Subject: Re: Probability of 2 Sample
Answered By: omnivorous-ga on 07 Nov 2002 09:45 PST
 
Let's take the first sample:

n = 28,000
p = 185/28,000 = .66% (.0066)

The standard deviation for p is SD = sqrt[p * (1 - p) / n] =
sqrt[(.0066 * .9934) / 28,000] = .0005
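As a quick check of that arithmetic, here is a minimal sketch using the exact counts (185 orders out of 28,000). The helper name `proportion_sd` is hypothetical, and the formula is the usual binomial standard error of a sample proportion used in the answer.

```python
import math

def proportion_sd(successes: int, n: int) -> float:
    """Standard deviation of a sample proportion: sqrt(p * (1 - p) / n)."""
    p = successes / n
    return math.sqrt(p * (1 - p) / n)

# Test mailing figures from the question: 185 orders from 28,000 names.
sd = proportion_sd(185, 28_000)
print(f"p = {185 / 28_000:.4%}, SD = {sd:.6f} ({sd:.3%})")
```

Unrounded, the SD comes out at roughly .00048, which rounds to the .0005 used above.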

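To get a roughly 99% confidence interval for the population response
rate (strictly, 3 standard deviations cover about 99.7%), you'll use
3 standard deviations around the sample rate for a random sample:

.51% < population response rate < .81% (centered on the observed .66%)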
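The sketch below reproduces the 3-SD band and shows how much coverage different multipliers give under the normal approximation. The helper name `normal_two_sided_coverage` is hypothetical; 2.576 is the standard two-sided multiplier for 99%.

```python
import math

def normal_two_sided_coverage(k: float) -> float:
    """Probability that a normal variable lies within k SDs of its mean."""
    return math.erf(k / math.sqrt(2))

p_hat = 185 / 28_000
sd = math.sqrt(p_hat * (1 - p_hat) / 28_000)

# A 3-SD band around the test-mailing rate, as in the answer.
low, high = p_hat - 3 * sd, p_hat + 3 * sd
print(f"{low:.2%} < p < {high:.2%}")
print(f"3 SD coverage:     {normal_two_sided_coverage(3):.4f}")
print(f"2.576 SD coverage: {normal_two_sided_coverage(2.576):.4f}")
```

With the unrounded SD the lower bound lands near .52% rather than .51%; the difference comes only from rounding the SD to .0005.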

If you drew many more random samples of 28,000 from that population,
fewer than 1 in 100 (about 3 in 1,000 for a 3-SD band) should fall
outside that range.  The second mailing's 0.12% rate is about 11 SDs
below the test rate.
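A short check of how far apart the two rates are, using the exact order counts (185/28,000 and 347/270,000) rather than rounded percentages. As in the answer, the gap is measured in units of the test sample's SD only; this ignores the (smaller) sampling error of the follow-up mailing, so it is a rough yardstick rather than a formal two-sample test.

```python
import math

test_rate = 185 / 28_000
followup_rate = 347 / 270_000
sd_test = math.sqrt(test_rate * (1 - test_rate) / 28_000)

# Distance of the follow-up rate from the test rate, in units of the
# test sample's standard deviation.
z = (test_rate - followup_rate) / sd_test
print(f"{followup_rate:.3%} is {z:.1f} SDs below {test_rate:.3%}")
```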

This graduate statistics lesson was reviewed with Richard Brown's
text, "Advanced Mathematics."  The Google search strategy that's
likely to bring up the best mathematics is:
"standard deviation" + polling

RobertNiles.com gives a good illustration of how SDs are used in this
(undated) "Margin of Error" description:
http://www.robertniles.com/stats/margin.shtml

I hope this helps with the supplier.

Best regards,

Omnivorous-GA
MBA, University of Chicago

Request for Answer Clarification by pmd-ga on 08 Nov 2002 00:51 PST
Thanks, Omnivorous.
Please could you extend your calculations to provide the actual answer
I am looking for - i.e. the probability (however unlikely) that they
were from the same population.  By your calculations, I think the
second sample was about 11 SDs away from the test results... What
confidence interval does that suggest?

Clarification of Answer by omnivorous-ga on 08 Nov 2002 18:01 PST
PMD --

The standard deviation (SD) of the first sample is .0005 (.05%), making the
0.12% response rate about 11 SDs away from the first mailing.  I want to
check some statistics references on chi-square tests before answering
your confidence interval question.

Best regards,

Omnivorous-GA
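The clarification mentions chi-square tests. As a sketch of what such a test might look like, here is Pearson's chi-square test on the 2x2 table of orders vs. non-orders for the two mailings (1 degree of freedom, no Yates continuity correction). The helper name `chi_square_2x2` is hypothetical. The chi-square approximation is not accurate this far into the tail, so the p-value should be read as "vanishingly small" rather than as a precise figure.

```python
import math

def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test (1 degree of freedom, no continuity correction)
    for the table [[a, b], [c, d]]; returns (statistic, p-value)."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Rows: test mailing, follow-up mailing. Columns: ordered, did not order.
stat, p = chi_square_2x2(185, 28_000 - 185, 347, 270_000 - 347)
print(f"chi-square = {stat:.1f}, p = {p:.2e}")
```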
Comments  
Subject: Re: Probability that two samples were from same population
From: secret901-ga on 07 Nov 2002 00:39 PST
 
0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011916631703
Subject: Re: Probability that two samples were from same population
From: rbnn-ga on 07 Nov 2002 04:47 PST
 
secret901-ga: Thanks for the comment, but how did you arrive at this
value? I agree that the probability seems likely to be quite low.

This is actually a tricky question. I thought about it for a while, and
I'm not sure exactly how to phrase it as a statistics problem, much
less how to solve it.
Subject: Re: Probability that two samples were from same population
From: neilzero-ga on 07 Nov 2002 06:38 PST
 
Typically there is a dominant factor that explains what happened in
statistics, but several other factors may follow close on the heels of
the dominant one. In your case: (1) the TV news may have made people
more aware that most mailings are deceptive or worse; (2) the need for
the product or service may have decreased, due to TV news or a
competing ad (or the lack of one) relating to the goods or service; the
mailings may also have reached potential customers on a different day
of the week; (3) the stock market did poorly; (4) there are lots more
possibilities. Numbers can be put on all of these for statistical
analysis, but the process wanders far from reality. I agree that a drop
that large indicates the test mailing was favorably rigged, perhaps
deliberately, but I don't think you can prove that with statistics.
Evidence, perhaps, but not proof.  Neil
Subject: Re: Probability that two samples were from same population
From: pmd-ga on 07 Nov 2002 07:18 PST
 
Thanks for your comments, neilzero.  The full mailout occurred two days
after the test, and both were completed on weekdays.  The mailout went
to members of a loyalty scheme who expect to receive offers by mail.
Our product has general appeal, and demand is unlikely to be affected
by news articles.  Given our previous experience of how response varies
by day of the week, I do not believe that either the day of the week or
a macro factor could have affected the result so profoundly.  I do
accept that the factors you mention could have caused a variation of
perhaps several percent against the test, but the actual result was
much more dramatic.
rbnn: I believe the solution is to "draw" a probability distribution
of the expected results based on the test.  I believe this is a
binomial distribution, but cannot lay my hands on my old text books at
the moment.  The probability curve would show the probability of each
of the possible results from the "further posting" from 0 to 270000
orders.  At each end of the distribution, the probability would tail
off to almost zero, and there would be a peak at 1782 orders
(0.66% x 270000), the most likely outcome.  The area of this
curve to the left of the 347 orders point would be the probability I
am looking for.  (This is all from memory, and I would welcome
discussion on where I am wrong!)
Subject: Re: Probability that two samples were from same population
From: nellie_bly-ga on 07 Nov 2002 07:54 PST
 
I'd have to dig for the sampling tests to prove the point. (I'm out of
practice.) But in practical terms, your second mailing "proved" that
the first allegedly random sample was somehow skewed.  The differences
are off the wall.

You'd need to know precisely how the first sample was drawn to find
the problem. It obviously wasn't a true every-nth-name draw across the
entire population.
Subject: Re: Probability that two samples were from same population
From: pmd-ga on 07 Nov 2002 09:20 PST
 
nellie_bly: The supplier of the email list is absolutely adamant that
the test sample was drawn randomly.  To me (and to anyone else who has
looked at this) it is obvious that this was not the case.  My objective
is to "prove" statistically how very unlikely it is that what they have
said is correct.  I believe either the sample was not random, or the
email to the non-test population was not sent properly.  (The email
was transmitted by the supplier.)
Subject: Re: Probability that two samples were from same population
From: mathtalk-ga on 07 Nov 2002 14:09 PST
 
The factor(s) that have skewed the results are not necessarily related
to the sampling procedure.  Let's consider, for example, a mailing
done one month before 9/11 versus a mailing done at the height of the
anthrax scare.

-- mathtalk-ga
Subject: Re: Probability that two samples were from same population
From: mathtalk-ga on 07 Nov 2002 14:18 PST
 
Any idea how many "duplications" might be present in the total
emailing list?  It strikes me that the sort of pattern observed might
be consistent with a list that contains a large ratio of duplications.
Up to a point, repeating the "advertising" produces additional
responses, but the point of diminishing returns might already have
been passed with the test mailing.

regards, mathtalk-ga
Subject: Re: Probability that two samples were from same population
From: probonopublico-ga on 07 Nov 2002 23:29 PST
 
Some years ago, I met some business consultants who ran
courses/seminars.

They were baffled by the variability of the responses to their
marketing efforts: sometimes they would be overwhelmed; at other
times, the response was disappointing ... even though they faithfully
duplicated whatever had previously worked.

I doubt if statistical analysis will help!
Subject: Re: Probability that two samples were from same population
From: drdavid-ga on 11 Nov 2002 10:51 PST
 
I'm going to take my own pass at constructing a rigorous answer to
your question. The qualitative conclusion remains essentially similar,
but the numerical results differ from those above. The problem, of
course, is to figure out the correct way to translate the problem into
a mathematical exercise. Further, I have to assume that the statement
of the problem is correct. For example, if you were to assume that the
28,000- and 270,000-person samples were both subsets of some larger
population (say a million or more), then the calculation would be
slightly different again, but we will assume here that the problem
universe contains exactly 298,000 people. Thus we also know the exact
probability of a response, namely, (185+347) / 298,000 = 0.18%. Now,
we can ask the question (for both samples): "what is the likelihood
that a given sample was taken randomly from the entire population?" In
other words, we will test the hypothesis that the two samples were
randomly drawn from the entire population AND that the two campaigns
were conducted under identical conditions.
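Because the universe here is fixed at exactly 298,000 names with 532 responders, one exact way to test this hypothesis (a swapped-in technique, not the calculation used in this comment) is a one-sided Fisher/hypergeometric test: if the 28,000 test names were a simple random sample and responders behaved the same in either mailing, the number of the 532 responders who landed in the test sample follows a hypergeometric distribution with mean 28,000 * 532 / 298,000, or about 50. The function names are hypothetical; the result is reported as a base-10 logarithm to avoid underflow.

```python
import math

def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def hypergeom_log10_sf(x_min: int, population: int, responders: int,
                       sample: int) -> float:
    """log10 of P(X >= x_min), where X counts responders in a random sample
    drawn without replacement from a finite population."""
    hi = min(responders, sample)
    logs = [log_comb(responders, x)
            + log_comb(population - responders, sample - x)
            - log_comb(population, sample)
            for x in range(x_min, hi + 1)]
    top = max(logs)
    return (top + math.log(sum(math.exp(v - top) for v in logs))) / math.log(10)

# 532 responders in a universe of 298,000: how surprising is it that
# 185 or more of them landed in a random 28,000-name test sample?
print(f"log10 P = {hypergeom_log10_sf(185, 298_000, 532, 28_000):.1f}")
```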

It is perhaps a little more convenient to recast the formula used by
Omnivorous-ga for standard deviation in terms of absolute numbers
rather than probabilities. Then we have:

Std. Dev. = sqrt [N * p (1-p)]

where N is the size of a given sample and p is the known probability
(0.001785). (Note that since (1-p) is very nearly one in this case,
and N * p is just the expected number of returns, the standard
deviation is essentially the square root of the expected number of
returns.)
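The figures in the next paragraph can be reproduced with a few lines, assuming the pooled response rate (185 + 347) / 298,000 and the normal approximation to the binomial. `count_z_score` is a hypothetical helper name.

```python
import math

POPULATION = 298_000
pooled_p = (185 + 347) / POPULATION   # about 0.001785

def count_z_score(observed: int, n: int, p: float):
    """Expected count, SD = sqrt(N * p * (1 - p)), and z-score of `observed`."""
    expected = n * p
    sd = math.sqrt(n * p * (1 - p))
    return expected, sd, (observed - expected) / sd

for label, n, observed in [("test", 28_000, 185), ("follow-up", 270_000, 347)]:
    expected, sd, z = count_z_score(observed, n, pooled_p)
    print(f"{label:>9}: expected {expected:.2f}, SD {sd:.2f}, "
          f"observed {observed}, z = {z:+.1f}")
```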

For the 28,000-person sample, the expected number of returns is 49.99
with a standard deviation of 7.06. The actual result of 185 returns is
19 standard deviations off. The likelihood of this occurring due to
chance alone is vanishingly small. For the 270,000-person sample, the
expected number of returns is 482.0 with a standard deviation of 21.9.
The actual result of 347 returns is 6 standard deviations off. This is
likely to occur about once in a billion trials. (see, for example,
"Standard Normal Distribution Table to 7.5 S.D.," available at
http://www.adamssixsigma.com/Newsletters/standard_normal_table.htm )
Thus it is safe to conclude that one or the other (or both) of these
campaigns was not done in the desired manner.
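The tail probabilities quoted above can be checked with the standard normal survival function, written here via `math.erfc`. This assumes the normal approximation, which is only a rough guide this far into the tail, so the output indicates orders of magnitude rather than exact odds. The value 6.16 is the unrounded distance for the follow-up mailing; `normal_tail` is a hypothetical helper name.

```python
import math

def normal_tail(z: float) -> float:
    """One-sided tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

for z in (6.0, 6.16, 19.1):
    p = normal_tail(z)
    print(f"z = {z:5.2f}: P = {p:.2e} (about 1 in {1 / p:.1e})")
```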

The problem that now remains is to try to identify the source of the
bias in one or both of these campaigns. It could have been caused by
problems in the selection of the initial 28,000 people, or it could
have been a change in the conditions of the campaign. It may have been
due to either deliberate misrepresentation or to some uncontrolled-for
factor in the sampling process.
 
--drdavid-ga  (PhD from MIT, extensive experience using and teaching
probability)
