Subject: Probability that two samples were from same population
Category: Science > Math
Asked by: pmd-ga
List Price: $10.00
Posted: 07 Nov 2002 00:22 PST
Expires: 07 Dec 2002 00:22 PST
Question ID: 100990
In an email marketing campaign, a test posting of 28000 produced 185 orders, a response rate of 0.66%. The 28000 names were selected at random from a population of 298000. A further posting to the remaining 270000 names only produced 347 orders, or a response rate of 0.13%. What is the probability that the two samples were truly independent and that the poor performance of the further posting was a purely random factor (as opposed to some explainable factor, such as the test posting not being randomly selected but instead consisting of the best performing or most recently active names)? Please show your workings and a brief summary of your expertise.
Subject:
Re: Probability of 2 Sample
Answered By: omnivorous-ga on 07 Nov 2002 09:45 PST |
Let's take the first sample:

n = 28,000
p = 185/28,000 = .66% (.0066)

The standard deviation for p is
SD = SQR ROOT [p*(1-p)/n] = SQR ROOT [(.0066*.9934)/28,000] = .0005

To get a confidence interval of roughly 99.7% for the true response rate, you'll use 3 standard deviations for a random sample:
.51% < .66% < .81%

If you repeated the test mailing on random samples from that population, you should expect a result outside that range only about 3 times in 1,000. The second mailing's response rate is roughly 11 SDs below the test rate.

This graduate statistics lesson was reviewed with Richard Brown's text, "Advanced Mathematics."

The Google search strategy that's likely to bring up the best mathematics is: "standard deviation" + polling

RobertNiles.com has a good explanation of how SDs are used in its "Margin of Error" description (undated): http://www.robertniles.com/stats/margin.shtml

I hope this helps with the supplier.

Best regards,
Omnivorous-GA
MBA, University of Chicago
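As a rough check of the arithmetic in this answer, the sketch below recomputes the standard deviation of the test response rate, the 3-SD range, and how many SDs the second mailing sits from the test rate. The variable names are illustrative, and the calculation follows the answer's simplification of treating the test rate as the reference and ignoring sampling error in the second mailing.

```python
import math

# Figures from the question
n_test, orders_test = 28_000, 185
n_rest, orders_rest = 270_000, 347

# Response rate of the test mailing and its standard deviation
p_test = orders_test / n_test
sd_p = math.sqrt(p_test * (1 - p_test) / n_test)

# Range covering 3 standard deviations either side of the test rate
low, high = p_test - 3 * sd_p, p_test + 3 * sd_p

# Distance of the second mailing's rate from the test rate, in SDs
p_rest = orders_rest / n_rest
z_rest = (p_rest - p_test) / sd_p

print(f"test rate     = {p_test:.4%}")
print(f"SD of rate    = {sd_p:.6f}")
print(f"3-SD range    = {low:.2%} to {high:.2%}")
print(f"second rate   = {p_rest:.4%}")
print(f"distance (SD) = {z_rest:.1f}")
```

A fuller treatment would also account for the sampling error of the second mailing, but with 270,000 names that error is small compared with the gap between the two rates.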
Subject:
Re: Probability that two samples were from same population
From: secret901-ga on 07 Nov 2002 00:39 PST |
0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011916631703 |
Subject:
Re: Probability that two samples were from same population
From: rbnn-ga on 07 Nov 2002 04:47 PST |
secret901-ga: Thanks for the comment, but how did you arrive at this value? I concur with you that the probability seems like it would be quite low. This is actually a tricky question. I thought about it for a while, and I'm not sure exactly how to phrase it precisely as a statistics problem, much less how to solve it.
Subject:
Re: Probability that two samples were from same population
From: neilzero-ga on 07 Nov 2002 06:38 PST |
Typically there is a dominant factor which explains what happened in statistics, but several factors may be following close on the heels of the dominant one. In your case:
1. The TV news may have made people more aware that most mailings are deceptive or worse.
2. The need for the product or service may have decreased, due to TV news or a competing ad (or lack of it relating to the goods or service); the mailings may also have reached potential customers on a different day of the week.
3. The stock market did poorly.
4. Lots more possibilities.
Numbers can be put on all of these for statistical analysis, but the process wanders far from reality. I agree; that large a drop indicates the test mailing was favorably rigged, perhaps deliberately, but I don't think you can prove that with statistics. Evidence, perhaps, but not proof.
Neil
Subject:
Re: Probability that two samples were from same population
From: pmd-ga on 07 Nov 2002 07:18 PST |
Thanks for your comments neilzero. The full mailout occurred two days after the test, and both were completed on weekdays. The mailout was to individuals who are members of a loyalty scheme where they expect to receive offers by mail. Our product is of general appeal and the demand is not likely to be affected by news articles. Given our previous experience of success vs. days of the week, I do not believe that either the day of the week or a macro factor could have so profoundly affected the result. I do accept that the factors you mention could have caused a variation of perhaps several % against the test, but the actual result was much more dramatic.

rbnn: I believe the solution is to "draw" a probability distribution of the expected results based on the test. I believe this is a binomial distribution, but cannot lay my hands on my old textbooks at the moment. The probability curve would show the probability of each of the possible results from the "further posting", from 0 to 270000 orders. At each end of the distribution, the probability would tail off to almost zero, and there would be a peak at 1782 orders (0.66% x 270000), which is the most likely outcome. The area under this curve to the left of the 347-orders point would be the probability I am looking for. (This is all from memory, and I would welcome discussion on where I am wrong!)
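The "area to the left of 347 orders" described in this comment can be sketched as a binomial lower-tail sum. The function name `log10_binom_cdf` is hypothetical, and the sketch assumes the test rate 185/28000 is the true response rate for the remaining 270000 names. The tail is too small to represent as an ordinary floating-point number, so the sum is done in log space and the result is reported as a base-10 logarithm.

```python
import math

def log10_binom_cdf(k, n, p):
    """Return log10 of P(X <= k) for X ~ Binomial(n, p), summed in log space."""
    log_terms = [
        # log of C(n, i) * p**i * (1 - p)**(n - i)
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k + 1)
    ]
    # Log-sum-exp: factor out the largest term so the sum does not underflow
    peak = max(log_terms)
    log_total = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return log_total / math.log(10)

# Assumption: the test response rate applies to the further posting
p_test = 185 / 28_000
log10_tail = log10_binom_cdf(347, 270_000, p_test)
print(f"log10 P(orders <= 347) = {log10_tail:.1f}")
```

Working with logarithms is a design choice forced by the size of the numbers: each individual term of the sum would underflow to zero if computed directly.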
Subject:
Re: Probability that two samples were from same population
From: nellie_bly-ga on 07 Nov 2002 07:54 PST |
I'd have to dig for the sampling tests to prove the point. (I'm out of practice.) But in practical terms, your second mailing "proved" that the first allegedly random sample was somehow skewed. The differences are off the wall. You'd need to know precisely how the first sample was drawn to find the problem. It obviously wasn't a true every nth draw across the entire population. |
Subject:
Re: Probability that two samples were from same population
From: pmd-ga on 07 Nov 2002 09:20 PST |
nellie_bly: The supplier of the email list is absolutely adamant that the test sample was drawn randomly. To me (and anyone else who has looked at this) it is obvious that this was not the case. My object is to "prove" statistically how very unlikely it is that what they have said is correct. I believe either the sample was not random, or the email to the non-test population did not occur properly. (The email was transmitted by the supplier.)
Subject:
Re: Probability that two samples were from same population
From: mathtalk-ga on 07 Nov 2002 14:09 PST |
The factor(s) that have skewed the results are not necessarily related to the sampling procedure. Let's consider, for example, a mailing done one month before 9/11 versus a mailing done at the height of the anthrax scare. -- mathtalk-ga |
Subject:
Re: Probability that two samples were from same population
From: mathtalk-ga on 07 Nov 2002 14:18 PST |
Any idea of how many "duplications" might be present in the total emailing list? It strikes me that the sort of pattern observed might be consistent with a list that contains a large ratio of duplications. Up to a point the repetition of the "advertising" produces additional responses, but the point of diminishing returns might have been passed already with the test mailing. regards, mathtalk-ga
Subject:
Re: Probability that two samples were from same population
From: probonopublico-ga on 07 Nov 2002 23:29 PST |
Some years ago, I met some business consultants who ran courses/seminars. They were baffled by the variability of the responses to their marketing efforts: sometimes they would be overwhelmed; at other times, the response was disappointing ... even though they faithfully duplicated whatever had previously worked. I doubt if statistical analysis will help! |
Subject:
Re: Probability that two samples were from same population
From: drdavid-ga on 11 Nov 2002 10:51 PST |
I'm going to take my own pass at constructing a rigorous answer to your question. The qualitative conclusion remains essentially similar, but the numerical results differ from those above. The problem, of course, is to figure out the correct way to translate the problem into a mathematical exercise. Further, I have to assume that the statement of the problem is correct. For example, if you were to assume that the 28,000- and 270,000-person samples were both subsets of some larger population (say a million or more), then the calculation would be slightly different again, but we will assume here that the problem universe contains exactly 298,000 people. Thus we also know the exact probability of a response, namely, (185+347) / 298,000 = 0.18%.

Now, we can ask the question (for both samples): "what is the likelihood that a given sample was taken randomly from the entire population?" In other words, we will test the hypothesis that the two samples were randomly drawn from the entire population AND that the two campaigns were conducted under identical conditions.

It is perhaps a little more convenient to recast the formula used by Omnivorous-ga for standard deviation in terms of absolute numbers rather than probabilities. Then we have:

Std. Dev. = sqrt [N * p * (1-p)]

where N is the size of a given sample and p is the known probability (0.001785). (Note that since (1-p) is very nearly one in this case, and N * p is just the expected number of returns, the standard deviation is essentially the square root of the expected number of returns.)

For the 28,000-person sample, the expected number of returns is 49.99 with a standard deviation of 7.06. The actual result of 185 returns is about 19 standard deviations above the expected number. The likelihood of this occurring due to chance alone is vanishingly small.

For the 270,000-person sample, the expected number of returns is 482.0 with a standard deviation of 21.9. The actual result of 347 returns is about 6 standard deviations below the expected number. This is likely to occur about once in a billion trials. (See, for example, "Standard Normal Distribution Table to 7.5 S.D.," available at http://www.adamssixsigma.com/Newsletters/standard_normal_table.htm )

Thus it is safe to conclude that one or the other (or both) of these campaigns was not done in the desired manner. The problem that now remains is to try to identify the source of the bias in one or both of these campaigns. It could have been caused by problems in the selection of the initial 28,000 people, or it could have been a change in the conditions of the campaign. It may have been due to either deliberate misrepresentation or to some uncontrolled-for factor in the sampling process.

--drdavid-ga (PhD from MIT, extensive experience using and teaching probability)
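The figures in this comment can be reproduced with a short sketch. The helper names `sample_z` and `normal_tail` are illustrative only, and the tail probability uses the normal approximation that the comment relies on, not an exact binomial or hypergeometric calculation.

```python
import math

def sample_z(n, observed, p):
    """Expected returns, standard deviation, and z-score for a sample of size n."""
    expected = n * p
    sd = math.sqrt(n * p * (1 - p))
    return expected, sd, (observed - expected) / sd

def normal_tail(z):
    """One-sided tail probability of a standard normal beyond |z|."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2))

# Pooled response probability over the whole 298,000-name universe
p_pooled = (185 + 347) / 298_000

exp_test, sd_test, z_test = sample_z(28_000, 185, p_pooled)
exp_rest, sd_rest, z_rest = sample_z(270_000, 347, p_pooled)

print(f"pooled p = {p_pooled:.6f}")
print(f"test: expected {exp_test:.2f}, SD {sd_test:.2f}, z = {z_test:+.1f}")
print(f"rest: expected {exp_rest:.1f}, SD {sd_rest:.1f}, z = {z_rest:+.1f}")
print(f"rest: one-sided normal tail = {normal_tail(z_rest):.1e}")
```

The tail is computed with `math.erfc` so that very small probabilities are not lost to rounding, as they would be when subtracting a cumulative probability from 1.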