Subject: Statistics: Hypothesis Testing needed; rigged dice.
Category: Science > Math
Asked by: donphiltrodt-ga
List Price: $11.00
Posted: 16 Apr 2003 04:28 PDT
Expires: 16 May 2003 04:28 PDT
Question ID: 191123
What if I somehow rigged a theoretical die so that two numbers never showed? How many times would I have to roll that die to ensure that my die was successfully rigged? What if I rigged a pair of dice? How many times would I have to roll the pair to ensure that my cheating worked, and I wasn't just seeing a "streak"?

Note to Google editors: this is a simplification of my previous statistical question, not an attempt to swindle. Cancelleth not mine question.
Subject: Re: Statistics: Hypothesis Testing needed; rigged dice.
Answered By: mathtalk-ga on 23 Apr 2003 21:56 PDT
Hi, donphiltrodt-ga:

Executive Summary
=================

Let p be the probability that a "cheat" works, preventing two of the six equally likely faces of a die from appearing. We can't directly "observe" p, but if we count how many "forbidden faces" M appear in a sample of N throws, the observed ratio M/N is on average expected to be:

q = (1 - p)/3

because forbidden faces appear with chance 1/3 when the cheat fails. As p varies from 1 down to 0, q varies respectively from 0 up to 1/3. Conversely, solving for p:

p = 1 - 3q

Our assignment is to explain the relationship between the number of trials N and the accuracy with which q, and thus p, can be estimated.

What is a "statistic"? A statistic is a number which summarizes a set of data. One of the most useful (and common) statistics is the mean (average). In a binomial distribution it is usual to label one outcome as 0 and the other as 1. Then the mean of the distribution is also the probability of the latter outcome. In our case we'll treat "forbidden face" throws as the outcomes with value 1, so that q is also the mean of the binomial distribution.

A binomial distribution is completely characterized by just this one parameter q as the mean. For example, the mean q also tells us the variance of the binomial distribution:

s^2 = q(1-q)

In practical terms one can only "sample" a binomially distributed population through finite subpopulations, as we do here by taking N consecutive throws. The ratio M/N already discussed is then a "sample mean" statistic, i.e. the mean of the sample (finite subpopulation). The statistic M/N is an effective way to estimate the parameter q. In such a role, statistics are called "point estimates". Note that in forming the point estimate of q:

q ~ M/N

we would also have the corresponding point estimate of p:

p ~ 1 - (3M/N)
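As a minimal sketch of this arithmetic in Python (the throw counts here are hypothetical, chosen only to illustrate the formulas above):

    # Point estimate of the cheat's success rate p from N throws,
    # M of which showed a "forbidden" face.
    N = 120   # total throws (hypothetical)
    M = 11    # throws showing a forbidden face (hypothetical)

    q_hat = M / N            # estimated chance of a forbidden face per throw
    p_hat = 1 - 3 * q_hat    # estimated chance that the cheat works

    print(f"q ~ {q_hat:.3f}, p ~ {p_hat:.3f}")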
How good is this estimate? That is the critical question. In itself a point estimate discloses nothing about the accuracy of its approximation. While it may be helpful to know something of the average error of approximation, this doesn't directly address the need to assess the accuracy of an estimate made from a single sample.

A classic tool for such analysis is the "confidence interval". A pair of values [a,b], bracketing a range of possibilities for p or q, is better suited than a single number to describe a "likely" truth about a parameter estimate. A third number, the "level of confidence" c, is also associated with a confidence interval. The level of confidence is the probability, for a known population distribution, that a random sample (x1,...,xN) produces an interval that contains the parameter being estimated.

There are many valid recipes for confidence intervals, and I have been struggling to come up with an accessible yet mathematically sound approach to explaining the basic ones. I think I'll have to punt on this, though, because the discussion quickly gets into a lot of theory of special distributions (normal, chi-square, Student's t, and Fisher's F are all relevant). Instead let me point you to a couple of Web pages that have Excel implementations of some of these calculations.

[Confidence limits]
http://www.quantdec.com/envstats/notes/class_08/confidence.htm

[Exact Binomial and Poisson Confidence Intervals]
http://members.aol.com/johnp71/confint.html

The first of these links to a downloadable Excel spreadsheet that allows one to plug in N and a confidence level and get the corresponding confidence limits for various M. This spreadsheet uses Excel's built-in functions, which are not all that adequate for very large N. The second site has an online calculator, but also allows you to download an Excel spreadsheet that implements the essential functions more carefully in VBA "macros" (code). The author claims to have tested these for accuracy with very large N.

More than you probably wanted to know
=====================================

If you decide you want to read up on the mathematics behind these various approaches, I'd suggest M.G. Bulmer's "Principles of Statistics", available as an inexpensive Dover edition. Chapter 10 on Statistical Inference is where all the conflicting opinions over confidence intervals are fairly compared.

The "classical" approach to confidence intervals has the characteristic that, for a fixed confidence level, the width of the confidence interval is roughly proportional to 1/sqrt(N). That means narrowing the "precision" of the estimate of p or q by a factor of 10 would require increasing the number of throws by a factor of 100. Obviously, if high-precision estimates were required, this approach would be quite frustrating.
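To give a rough sense of the classical recipe, here is a sketch using the simple normal-approximation interval (this assumes SciPy is available for the normal quantile, and reuses the same hypothetical counts as above; the exact methods behind the second link are more careful, especially for small M):

    import math
    from scipy.stats import norm   # assumes SciPy is installed

    N, M = 120, 11                 # hypothetical throw counts
    conf_level = 0.95
    z = norm.ppf(1 - (1 - conf_level) / 2)    # ~1.96 for 95% confidence

    q_hat = M / N
    half_width = z * math.sqrt(q_hat * (1 - q_hat) / N)   # shrinks like 1/sqrt(N)
    q_lo, q_hi = q_hat - half_width, q_hat + half_width

    # The interval for q translates into one for p = 1 - 3q (endpoints swap).
    p_lo, p_hi = 1 - 3 * q_hi, 1 - 3 * q_lo
    print(f"q in [{q_lo:.3f}, {q_hi:.3f}], p in [{p_lo:.3f}, {p_hi:.3f}]")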
Better results can be obtained by making some reasonably strong assumptions about the "a priori" distribution of the parameter p or q, i.e. what values they are likely to have before any throws are made. Then one can apply the Bayesian approach to confidence intervals. I will sketch the calculations in a simple case, where we assume the cheat either never works or always works.

A conditional probability has the form:

Pr(A|B) = Pr( event A occurs, given that event B occurs )

In other words, with a priori knowledge of B, what is the chance of A happening? In elementary probability we define this value thus:

Pr(A|B) = Pr(A&B)/Pr(B)

Now what often seems to confuse even experts is the distinction between Pr(A|B) and Pr(B|A). Such confusion, especially in criminal evidence, has come to be known as the "Prosecutor's fallacy":

[Prosecutor's fallacy - Wikipedia]
http://www.wikipedia.org/wiki/Prosecutor's_fallacy

A correct and rigorous relationship between Pr(A|B) and Pr(B|A) can be stated, but it requires the a priori probabilities of both A and B.

We can illustrate Bayesian analysis here by assuming an a priori distribution for the probability p. To keep things simple, let's assume that p is either 0 or 1, with equal chances before any experiment is done. If an experiment, with any number of trials, were to produce a "forbidden" number on the die, that would establish p = 0 under these conditions; the "cheat" must have no effect. But suppose for simplicity that one throw of the die occurs, and the result is not a forbidden number. Does this affect the probability that the "cheat" works, i.e. that p = 1?

Note these easily verified calculations:

Pr( number not forbidden | p=0 ) = 2/3
Pr( number not forbidden | p=1 ) = 1

Bayes' formula is derived from the definition of conditional probability above by rewriting in this way:

Pr(A|B) = Pr(A&B)/Pr(B) = Pr(A&B)/[Pr(A&B) + Pr(not(A)&B)]

where by further application of conditional probabilities:

Pr(A&B) = Pr(B|A)*Pr(A)
Pr(not(A)&B) = Pr(B|not(A))*Pr(not(A))

Thus, Bayes' formula:

Pr(A|B) = Pr(B|A)*Pr(A)/[Pr(B|A)*Pr(A) + Pr(B|not(A))*Pr(not(A))]

To use this in our circumstances, where:

A means "p=1"
B means "number not forbidden (in a single throw of the die)"

we simply plug in the values previously determined, including the a priori probabilities for A and not(A):

Pr(A) = Pr(not(A)) = 1/2

Pr(A|B) = 1*(1/2)/[1*(1/2) + (2/3)*(1/2)] = 3/5

To summarize, assuming that "a priori" the chances that the cheat either always works or never works are equal (50-50), a single roll of the die affects the "a posteriori" probabilities as follows:

If a forbidden number comes up, the chance that the cheat works is 0%.
If a non-forbidden number comes up, the chance that the cheat works is 60%.

Notice that a single roll of the die in which no forbidden number appears raises the chance that p=1 from 50% to 60%. This sort of calculation is easily extended to the case where N consecutive rolls of the die all fail to produce forbidden numbers. Intuitively, as N increases, so should the probability that p=1, and the calculations bear this out:

Pr( N numbers not forbidden | p=0 ) = (2/3)^N
Pr( N numbers not forbidden | p=1 ) = 1

Bayes' formula then tells us:

Pr( p=1 | N numbers not forbidden )
  = 1*(1/2)/[1*(1/2) + (2/3)^N * (1/2)]
  = 1/[1 + (2/3)^N]

which is easily evaluated on a calculator for particular values of N.
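A couple of lines of Python evaluate this for a range of N (a sketch; the N values are arbitrary):

    # Posterior probability that the cheat always works (p = 1), given that
    # N throws in a row produced no forbidden face, starting from a 50-50 prior.
    for N in (1, 5, 10, 20, 50):
        posterior = 1 / (1 + (2/3) ** N)
        print(f"N = {N:2d}: Pr(p=1 | no forbidden faces) = {posterior:.4%}")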
In fact for N = 5:

Pr( p=1 | 5 numbers not forbidden ) = 88 4/11 %

and for N = 20:

Pr( p=1 | 20 numbers not forbidden ) = 99.97% (approx.)

Qualitatively, the chance that p=1 approaches 100% exponentially fast as N increases. It never reaches 100% exactly, of course, for any finite value of N; there is always some "doubt" left over, as a result of the increasingly tiny term in the denominator that corresponds precisely to the chance of a "streak" under the possibility p=0 (cheat never works). But this demonstrates, using a simple assumption about the a priori probabilities of the cheat working, that practical inference about p might not require an unduly large number of trials.

regards, mathtalk-ga
donphiltrodt-ga rated this answer and gave an additional tip of $3.00:

Excellent work. Thank you.
Subject: Re: Statistics: Hypothesis Testing needed; rigged dice.
From: racecar-ga on 16 Apr 2003 10:47 PDT
Assuming a typical six-sided (fair) die, the probability that one of two specified faces will come up on a given roll is 1/3, so the probability that neither of those faces will show is 2/3. So if you roll the die N times, the probability that you will never see either of the specified faces is (2/3)^N. For example, if the die is fair, you must roll it 8 times for the probability of never seeing either of two faces to be less than 5% [ (2/3)^8 = .039 ], and 12 times for the probability to be less than 1% [ (2/3)^12 = .0077 ].

All this is straightforward, but it only applies to a fair die. You might be tempted to say, "If I know that there is only a 1% chance that neither face will show in 12 rolls of a fair die, then if that happens, there's a 99% chance that the die is successfully rigged." But that would be wrong. It would be approximately correct if you knew that half the time the rigging works perfectly and the other half it completely fails, leaving a fair die, but this is not a fair assumption; a given rigging process is unlikely to work exactly half the time. Without knowing anything about the probabilities underlying the rigging process, it is impossible to give a precise numerical answer. It is only possible to give an exact answer to the question: "what is the probability this would happen IF the die were fair?"

Nonetheless, at some point, even without knowledge of the rigging process, it is possible to be more or less certain the die is rigged. If you roll 50 times and never see the two faces, you 'know' the die is rigged, because that would only happen about once in a billion times with a fair die.
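A short sketch of this arithmetic in Python (the thresholds are just the 5% and 1% levels used above):

    # Chance that a fair die avoids both chosen faces in every one of N rolls,
    # and the smallest N that pushes that chance below a given threshold.
    def fair_streak_prob(N):
        return (2/3) ** N

    for threshold in (0.05, 0.01):
        N = 1
        while fair_streak_prob(N) >= threshold:
            N += 1
        print(f"below {threshold:.0%} after {N} rolls (prob = {fair_streak_prob(N):.4f})")

    print(f"50 rolls: prob = {fair_streak_prob(50):.2e}")  # the 'once in a billion' ballpark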
Subject: Re: Statistics: Hypothesis Testing needed; rigged dice.
From: mathtalk-ga on 16 Apr 2003 18:28 PDT
Hi, donphiltrodt:

Given the new "frame" of the problem, I would consider this a rather simple estimation problem. Think of it like this. Let p be the fraction of the time that the "cheat" works. Score a 0 if a non-forbidden number comes up, and score 1 if a forbidden number comes up. Assuming the "cheat" is attempted on N tries, and that the total score on these attempts is M <= N, then the best "unbiased" estimator of p is given by a simple calculation:

M ~ (1 - p)N/3

p ~ 1 - (3M/N)

This corresponds to estimating the probability (1 - p) that the cheat fails as three times the number of attempts which result in a "forbidden" number, divided by the total number of attempts.

The question of how good an estimate this is for p is then handled by constructing a "confidence interval" around the estimated value. Even though we are dealing with repetitions of a binomial distribution (either a forbidden number appears or it doesn't), the distribution of the sample mean (the average fraction of the time a forbidden number appears) is close to a normal distribution (by the central limit theorem).

Perhaps the most readily available software for doing these sorts of calculations is Excel. If you were taking an introductory course in statistics, there might very well be some "simple" software that only does these confidence interval calculations, but Excel is more the tool I would choose for the job. If you like I can post the formulas for you, with a sample calculation, as an answer.

The conclusions are typically of the form "p is estimated to lie in an interval [a,b] with confidence level 95%". The wording is intended to make it sound as if one is 95% sure that p is in the interval, but technically this is not what the calculations say. To understand the subject well one has to begin with conditional probability and graduate to the more complex subject of Bayesian inference. There are some free search terms for you, anyway.
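To see the estimator in action, here is a small simulation sketch (the cheat-success rate and the throw counts are invented for illustration; it simply shows the estimate 1 - 3M/N settling near the true p as N grows):

    import random

    rng = random.Random(0)

    def estimate_p(p_true, N):
        # Simulate N throws: the cheat works with probability p_true; when it
        # fails, the die behaves fairly and shows a forbidden face 1/3 of the time.
        M = sum(1 for _ in range(N)
                if rng.random() >= p_true and rng.random() < 1/3)
        return 1 - 3 * M / N

    for N in (30, 300, 3000, 30000):
        print(f"N = {N:5d}: estimated p = {estimate_p(0.8, N):.3f}")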
regards, mathtalk-ga

Subject: Re: Statistics: Hypothesis Testing needed; rigged dice.
From: donphiltrodt-ga on 16 Apr 2003 19:38 PDT
>> If you like I can post the formulas for
>> you, with a sample calculation, as an answer.

That'd be great. Please do. TIA.
Subject: Re: Statistics: Hypothesis Testing needed; rigged dice.
From: hfshaw-ga on 18 Apr 2003 13:17 PDT
Two words: chi-squared. See:

http://www.stat.yale.edu/Courses/1997-98/101/chigf.htm
http://www.ulib.org/webRoot/Books/Numerical_Recipes/bookcpdf/c14-3.pdf
http://virtuallygenetics.net/SeriesI/Mendel/section_05.htm
http://www.anu.edu.au/nceph/surfstat/surfstat-home/4-2-4.html (scroll down to "example 60")
http://fonsg3.let.uva.nl/Service/Statistics/X2Test.html
http://www.uwsp.edu/psych/cw/statmanual/chisquare.html

and many more. Most introductory statistics books will have sections discussing the use of the chi-squared distribution in hypothesis testing.

Your original question asked how many times you would have to roll your rigged die to "ensure" that it was, in fact, not a fair die. If by "ensure" you mean "the probability is zero that the observed run of results could have been generated by throws of a fair die", then the answer to *that* question (as I think you must realize) is "an infinite number of times". For any finite number of throws, there is always a nonzero probability that the observed run could be due to a "streak" (as you put it) from a fair die. The probability might be very small, but it is nonzero. Thus, in the real world, you must pick some level of confidence at which you are willing to say "that's good enough; I'm willing to live with the residual uncertainty."

Racecar-ga gave the method for calculating the probability that a run of N throws of a fair die would result in a distribution in which two of the faces never showed up (P = (2/3)^N). This, however, is *not* the same as asking what level of confidence one has in saying that the observed run was generated by a fair die.

Formally, you want to test the hypothesis that a given set of observations (the results of N throws of your potentially rigged die) was drawn from the uniform distribution (with p_i = 1/6 for each face) generated by a fair die. For this, you need to calculate chi-squared for your set of observations and compare it to the value of the chi-squared distribution with the appropriate number of degrees of freedom (in this case equal to 5; see the references above for why this is so) and at your chosen level of confidence (your comfort level of residual uncertainty).

The formula for chi-squared is simply the sum over all observations of [(observed value - expected value)^2]/(expected value). In this case, the observed value is the number of times a given face comes up in your sequence of N throws, and the expected value is the number of times that face would be expected to come up if the die were fair (simply N/6). The sum extends over the results for all six faces. The chi-squared statistic measures the "goodness of fit" between a set of observations and a comparison distribution. (If you are familiar with least-squares fitting, this is essentially the quantity that is being minimized in that procedure.)

There are numerous tables of the chi-squared statistic as a function of significance level and degrees of freedom. The links above include some references to such tables and to on-line calculators. If the value of chi-squared calculated for your set of data is larger than the tabulated value, then you can reject the hypothesis that the observed results were produced by throwing a fair die at the chosen level of confidence (i.e., you can be X% sure that the results were not from a fair die, where you get to pick the value of X).
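As a sketch of that recipe in Python (this assumes SciPy is available; the face counts are purely hypothetical):

    from scipy.stats import chisquare   # assumes SciPy is installed

    observed = [9, 7, 10, 8, 1, 1]      # hypothetical counts for faces 1..6
    N = sum(observed)                   # 36 throws in this made-up example
    expected = [N / 6] * 6              # a fair die predicts N/6 per face

    stat, p_value = chisquare(observed, f_exp=expected)   # df = 6 - 1 = 5
    print(f"chi-squared = {stat:.2f}, p = {p_value:.4f}")
    print(f"confidence the die is not fair: {1 - p_value:.1%}")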
Note that this test works just as well if the rigging of the die is not perfect; if your method only reduces the probability that a face will come up to something less than 1/6, but does not reduce it to zero, you can still use this test. Obviously, the more subtle the change in "fairness" of the die, the more times you will have to test it in order to achieve the same level of confidence that it is, in fact, not fair.

One caveat on the use of this test is that, because of approximations made in its derivation, the expected number of observations in any single "bin" must be >5. That means that for the results of the test to be even approximately correct, you will need to throw the die at least 30 times (5 * 6 possible results).

As an example, using the on-line calculator at http://fonsg3.let.uva.nl/Service/Statistics/X2Test.html, and assuming you threw a die 36 times and four of the faces came up 9 times each but the other two faces never came up, you could be 98.6% sure that the die is not fair. If, on the other hand, your "fix" is not perfect, and four of the faces come up 8 times each while the other two come up twice each, you could only be ~65% sure that the die is not fair.

As an aside, it would also be appropriate to use the chi-squared distribution to test the hypotheses associated with the examples in your original Google Answers question.