Q: Probability and degree of reliability (accuracy in predicting future results) (No Answer, 6 Comments)
 Question
 Subject: Probability and degree of reliability (accuracy in predicting future results) Category: Science > Math Asked by: respree-ga List Price: $25.00 Posted: 23 Jun 2006 10:39 PDT Expires: 23 Jul 2006 10:39 PDT Question ID: 740543
 ```If you threw 1,000 pennies into the air, about half of them would land heads and the other half tails: an approximate 50% probability of one or the other. We know this because a coin has two sides. However, if you had only two pennies and used only a single toss, the likelihood that one would land heads with the other landing tails would be somewhat less certain than in the first example. Repeat the single toss a thousand times, and you'd likely wind up with results similar to those from the 1,000 pennies.

My question has to do with the reliability of probability outcomes based on a small sample size. Let's take a hypothetical situation. Say a clinical study was performed on 40 patients to see whether an experimental drug worked. There are only two possible outcomes: it either worked or it didn't. On 4 of the patients the drug did not work, and on the remaining 36 it did. One might deduce from this data that the drug fails approximately 10% of the time. But would this be a fair conclusion? Would the reliability of a future predictive assumption be compromised because the sample size is too small?

Obviously, if one were looking at a failure rate of 10% based on a test of 400,000 patients (40,000 patients had no results), it would lead any reasonable person to conclude, with a fair degree of probability, that 10% is in fact the actual failure rate. It seems to me that there should be a mathematical correlation between sample size and predictive reliability, reaching a point below which the predictive probability is no longer reliable. What is that point? I'd appreciate any comments or thoughts from someone with a strong math background, or from a researcher who can find websites explaining the correlation of sample size to reliability.

The bottom line question: would a reasonable person conclude from my hypothetical 40-patient study that approximately 10% is a "fair"
basis for predicting future failure rates?```

Request for Question Clarification by neurogeek-ga on 28 Jun 2006 11:44 PDT
```respree, Are you still interested in a full answer? I think I could come up with more than is already contained in the comments, with some good links. --neurogeek```

Clarification of Question by respree-ga on 30 Jun 2006 22:24 PDT
```Hi neurogeek: Thanks for your comment. Yes, I'm still interested. If you could keep the answer (and links) to something the 'average' person can understand, that would be great. Thanks for any assistance you can provide.```

Request for Question Clarification by neurogeek-ga on 05 Jul 2006 02:14 PDT
```Unfortunately, I didn't have time to write this before leaving on vacation. I will be back on the 21st. Perhaps someone else will be able to take on this question. --neurogeek```

Request for Question Clarification by pafalafa-ga on 07 Jul 2006 17:59 PDT
```respree-ga, Since neurogeek seems to be bowing out here, I thought I'd add my two cents. A concept very closely related to your question is the "margin of error", a phrase commonly heard in public opinion polls. A pollster might say that 40% of the population will vote for so-and-so, but the accuracy of that statement depends in large measure on the sample size, just as the reliability of the penny-toss results, or of the clinical study, depends on sample size. The nature of that dependence is visualized very nicely in this Wikipedia article on "margin of error": http://en.wikipedia.org/wiki/Margin_of_error and you can see the graph close up here: http://en.wikipedia.org/wiki/Image:MarginoferrorViz.png As the graph makes clear, for sample sizes in the thousands the margin of error shrinks to single digits; when the sample size is in the hundreds, the margin of error is 10% or so.
Though not quantified on the graph, a sample size of only 40 would produce a substantially larger margin of error, so that the results, though not necessarily invalid, would need to be taken with several large grains of salt. Is that the sort of information you're looking for? pafalafa-ga```
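The margin-of-error intuition discussed above can be checked with a short simulation. The sketch below is an illustration added here, not part of the original thread: it tosses n fair coins many times, measures how much the observed heads proportion wanders around 50%, and compares that spread to the theoretical value sqrt(p*(1-p)/n).

```python
import math
import random

random.seed(0)

def estimate_spread(n, trials=1000, p=0.5):
    """Toss n coins (heads probability p) `trials` times and return the
    standard deviation of the observed heads proportion across trials."""
    props = [sum(random.random() < p for _ in range(n)) / n
             for _ in range(trials)]
    mean = sum(props) / trials
    return math.sqrt(sum((x - mean) ** 2 for x in props) / trials)

for n in (2, 40, 1000):
    observed = estimate_spread(n)
    theory = math.sqrt(0.5 * 0.5 / n)  # sqrt(p*(1-p)/n)
    print(f"n = {n:>4}: simulated spread {observed:.4f}, theory {theory:.4f}")
```

The spread shrinks like 1/sqrt(n): quadrupling the sample size only halves the uncertainty, which is why an estimate from 40 patients is so much fuzzier than one from 400,000.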
 Answer
 There is no answer at this time.

 Comments
 Subject: Re: Probability and degree of reliability (accuracy in predicting future results) From: myoarin-ga on 24 Jun 2006 02:58 PDT
 ```Just a free comment: I once read a delightful and very interesting book about common misunderstandings of statistics. It touched on this very subject in one or two chapters: the true statistical meaning of medical testing and how the raw numbers are sometimes misinterpreted by medical researchers.```
 Subject: Re: Probability and degree of reliability (accuracy in predicting future results) From: rracecarr-ga on 26 Jun 2006 13:14 PDT
 ```The standard deviation (amount of spread) of the number of failures you'll get is roughly equal to the square root of the average number of failures. So, given that you got 4 failures, there's a reasonably good chance (better than half) that the mean number of failures you'd get in a bunch of tests with sample size 40 is 4 +/- sqrt(4), or between 2 and 6. So a good estimate based on this single test is that the failure rate is likely to be between 5 and 15%. Similarly, in the other example, with 40,000 failures, the average number of failures is likely to be between 39,800 and 40,200 (40,000 +/- sqrt(40,000)). So in that case, the failure rate is likely to be between 9.95% and 10.05%.```
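rracecarr-ga's square-root rule of thumb can be sketched in a few lines. This is an illustration added here, not part of the thread: the exact standard deviation of a binomial failure count is sqrt(n*p*(1-p)), which is close to sqrt(mean failures) when p is small.

```python
import math

def failure_spread(n, p):
    """Exact standard deviation of the failure count in n independent
    trials, each failing with probability p (binomial distribution)."""
    return math.sqrt(n * p * (1 - p))

# 40 patients, 10% failure rate: mean = 4 failures
print(failure_spread(40, 0.10))       # exact: ~1.90
print(math.sqrt(4))                   # sqrt(mean) rule of thumb: 2.0

# 400,000 patients, 10% failure rate: mean = 40,000 failures
print(failure_spread(400_000, 0.10))  # exact: ~190
print(math.sqrt(40_000))              # rule of thumb: 200
```

The rule of thumb slightly overstates the exact value (by a factor of 1/sqrt(1-p)), but for small failure rates the difference is negligible.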
 Subject: Re: Probability and degree of reliability (accuracy in predicting future results) From: respree-ga on 27 Jun 2006 07:39 PDT
 ```Thank you both for your comments. Can anybody else confirm rracecarr-ga's comment on standard deviation? Sorry if this seems so basic to the mathematicians out there, but I'm afraid I'm no math whiz and am just looking for people to agree that this is the correct way of approaching the answer to my question. Thanks again. =)```
 Subject: Re: Probability and degree of reliability (accuracy in predicting future results) From: neurogeek-ga on 28 Jun 2006 11:42 PDT
 ```respree, I also thought immediately of standard deviation when I read your question. I think there is more to it than that, though. Often when an average and standard deviation are reported, the probability that the actual average is outside the predicted range is also reported. Are you still interested in a full answer? I think I could come up with more than is already contained in the comments, with some good links. --neurogeek```
 Subject: Re: Probability and degree of reliability From: ga_cal-ga on 07 Jul 2006 01:21 PDT
 ```It's widely known that a proportion estimate, say q (i.e. the fraction of occurrences of a specific event in a large sample of N individuals), follows a normal law centered on the theoretical proportion, say p, with a variance of:

V(p,N) = p*(1-p)/N

Usually we take q as an estimate of p, so the empirical proportion q is centered on p with variance V(q,N) = q*(1-q)/N, and a confidence interval on q with a confidence level of 95% is:

[q - 1.96*sqrt(q*(1-q)/N), q + 1.96*sqrt(q*(1-q)/N)]

(1.96 is related to 95% through a Gaussian distribution; see for instance http://graphpad.com/quickcalcs/probability1.cfm -- in the last section, GAUSSIAN, use mean=0 and SD=1; on the next page you will read in the last column: 5.49%->1.92 and 4.77%->1.98.)

For your example with 40 patients, the confidence interval is:

[0.10 - 1.96*sqrt(0.10*(1-0.10)/40), 0.10 + 1.96*sqrt(0.10*(1-0.10)/40)]

i.e. [0.7%, 19.3%], i.e. "the probability that the real proportion is in [0.7%, 19.3%] is 95%". And with 40,000 patients: [9.71%, 10.29%].

See for instance http://davidmlane.com/hyperstat/B9168.html or any statistics book: http://books.google.com/books?q=proportion+estimate+confidence&lr=&sa=N&start=20```
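The normal-approximation (Wald) interval described in ga_cal-ga's comment is easy to reproduce. The sketch below is an illustration added here, not part of the thread; it recomputes both intervals from the comment's formula.

```python
import math

def wald_interval(q, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion
    estimated as q from n observations; z=1.96 gives ~95% confidence."""
    half = z * math.sqrt(q * (1 - q) / n)
    return q - half, q + half

lo, hi = wald_interval(0.10, 40)
print(f"n = 40:     [{lo:.1%}, {hi:.1%}]")   # [0.7%, 19.3%]
lo, hi = wald_interval(0.10, 40_000)
print(f"n = 40,000: [{lo:.2%}, {hi:.2%}]")   # [9.71%, 10.29%]
```

Note that this approximation assumes the sampling distribution is roughly normal, which holds well for large n but becomes questionable for samples as small as 40 with only 4 events.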
 Subject: Re: Probability and degree of reliability (accuracy in predicting future results) From: rracecarr-ga on 07 Jul 2006 17:37 PDT
 ```The previous comment is not right. The binomial distribution is only approximately normal for very large N; for example, you certainly cannot have a negative number of failures. The stated 95% confidence interval of [0.7%, 19.3%] is silly: if the failure rate were really 0.7%, the probability of getting 4 or more failures in 40 trials is only 0.018%. That's less than one chance in 5000.```
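The exact-binomial objection above can be verified directly. This sketch is an illustration added here, not part of the thread: it computes P(X >= 4) for 40 trials when the true failure rate is 0.7%, the lower endpoint of the Wald interval.

```python
import math

def binom_tail_ge(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# If the true failure rate were 0.7%, seeing 4 or more failures in
# 40 trials would be very unlikely:
prob = binom_tail_ge(4, 40, 0.007)
print(f"{prob:.3%}")   # ~0.018%, i.e. less than 1 chance in 5000
```

This is why exact (e.g. Clopper-Pearson) intervals are preferred over the normal approximation when the sample is small or the event count is low.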