Q: Formula for determining sample size (statistics) (No Answer, 8 Comments)
Question
Subject: Formula for determining sample size (statistics)
Category: Science > Math
Asked by: azureae-ga
List Price: $15.00
Posted: 14 Apr 2005 03:11 PDT
Expires: 14 May 2005 03:11 PDT
Question ID: 509084
Let's say you have a jar with X balls in it, say 10,000. The balls are
either red, green, or blue. You reach into the jar, select a ball,
record the color, and return the ball to the jar. How many times do
you need to repeat this (samples) in order to get certain accuracy Y?

I'm looking for the following information:
What is the statistical way to measure accuracy as related to this
problem? I think it's something related to the confidence interval, but
I can't remember what that means exactly.

What is standard error as it applies to this problem? (again I
remember the term but not what it means)

Are there any other statistical accuracy terms that apply to this problem?

And the heart of the question is, please make a formula where the
inputs are the number of balls in the population (which will always be
in one of three states), the statistical terms for the accuracy desired
(as you will explain from above) and outputs the number of samples
needed.

Thanks! Feel free to email any questions to indigoae@gmail.com

Request for Question Clarification by elmarto-ga on 14 Apr 2005 04:11 PDT
Hello azureae,
From your question, it's not clear what you mean by "accuracy". Are
you trying to measure, using a sample, how many balls are red, green
or blue in the population?

Regards,
elmarto

Clarification of Question by azureae-ga on 14 Apr 2005 14:45 PDT
Yes, I am trying to estimate, via a sample, how many red, green, and
blue balls there are in the population. All three colors will be
present, although one of the three, always the blue, will be in a lower
proportion. The red and green will vary in number also; I'm just
saying that blue will always be much less common (perhaps 1/6th of the
others, for example) than red and green (but still detectable via sampling).

Regarding romanok7r's comment, the formula provided doesn't have the
population size as an input, and the population size will vary greatly.
I can't imagine the population size doesn't matter, because what if
we're sampling say a jar of a trillion balls, is 2796 still
sufficient?

Also, regarding using the estimate of proportion, that is what I'm
trying to figure out, so if I knew that then I wouldn't need to sample
at all!

Request for Question Clarification by elmarto-ga on 14 Apr 2005 16:19 PDT
Hi azureae!
OK, I see now what you're interested in. However, the logic of this
problem can change substantially for different population sizes.
Specifically, it's very different if we have a "small" or "large"
population. Let's say that initially 50% of the balls are red. If the
population is sufficiently large, when we take a first red ball,
the probability of taking another red ball in a subsequent draw is
still 50%.

For example, if there are 100,000 balls (so 50,000 red ones), and the
first I draw is red, then the probability of drawing another red ball
is 49,999/99,999, which is almost the same as 50%. If this is the
case, then the required sample size can be determined quite easily.

However, in an extreme example where the population is 4 balls (so 2
are red), the proportion changes after you take the first red ball. If
you take a red ball on the first draw, then the probability of drawing
another one becomes 1/3, which is quite different to 0.5. This
complicates the problem.

So, do you have a "large" enough population?

Regards,
elmarto
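
A small sketch of the arithmetic in the two examples above, assuming exactly half of the balls are red. The function name `p_second_red` and its parameters are illustrative only and do not come from the thread. It compares drawing with and without replacement, which is the distinction the examples turn on.

```python
from fractions import Fraction

def p_second_red(reds: int, total: int, replace: bool) -> Fraction:
    """Probability that the second draw is red, given the first draw was red."""
    if replace:
        # The first ball goes back into the jar, so nothing changes.
        return Fraction(reds, total)
    # The first red ball stays out, leaving one fewer red and one fewer ball.
    return Fraction(reds - 1, total - 1)

for total in (4, 100_000):
    reds = total // 2  # assumption: 50% of the balls are red
    without = p_second_red(reds, total, replace=False)
    with_repl = p_second_red(reds, total, replace=True)
    print(total, float(without), float(with_repl))
```

With replacement the probability stays at 50% for any jar size; without replacement it only drifts noticeably when the jar is tiny.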

Request for Question Clarification by elmarto-ga on 14 Apr 2005 16:24 PDT
I forgot to mention this in my question. You ask "if we're sampling
say a jar of a trillion balls, is 2796 still sufficient?". The answer
would be yes (if the 2796 figure is correct in the first place). Once
you have a "large" population, you will need the same quantity of
balls in order to determine the proportions, no matter if the
population size is 1 million or 1 trillion.
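
One way to see why the population size stops mattering is the standard finite population correction, which is not mentioned in the thread and applies only to sampling without replacement. The sketch below is a hedged illustration: `fpc_sample_size` is a made-up name, and the starting value `n0` assumes a 95% confidence level, a guessed proportion of 0.30, and a margin of error of 0.02. When each ball is returned to the jar, as in the original question, the population size does not enter the calculation at all.

```python
import math

def fpc_sample_size(n0: float, population: int) -> int:
    """Adjust an infinite-population sample size n0 for a finite population
    sampled WITHOUT replacement (standard finite population correction)."""
    return math.ceil(n0 / (1 + (n0 - 1) / population))

# Assumed inputs: z = 1.96 (95% confidence), p = 0.30, margin of error 0.02.
n0 = 1.96**2 * 0.30 * 0.70 / 0.02**2

# Population sizes: the smallest and largest mentioned in the thread,
# plus a small one for contrast.
for population in (4_000, 200_000, 10**13):
    print(population, fpc_sample_size(n0, population))
```

The correction shrinks the requirement noticeably for the small jar, but for 200k balls and for 10 trillion balls the results differ by only about 1%.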

Clarification of Question by azureae-ga on 14 Apr 2005 16:37 PDT
Yes, the populations will be very large. There will be several
populations, the largest of which will be about 10 trillion, the
smallest of which will be about 200k.

Also please note that I mentioned that once the ball is observed, it
is returned to the jar. So in the case of the 4 ball problem (while
not relevant) please notice that the same ball can be drawn multiple
times.
Answer  
There is no answer at this time.

Comments  
Subject: Re: Formula for determining sample size (statistics)
From: romanok7r-ga on 14 Apr 2005 11:42 PDT
 
I think what you need is to determine a sample size for an estimate of
a population proportion. You have two measures of accuracy there:
confidence level and margin of error. The formula is
n = (z[alpha/2])^2 * p(1-p) / e^2
z[alpha/2] is the critical value for the chosen confidence level; it
would be 1.96 for a 95% confidence level (you can look it up in a
standard normal table for different values), p would be your estimate
of the proportion of, for example, red balls (let's say .30), and e
would be the margin of error you want in estimating the proportion,
for example .02.
So, for example, n = 1.96^2 * .3(.7) / .02^2 = 2016.84, which rounds up
to 2017 balls you need to sample.
Hope the comment helps ;) Hey, it's free.
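
The formula in the comment above can be sketched in a few lines of Python. This is only an illustration: the function name `sample_size` and its argument names are not from the thread, and the result is rounded up to the next whole draw because a fractional sample is not possible.

```python
import math

def sample_size(z: float, p: float, e: float) -> int:
    """Draws needed to estimate a proportion p to within +/- e.

    z is the standard normal critical value for the confidence level
    (1.96 for 95%), p is a prior guess at the proportion, and e is the
    desired margin of error.
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if e <= 0:
        raise ValueError("e must be positive")
    return math.ceil(z**2 * p * (1 - p) / e**2)

# The worked example from the comment: 95% confidence, p = .30, e = .02.
print(sample_size(1.96, 0.30, 0.02))
```

Note that the population size is not an argument; the formula treats the population as effectively infinite.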
Subject: Re: Formula for determining sample size (statistics)
From: felldownstairs-ga on 15 Apr 2005 11:55 PDT
 
Rather than write out a bunch of equations that fail to format very
well on these pages, I will direct you to this link instead:

http://www.math.bcit.ca/faculty/david_sabo/apples/math2441/section8/lrgsampprops/largesampprops.htm

It not only explains the basics of calculating a confidence interval
for population proportion estimates, but also does an adequate job of
explaining the two main methods of determining the appropriate sample
size for computing such estimates at any given confidence level.

However, I will say that no matter how you go about it there are a
number of problems:

First, there is the fact that in the most commonly used approach to
determining sample size, the population proportion is actually a
variable, meaning that you must already have some idea of the
population proportions prior to estimating the appropriate sample size.
This leads to the circular reasoning that you see in a lot of Social
Science papers, where a study is run and an estimate of the population
proportion is calculated at a certain confidence level. That estimate
is then used as a proxy for the population proportion in determining
whether or not the sample size was reasonably large enough to justify
claims of statistical accuracy. Doesn't really work, but it's used
quite often.

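Secondly, in measuring proportions, the difficulty is always in the
fact that the required sample size depends heavily on the proportion
itself. As the population proportion approaches 50%, p(1-p) reaches
its maximum, so the sample size required for 1% accuracy at a 95
percent confidence level is at its largest (about 9,604 draws). As it
approaches 0%, holding the error to a small fraction of the proportion
itself takes far more draws still. Either way, the required sample can
be greater than a small population you are attempting to estimate for.
That doesn't mean that your estimates are wrong; it is simply a
limitation of the equation itself.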

Anyway, good luck with it.

Felldownstairs-ga
Subject: Re: Formula for determining sample size (statistics)
From: azureae-ga on 15 Apr 2005 17:56 PDT
 
I noticed that the previous commenter said that: "First, the fact that
in most commonly used approach to determining sample size, the
population proportion is actually a variable". Does this mean there
are less commonly used approaches where this is not the case?
Subject: Re: Formula for determining sample size (statistics)
From: felldownstairs-ga on 15 Apr 2005 20:09 PDT
 
There are two general equations used in determining the appropriate
sample size for estimations of population proportions.

The first and most commonly used (according to frequency in research
papers) is that which includes an actual measure of population
proportion as a variable in the equation itself.

The second method corrects for this but has the unfortunate effect of
overestimating the appropriate sample size. Actually, more correctly,
the larger the sample size the better any estimate, so there really is
not a problem, statistically speaking, to using a larger sample size
than necessary. However, there are real world considerations that come
into effect in using this method, especially in bio-medical research
where the cost of the project often increases with the size of the
sample group.

In your case, since the question is purely a theoretical
consideration, using the second method shouldn't pose a problem.
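
The comment does not spell out the second method. A common reading, assumed here, is the conservative choice p = 0.5: since p(1-p) never exceeds 0.25, plugging in 0.5 gives a sample size that is large enough whatever the true proportion turns out to be. The sketch below uses a hypothetical function name and an assumed margin of error of 0.03 purely for illustration.

```python
import math

def conservative_sample_size(z: float, e: float) -> int:
    """Worst-case draws needed for margin of error e, with no guess at p.

    Uses p = 0.5, where p * (1 - p) peaks at 0.25, so the result is an
    upper bound for every possible true proportion.
    """
    if e <= 0:
        raise ValueError("e must be positive")
    return math.ceil(z**2 * 0.25 / e**2)

# Assumed inputs: 95% confidence (z = 1.96) and a margin of error of 0.03.
worst_case = conservative_sample_size(1.96, 0.03)

# For comparison: the same inputs with a prior guess of p = 0.30.
with_guess = math.ceil(1.96**2 * 0.30 * 0.70 / 0.03**2)

print(worst_case, with_guess)
```

The gap between the two numbers is the overestimate the comment refers to: extra draws in exchange for not needing a prior estimate.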
Subject: Re: Formula for determining sample size (statistics)
From: azureae-ga on 15 Apr 2005 21:03 PDT
 
Does anyone know where I can get information on the second method?

Thanks in advance!
Subject: Re: Formula for determining sample size (statistics)
From: felldownstairs-ga on 16 Apr 2005 01:13 PDT
 
Did you go to the link that I posted in my first comment? It explains
both approaches quite well, and only leaves it up to you to decide
what level of accuracy and degree of confidence you want.
Subject: Re: Formula for determining sample size (statistics)
From: volterwd-ga on 19 Apr 2005 16:45 PDT
 
you said

"Let's say you have a jar with X balls in it, say 10,000. The balls are
either red, green, or blue. You reach into the jar, select a ball,
record the color, and return the ball to the jar. How many times do
you need to repeat this (samples) in order to get certain accuracy Y?"

So in other words the number of balls is immaterial... only the
proportions of them which are red/green/blue matter.

Since you select one ball at a time... and this is with replacement,
you effectively have an infinite population size... treat this as you
would a sample of size n from an infinite population.

The standard error is defined as the standard deviation of the
sampling distribution of the statistic.  In this case you want the
standard deviation of the estimators of the proportions of
red/blue/green.
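
As a rough illustration of that definition, the standard error of a sample proportion is usually estimated as sqrt(p_hat * (1 - p_hat) / n), where p_hat is the observed proportion and n is the number of draws. The tallies below are made up for the example, and the function name is hypothetical.

```python
import math

def proportion_standard_errors(counts: dict) -> dict:
    """Map each color to (observed proportion, estimated standard error)."""
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one draw")
    result = {}
    for color, k in counts.items():
        p_hat = k / n
        # Standard deviation of the sampling distribution of p_hat.
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        result[color] = (p_hat, se)
    return result

# Hypothetical tallies from 2,000 draws with replacement.
counts = {"red": 900, "green": 950, "blue": 150}
for color, (p_hat, se) in proportion_standard_errors(counts).items():
    # An approximate 95% margin of error is 1.96 standard errors.
    print(f"{color}: p_hat={p_hat:.3f} se={se:.4f} margin={1.96 * se:.4f}")
```

Quadrupling the number of draws halves each standard error, which is why the sample-size formula has e^2 in the denominator.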
Subject: Re: Formula for determining sample size (statistics)
From: qzx-ga on 12 May 2005 21:58 PDT
 
Here's a link to a page with JavaScript for calculating sample size for
a given population, margin of error, and confidence level:

http://www.surveyguy.com/SGcalc.htm

Hope this helps.
