Google Answers: error margin and percent confidence, statistics and probability

Q: error margin and percent confidence, statistics and probability ( No Answer, 4 Comments )

Question

Subject: error margin and percent confidence, statistics and probability
Category: Science > Math
Asked by: llamaflaca-ga
List Price: $40.00

Posted: 21 Aug 2005 21:04 PDT
Expires: 20 Sep 2005 21:04 PDT
Question ID: 558542

Greetings:

I need some help doing statistical analysis for an experiment.
Basically, I am looking to estimate the number of false events at an
x% confidence level. I say x% because I may not have had enough
samples to reach a 95% confidence level to begin with.

To consider the question fully answered I need the following:
1.	Explanation of why a particular model was chosen (i.e. binomial or
Poisson or something else).
2.	I also need a reference, either a link to a table or some other
means, so that I can replicate the results provided by the researcher,
given enough detail in item 1 above and in the items below.
3.	Formulas used.
4.	Calculations for all groups providing the error margin and confidence.

Here is the problem:
I have changed some of the details but the concept is generally this:
I have 4 baskets; each basket has an unknown number of, let's call them,
apples.  The apples may count in the 1000's or even more per basket.
Each basket is separate from one another, that is to say they are
independent from each other basket.

Now, I sampled 500 apples from each basket.  Of those 500 apples, I
would like to estimate the number of non-apples (could we call them
false apples?) present in both the total 500 population and the larger
unknown population.  So I took a close look at 75 apples from each
bunch of 500.  From this closer look I obtained the following
results.

Group A: Out of 75 apples, 1 non-apple
Group B: Out of 75 apples, 9 non-apples
Group C: Out of 75 apples, 39 non-apples
Group D: Out of 75 apples, 2 non-apples

Here are the issues I see with this. Since I sampled fewer
than 1000 in a population of X size, I don't think I can use the
standard error and confidence formulas, since I believe that for 95%
confidence a minimum of 1000 must be sampled.  Additionally, I fail
the second rule, which is at least 10 false and 10 true events: Groups
A, B, and D do not satisfy this.

I considered the Binomial model but again, too few hits.  I considered
rare events and the Poisson model as another option but here is where
I get lost.

I will say the following conditions apply to my samplings:
1. When sampling 75 out of 500, they are Bernoulli in nature, sampling
1 or 75 does not change the odds of each one and the prior results do
not affect the future results of the next sampling.
2. I do not know what are the odds or true probability of getting a
non-apple out of 75 or 500 or millions.  I only see from my limited
sampling that for Group A it was 1 / 75  and so on.
3. I only sampled one instance of 75 out of 500. 

The ultimate answer I am looking for here is:
Given Groups A to D, if for example I have 9 out of 75 non-apples as
in Group B, what is the probability of a non-apple in this and the
other groups? With what percentage of confidence? What is my margin of
error, meaning could it be between, say, 2 and 10 non-apples with 95%
confidence? And how can I extend this to the population of 500 and/or
a larger unknown total population? For example, what is the probability
of getting a non-apple in this type of basket for, say, a population of
millions of apples (again within this basket; think of types)?

Answer

There is no answer at this time.

Comments

Subject: Re: error margin and percent confidence, statistics and probability
From: raokramer-ga on 18 Sep 2005 22:02 PDT

My understanding is that we have one basket with m+n=500 apples, of
which n are special, and we draw N=75 apples at random and get x=i
special apples in the sample. As explained at
http://mathworld.wolfram.com/HypergeometricDistribution.html, where you
can find the formulas for your report, x has a hypergeometric
distribution with n as the parameter of interest and x as the observed
random value. I'll follow the notation of the link above. For the most
part, however, we will need not the formulas but reasoning based on an
understanding of what confidence intervals are.
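
For readers who want to reproduce the probabilities without the posted
files, here is a minimal Python sketch of the hypergeometric probability
described above. It is not the SAS program mentioned in this thread; the
names hypergeom_pmf, BASKET and SAMPLE are hypothetical and chosen only
for this illustration.

```python
from math import comb

BASKET = 500   # m + n: apples in the basket (from the question)
SAMPLE = 75    # N: apples examined closely

def hypergeom_pmf(i, n, basket=BASKET, sample=SAMPLE):
    """P(x = i) when the basket holds n special apples."""
    # Impossible outcomes: more special apples drawn than exist, etc.
    if i < 0 or i > sample or i > n or sample - i > basket - n:
        return 0.0
    return comb(n, i) * comb(basket - n, sample - i) / comb(basket, sample)

# Probability of seeing exactly 1 special apple if the basket held 6 of them
p = hypergeom_pmf(1, 6)
```

The integer binomial coefficients are exact, so the only rounding happens
in the final division.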

The files with tables and a SAS program for your reference are posted
at http://www.geocities.com/raokramer/index.html


PLAN:

Assuming that you prefer a conventional statistical method, let's
compute a maximum likelihood (ML) estimate nHat of n (the number of
special apples in the basket) for each i=1, ... ,75. Then, to obtain
confidence intervals (CI), we will apply the standard procedure of
inverting the acceptance regions of a Likelihood Ratio Test (LRT; see,
for example, "Statistical Inference" by G. Casella and R. Berger,
ch. 'Interval Estimation').


SOLUTION:

1. The easy part is to compute an ML estimate nHat for the number n of
special apples in the basket. Since the low counts won't allow us to
apply a normal approximation to x, we should compute an exact estimate
by maximizing the likelihood function over the set of all possible
values of the estimated parameter n. File "d.txt" contains the
probabilities (=likelihoods) of observing each value of x given
different n's. For a fixed x, nHat will be the n with the largest
likelihood. The results are in file "nHats.txt". E.g., for the observed
value x=i=1 the ML estimate for n is nHat=6. The field estP is the
maximum of the likelihood function; it will also be used in the second
part as the denominator of the LRT statistic.
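
The brute-force maximization in step 1 can be sketched in a few lines of
Python. This is an illustration under the thread's assumptions (500
apples in the basket, 75 examined), not the SAS program behind
"nHats.txt"; the function names are hypothetical.

```python
from math import comb

BASKET, SAMPLE = 500, 75

def likelihood_weight(i, n):
    """Hypergeometric likelihood of n given x = i, without the
    constant factor 1 / C(500, 75), which does not affect the argmax."""
    if i > n or SAMPLE - i > BASKET - n:
        return 0
    return comb(n, i) * comb(BASKET - n, SAMPLE - i)

def ml_estimate(i):
    """Return the n in 0..500 with the largest likelihood for observed x = i."""
    return max(range(BASKET + 1), key=lambda n: likelihood_weight(i, n))

for i in (1, 2, 9, 39):
    print(i, ml_estimate(i))
```

Because the weights are exact integers, the comparison has no rounding
issues. For i = 1, 2, 9 and 39 this gives 6, 13, 60 and 260, which agrees
with the nHat column in the ANSWER table.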

2. To obtain the CI we consider the Likelihood Ratio Test. For a fixed
value of n in the null hypothesis "H0: the number of special apples in
the basket is n", the lower and upper bounds of the LRT acceptance
region at the 95% confidence level (LCL(n) and UCL(n)) can be computed
as the bounds of the set of nHat values that have the highest LRT
statistics and whose probabilities sum to 95% (over nHats for fixed
n).
File "LRTstats.txt" contains the LRT statistics (LRTstat = p / estP) in
descending order for each group of n. To obtain the 95% test
acceptance bounds, we choose the nHats that correspond to the largest
LRT statistics, taking just enough of them that the cumulative
probability (p) equals or barely exceeds 0.95.
File "regionBounds95.txt" contains those test acceptance bounds for each tested n,
with CRp as the confidence coefficient for the test acceptance region.
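
The construction of one acceptance region can be sketched as follows.
This is a hedged Python illustration, not the SAS program from this
thread. It expresses the region in terms of the observed x rather than
nHat (nHat is a function of x), and the names pmf, EST_P and
acceptance_region are hypothetical.

```python
from math import comb

BASKET, SAMPLE = 500, 75
DENOM = comb(BASKET, SAMPLE)

def pmf(i, n):
    """P(x = i | n special apples in the basket), hypergeometric."""
    return comb(n, i) * comb(BASKET - n, SAMPLE - i) / DENOM

# estP for each possible observation: the likelihood maximized over n
EST_P = [max(pmf(i, n) for n in range(BASKET + 1)) for i in range(SAMPLE + 1)]

def acceptance_region(n, level=0.95):
    """Bounds and total probability of the observations kept for H0 'n special apples'."""
    # Rank observations by LRT statistic p / estP, largest first
    ranked = sorted(range(SAMPLE + 1),
                    key=lambda i: pmf(i, n) / EST_P[i], reverse=True)
    region, total = [], 0.0
    for i in ranked:
        region.append(i)
        total += pmf(i, n)
        if total >= level:   # stop once 95% is reached or barely exceeded
            break
    return min(region), max(region), total

low, high, cr_p = acceptance_region(60)
```

Here cr_p plays the role of CRp: it is at least 0.95 but usually not
exactly 0.95, because x is discrete.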

Now, to obtain the CIs, we invert the test regions. Each test region
in effect maps (one-to-many) an n to a set of nHats. All we need is
to invert this mapping in the functional sense, so that we obtain a map
from each nHat to a subset of n's. We should also attach a
probability of coverage to each of those mappings; it is the
minimum confidence coefficient over all n's that correspond to a
specific nHat.
The file "map.txt" contains the mentioned mapping. Note that not all
nHats between the limits of the test region are possible, because
nHat is a function (incidentally 1-to-1) of the observed x=i as
defined in "nHats.txt". Inverting that map to obtain confidence limits
for n based on nHat is a simple matter of one SQL statement, with the
result in "invertedMap.txt". Another SQL statement maps nHats to the
observed i's, which brings us to the final solution in file
"solution.txt", where

i - observed number of special apples in the sample
nHat - ML estimate for number n of special apples in the basket
nLCL - lower confidence limit for n
nUCL - upper confidence limit for n
 CLp - confidence coefficient (probability of coverage of n by the
confidence interval)
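
The inversion step can be sketched end to end in Python. This is a hedged
illustration of the procedure described above, not the SAS/SQL code from
this thread: all names are hypothetical, and details such as tie-breaking
may differ from the original program, so whether it reproduces the posted
limits digit for digit has not been checked here.

```python
from math import comb

BASKET, SAMPLE, LEVEL = 500, 75, 0.95
DENOM = comb(BASKET, SAMPLE)

def pmf(i, n):
    """P(x = i | n special apples in the basket), hypergeometric."""
    return comb(n, i) * comb(BASKET - n, SAMPLE - i) / DENOM

# P[n][i]: probability of observing i when the basket holds n special apples
P = [[pmf(i, n) for i in range(SAMPLE + 1)] for n in range(BASKET + 1)]
# estP and nHat for every possible observation i
EST_P = [max(P[n][i] for n in range(BASKET + 1)) for i in range(SAMPLE + 1)]
N_HAT = [max(range(BASKET + 1), key=lambda n: P[n][i]) for i in range(SAMPLE + 1)]

def acceptance_set(n):
    """Observations accepted under H0 'n special apples', plus their total probability."""
    ranked = sorted(range(SAMPLE + 1),
                    key=lambda i: P[n][i] / EST_P[i], reverse=True)
    kept, total = set(), 0.0
    for i in ranked:
        kept.add(i)
        total += P[n][i]
        if total >= LEVEL:
            break
    return kept, total

REGIONS = [acceptance_set(n) for n in range(BASKET + 1)]

def confidence_interval(i):
    """Invert the test regions: collect every n whose acceptance set contains i."""
    covering = [n for n in range(BASKET + 1) if i in REGIONS[n][0]]
    # Coverage: the smallest region probability among the n's in the interval
    cl_p = min(REGIONS[n][1] for n in covering)
    return N_HAT[i], min(covering), max(covering), cl_p

for i in (1, 2, 9, 39):
    print(i, *confidence_interval(i))
```

Each printed row has the same layout as the fields listed above: i, nHat,
nLCL, nUCL, CLp.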


ANSWER:

The requested estimates are:

  i  nHat  nLCL  nUCL      CLp
------------------------------
  1     6     1    30  0.95181
  2    13     3    40  0.95181
  9    60    30   101  0.95007
 39   260   207   314  0.95396

Thanks
-- RK

Subject: Re: error margin and percent confidence, statistics and probability
From: llamaflaca-ga on 19 Sep 2005 15:40 PDT

Thank you for this. Since you posted this as a comment, how may I
compensate you for your trouble?

Will be checking the numbers and backup data over the next couple of days.

Subject: Re: error margin and percent confidence, statistics and probability
From: raokramer-ga on 19 Sep 2005 20:05 PDT

No trouble at all. It looks like you did a nice job formulating the
problem; don't hesitate to ask again. Also, there are forums that many
qualified statisticians attend, e.g.
http://www.listserv.uga.edu/cgi-bin/wa?A0=sas-l. Two things to
mention:
1. When you subscribe, somebody will review your request and grant you access.
2. If your question sounds like homework or a midterm project, I'd
bet it will be ignored at a minimum.

I hope the apples are behaving well :)

Subject: Re: error margin and percent confidence, statistics and probability
From: raokramer-ga on 20 Sep 2005 05:29 PDT

I just had to re-upload the LRTstats.txt because I noticed I uploaded
the wrong file that resulted from a typo, everything else is correct.
-- RK
