Q: Equation to calc level of confidence of possibility of a successful scan ( No Answer,   2 Comments )
Question  
Subject: Equation to calc level of confidence of possibility of a successful scan
Category: Science > Math
Asked by: rshen-ga
List Price: $25.00
Posted: 08 Sep 2004 11:42 PDT
Expires: 08 Oct 2004 11:42 PDT
Question ID: 398442
We started scanning a large number of documents.  We need to find a
general equation that would tell us whether a page has been scanned
with a certain level of confidence.  For example:
1) we just finished scanning a batch of say approx. 1000 original
documents (could be approx 500, 2000, 2300 etc), i.e. the scanner
tells us 1000 documents were scanned but we do not know if the
operator missed any originals
2) we want to randomly choose a number of originals and check if those
were scanned, the results of this check are either "scanned" or "not
scanned"
3) ultimately, we want to know what is the general formula for any
level of confidence that a particular original was scanned (or not
scanned) given various combinations of batch size and sample size. 
E.g. For a batch of around 1000 documents, in order to get a 95% level
of confidence that an original doc has been scanned, one needs to
sample X number of originals and get Y "scanned" results.
I'm not a statistician, so please ask for clarifications if I'm not
clear on any aspects of this problem

Request for Question Clarification by mathtalk-ga on 08 Sep 2004 23:32 PDT
Hi, rshen-ga:

In statistics there are two rather technical terms:

level of significance
confidence interval

which are at least loosely related to your Question.

I'll be happy to explain their meanings (for free), but I'd like to
have a bit of clarification about what you are looking for.  By "level
of confidence that a particular original was scanned (or not
scanned)", do you mean an entire batch or a single (random) document
within the batch?

regards, mathtalk-ga

Clarification of Question by rshen-ga on 09 Sep 2004 07:12 PDT
We have thousands of boxes of old documents that need to be scanned. 
Each box contains between 1000 and 5000 pages.  We are hiring some
temps to feed these documents into scanners (after removing staples,
sticky notes, etc.).  We pay the temps by the number of boxes they
scanned.

Now, we would like to find out whether all the contents for each box
have been scanned.  Of course, we can match each document with its
computer image but that's not feasible.  So we thought maybe we will
just randomly sample X number of documents from each box containing Y
documents and see if their images exist in the computer.

Our question is: if we want to be 95% certain that all documents were
scanned from a box of Y number of originals, what is X?
Or, if we randomly choose 20 documents from a box of 2000 originals
and find that only 19 of those documents were scanned, what level
of confidence do we have when we say that all originals were scanned?

We would like to get the formula that tells us given z good samples
out of X samples within Y total originals, what is our level of
confidence that all originals were properly scanned.  Or vice versa,
in order to be so many percent confident that all Y originals were
properly scanned, we need to have at least z good samples out of x
samples.

Ultimately, this is a business problem, so if there is another
approach, instead of a statistical one, to solving our problem, we will be
just as happy.
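
A rough way to put numbers on this, assuming the checked pages are drawn at random without replacement and every one of them turns out to be scanned: if d pages in a box of Y are actually missing, the chance that a sample of X pages hits none of them follows the hypergeometric distribution. The Python sketch below is illustrative only; the function names and the choice of d are assumptions, not something from the thread.

```python
from math import comb


def prob_sample_all_scanned(box_size, missing, sample_size):
    """Chance that a random sample contains no missing page.

    Assumes `missing` of the `box_size` originals were not scanned and
    the sample is drawn without replacement (hypergeometric model).
    """
    if sample_size > box_size - missing:
        # More pages sampled than scanned pages exist: a miss is certain.
        return 0.0
    return comb(box_size - missing, sample_size) / comb(box_size, sample_size)


def min_sample_size(box_size, missing, confidence=0.95):
    """Smallest sample that catches at least one missing page with the
    requested probability, if `missing` pages really are missing."""
    for sample_size in range(1, box_size + 1):
        caught = 1.0 - prob_sample_all_scanned(box_size, missing, sample_size)
        if caught >= confidence:
            return sample_size
    return box_size


if __name__ == "__main__":
    # Hypothetical scenario: a box of 2000 pages with 20 pages (1%) missing.
    print(min_sample_size(2000, 20))
```

Note that with a single missing page the chance of catching it is just X/Y, so a clean sample can only support statements like "fewer than d pages are missing", not "every page was scanned", unless nearly the whole box is checked.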
Answer  
There is no answer at this time.

Comments  
Subject: Re: Equation to calc level of confidence of possibility of a successful scan
From: lxndr-ga on 06 Oct 2004 12:25 PDT
 
You are looking for a confidence interval for a binomial proportion,
i.e. the mean of a sample of 0/1 results.

The official formula for the confidence interval is the following:

(X - 2*s/[sqrt]n ; X + 2*s/[sqrt]n)

First select a sample size (n), i.e. the number of checks you make.
Then count how many documents in this sample were scanned correctly.
In an Excel file, make one column numbered 1 to n (1,2,3...n) and a
second column of 1's and 0's (1 is scanned, 0 is not scanned)
according to your findings.

Now do a statistical analysis using the Data Toolkit in menu Extra
{select 'descriptive statistics' in the list}.
You'll get a table with the X {mean} and the s {standard deviation}.
Fill in the formula {[sqrt] means square root of course} and you'll
get an interval for the fraction of correct scans, for example
(0.95, 0.98). For a batch of 1000 this means that, judging from the
sample, the number of correct scans in the entire batch lies between
950 and 980 with a confidence of about 95%. The lower these numbers,
the worse the temps you hired are!
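
The same recipe can be run outside Excel. Here is a minimal Python sketch, assuming the check results are kept as a list of 1's and 0's and that s is the sample standard deviation (n-1 denominator). The function name `approx_interval` and the 19-out-of-20 sample are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def approx_interval(results, z=2.0):
    """Approximate interval for the fraction of scanned pages.

    `results` is a list of 1 (scanned) and 0 (not scanned).
    Uses mean +/- z * s / sqrt(n), with z = 2 for roughly 95%.
    """
    n = len(results)
    x_bar = mean(results)   # fraction of checked pages that were scanned
    s = stdev(results)      # sample standard deviation (assumed n-1 form)
    half_width = z * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width


if __name__ == "__main__":
    # Hypothetical check: 20 pages sampled, 19 found in the computer.
    sample = [1] * 19 + [0]
    low, high = approx_interval(sample)
    print(low, high)
```

For 19 scanned out of 20 checked, the interval comes out at roughly 0.85 to 1.05. An upper end above 1 is a sign that this normal approximation is crude for small samples with few failures, so treat the result as a rough guide.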

Good luck!
Subject: Re: Equation to calc level of confidence of possibility of a successful scan
From: rshen-ga on 06 Oct 2004 18:47 PDT
 
Thanks a bunch for your comment.  I do not pretend to understand the
rationale behind the formula, but I can definitely follow the steps
you outlined.  Can't wait to go back to the office and run it.  I will
let you know the outcome.  Thanks again.
