Q: Increase the probability of gaining a correct answer by taking more data samples ( Answered 5 out of 5 stars,   8 Comments )
Question  
Subject: Increase the probability of gaining a correct answer by taking more data samples
Category: Science > Math
Asked by: myxlplix-ga
List Price: $20.00
Posted: 15 Aug 2002 16:47 PDT
Expires: 14 Sep 2002 16:47 PDT
Question ID: 55038
Here is the problem:
I have one hundred recon planes that each have an 80% chance of giving
me an accurate report. I have designated a target and send one recon
plane to give me a report. The report returned is a positive or
negative answer (Yes/No). Examples would be: "Yes, the bridge is
destroyed" or "No, the bridge is not destroyed." This is what I want in
the answer:

1: Will I increase my chances of being accurate if I send more than
one plane out?
2: What is the reasoning behind the answer?
3: If the answer to question 1 is Yes, how would I apply the answer to
question 2 so that I can figure my accuracy or probability of having a
correct answer to the question the recon planes went to answer?
Example: After ten reports with 8 planes reporting the bridge
destroyed and 2 reporting the bridge still intact, I have a 96% chance
of being correct if I conclude the bridge is destroyed.

Please ask for clarification before answering if you have any doubts
about what I am looking for. I want to be able to apply the
information that you give so you should expect me to ask clarification
questions if I have any doubts on how to apply the
formula.
Answer  
Subject: Re: Increase the probability of gaining a correct answer by taking more data samples
Answered By: websearcher-ga on 15 Aug 2002 20:21 PDT
Rated:5 out of 5 stars
 
Hi myxlplix-ga :

This is an interesting probability question. I will assume that you
are a novice in this area, so please forgive me if I explain something
that's obvious to you. :-)

Let's start with what you know. 

* You have 100 planes.
* Each plane has an 80% chance (or a .8 probability) of returning a
correct observation and a .2 probability of returning an incorrect
observation.

Figuring out the probability that n planes return a correct conclusion
overall is simply a matter of adding up the probabilities for each
possible combination of individual observations that leads to a
correct conclusion. (We'll assume here that if there is a tie - say 2
planes return correct observations and 2 planes return incorrect
observations out of 4 planes sent - then we count that ambiguous
result as an *incorrect* conclusion.)

We'll call our probability of a correct conclusion p_correct_n.

What happens for 1 plane? There's only one possible scenario for a
correct conclusion in this case - the one plane returns a correct
observation.

p_correct_1 = 1*(.8) = .8

What about for n=2? The only correct scenario is when both planes
return a correct observation. This can only happen in one way - both
plane_1 and plane_2 return correct observations.

p_correct_2 = 1*(.8 * .8) = .64

What about n=3? There are two correct scenarios here - either all
three planes return correct observations or two of three planes return
correct observations and the other plane returns an incorrect
observation. However, the second scenario (2 out of 3 correct) can
happen in 3 separate ways - plane_1 and plane_2 correct, plane_1 and
plane_3 correct, or plane_2 and plane_3 correct.

p_correct_3 = 1*(.8 * .8 * .8) + 3*(.8 * .8 * .2) = .896

What about n=4? There are two correct scenarios here as well - either
all four planes return correct observations or three out of four
planes return correct observations and the other returns an incorrect
observation. The second scenario can happen in 4 separate ways -
planes_1,2,3, planes_1,2,4, planes_1,3,4, or planes_2,3,4.

p_correct_4 = 1*(.8 * .8 * .8 * .8) + 4*(.8 * .8 * .8 * .2) = .8192

So, the probabilities seem to be going up and down, but in an overall
upward trend.

What are the general formulas for p_correct_n?

If n is odd then:

p_correct_n = sum( nC(n-i+1) * .8^(n-i+1) * .2^(i-1), i=1..(n+1)/2 ) 

If n is even then:

p_correct_n = sum( nC(n-i+1) * .8^(n-i+1) * .2^(i-1), i=1..n/2 )

where: 

* sum(...) is the Sigma (summation) function where all the appropriate
values over the range of i are added together

* nCi is "n choose i", or the number of separate ways that i items can
be chosen from n items. This can be computed with the formula:

nCi = n!/(i!*(n-i)!)

The ! (factorial) operator is calculated by multiplying all the
positive integers equal to or less than the given value together. For
example, 6! = 6*5*4*3*2*1 = 720. (Also, 0! = 1)

* The "^" character represents exponentiation. For example, 2^3 =
2*2*2 = 8.

So, re-evaluating for n=1,2,3,4 gives:

p_correct_1 = sum(1C(2-i) * .8^(2-i) * .2^(i-1), i=1..1)
            = 1!/(1!*0!) * .8^1 * .2^0
            = 1 * .8 * 1
            = .8

p_correct_2 = sum(2C(3-i) * .8^(3-i) * .2^(i-1), i=1..1)
            = 2!/(2!*0!) * .8^2 * .2^0 
            = 1 * .64 * 1
            = .64

p_correct_3 = sum(3C(4-i) * .8^(4-i) * .2^(i-1), i=1..2)
            = 3!/(3!*0!) * .8^3 * .2^0  +  3!/(2!*1!) * .8^2 * .2^1
            = 1 * .512 * 1  +  3 * .64 * .2
            = .896

p_correct_4 = sum(4C(5-i) * .8^(5-i) * .2^(i-1), i=1..2)
            = 4!/(4!*0!) * .8^4 * .2^0  +  4!/(3!*1!) * .8^3 * .2^1
            = 1 * .4096 * 1  +  4 * .512 * .2
            = .8192

These values match our earlier results. 
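
For anyone who would rather check these numbers in code than by hand,
here is a minimal Python sketch of the two summation formulas above.
The function name p_correct_strict and its parameters are my own
labels, not part of the Maple session. It assumes the planes report
independently and, as above, counts a tie as an incorrect conclusion.

```python
from math import comb

def p_correct_strict(n, p=0.8):
    """Probability that a strict majority of n independent planes is correct.

    Ties (possible only when n is even) count as an incorrect conclusion.
    """
    # Smallest number of correct observations that forms a strict majority.
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))

# Compare against the hand-computed values for n = 1..4.
for n in range(1, 5):
    print(n, round(p_correct_strict(n), 4))
```

Writing the sum over k, the number of correct observations, covers the
odd and even cases with one expression, because n // 2 + 1 is the
smallest strict majority in both.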

Using a computer math program called Maple (http://www.maplesoft.com),
I computed the values for all values of n from 1 to 100.

                   "p_correct_1 =", .8
                   "p_correct_2 =", .64
                   "p_correct_3 =", .896
                   "p_correct_4 =", .8192
                   "p_correct_5 =", .94208
                   "p_correct_6 =", .901120
                   "p_correct_7 =", .9666560
                   "p_correct_8 =", .94371840
                   "p_correct_9 =", .980418560
                  "p_correct_10 =", .9672065024
                  "p_correct_11 =", .98834579456
                  "p_correct_12 =", .980594720768
                  "p_correct_13 =", .9929964388352
                  "p_correct_14 =", .98839008641024
                  "p_correct_15 =", .995760250290176
                  "p_correct_16 =", .992996438835200
                  "p_correct_17 =", .997418537163163
                  "p_correct_18 =", .995747966683708
                  "p_correct_19 =", .998420879450834
                  "p_correct_20 =", .997405172599325
                  "p_correct_21 =", .999030303561736
                  "p_correct_22 =", .998409799012455
                  "p_correct_23 =", .999402606291308
                  "p_correct_24 =", .999022030167748
                  "p_correct_25 =", .999630951965445
                  "p_correct_26 =", .999396751274023
                  "p_correct_27 =", .999771472380298
                  "p_correct_28 =", .999626937096448
                  "p_correct_29 =", .999858193550607
                  "p_correct_30 =", .999768774388334
                  "p_correct_31 =", .999911845047974
                  "p_correct_32 =", .999856405167363
                  "p_correct_33 =", .999945108976340
                  "p_correct_34 =", .999910671026972
                  "p_correct_35 =", .999965771745958
                  "p_correct_36 =", .999944343688577
                  "p_correct_37 =", .999978628580389
                  "p_correct_38 =", .999965275517262
                  "p_correct_39 =", .999986640418264
                  "p_correct_40 =", .999978308106872
                  "p_correct_41 =", .999991639805099
                  "p_correct_42 =", .999986434094364
                  "p_correct_43 =", .999994763231542
                  "p_correct_44 =", .999991507296101
                  "p_correct_45 =", .999996716792807
                  "p_correct_46 =", .999994678294096
                  "p_correct_47 =", .999997939892035
                  "p_correct_48 =", .999996662432841
                  "p_correct_49 =", .999998706367551
                  "p_correct_50 =", .999997905145145
                  "p_correct_51 =", .999999187100994
                  "p_correct_52 =", .999998684179852
                  "p_correct_53 =", .999999488853677
                  "p_correct_54 =", .999999172944694
                  "p_correct_55 =", .999999678399069
                  "p_correct_56 =", .999999479827707
                  "p_correct_57 =", .999999797541884
                  "p_correct_58 =", .999999672647345
                  "p_correct_59 =", .999999872478607
                  "p_correct_60 =", .999999793878312
                  "p_correct_61 =", .999999919638788
                  "p_correct_62 =", .999999870145956
                  "p_correct_63 =", .999999949334483
                  "p_correct_64 =", .999999918154003
                  "p_correct_65 =", .999999968042776
                  "p_correct_66 =", .999999948389620
                  "p_correct_67 =", .999999979834666
                  "p_correct_68 =", .999999967441619
                  "p_correct_69 =", .999999987270496
                  "p_correct_70 =", .999999979452252
                  "p_correct_71 =", .999999991961439
                  "p_correct_72 =", .999999987027258
                  "p_correct_73 =", .999999994921948
                  "p_correct_74 =", .999999991806745
                  "p_correct_75 =", .999999996791066
                  "p_correct_76 =", .999999994823574
                  "p_correct_77 =", .999999997971567
                  "p_correct_78 =", .999999996728511
                  "p_correct_79 =", .999999998717401
                  "p_correct_80 =", .999999997931787
                  "p_correct_81 =", .999999999188764
                  "p_correct_82 =", .999999998692106
                  "p_correct_83 =", .999999999486760
                  "p_correct_84 =", .999999999172684
                  "p_correct_85 =", .999999999675206
                  "p_correct_86 =", .999999999476536
                  "p_correct_87 =", .999999999794408
                  "p_correct_88 =", .999999999668702
                  "p_correct_89 =", .999999999869835
                  "p_correct_90 =", .999999999790276
                  "p_correct_91 =", .999999999917569
                  "p_correct_92 =", .999999999867204
                  "p_correct_93 =", .999999999947783
                  "p_correct_94 =", .999999999915896
                  "p_correct_95 =", .999999999966923
                  "p_correct_96 =", .999999999946724
                  "p_correct_97 =", .999999999979037
                  "p_correct_98 =", .999999999966244
                  "p_correct_99 =", .999999999986715
                 "p_correct_100 =", .999999999978608

Now to answer your original questions:

1: Will I increase my chances of being accurate if I send more than 
one plane out? 

Yes. As you can see, the probability of a correct conclusion trends
upward the more planes you send out. However, because of the assumed
rule for ties, p_correct_n is a little lower for each even value of n
than it is for the previous odd value of n.

2: What is the reasoning behind the answer. 

I've explained the mathematical reasoning above. :-) 

In plainer terms, because the planes individually are so accurate
(80%), the more you send out, the less likely you are to get enough
inaccurate observations to lead you to an incorrect conclusion.

3: If the answer to question 1 is Yes how would I apply the answer to
question 2 so that I can figure my accuracy or probability of having a
correct answer to the question the recon planes went to answer.  
Example: After ten reports with 8 planes reporting the bridge 
destroyed and 2 reporting the bridge still intact, I have a 96% chance
of being correct if I conclude the bridge is destroyed. 

The above formulas allow you to apply the answer to any number of
planes. If you truly want to stay under 100 planes, then I've already
supplied all the results above.

In fact, if you have an accuracy target, you could look at the above
value table and pick out the exact number of planes you need to send.
For example, if you want to have at least a 99% chance of a correct
conclusion, you need to send out 13 planes
(p_correct_13 = .9929964388352). 
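
The table lookup can also be done in a few lines of Python. This is
only a sketch: min_planes and max_planes are names I made up for
illustration, and it uses the same assumptions as the table
(independent planes, ties counted as incorrect).

```python
from math import comb

def p_correct_strict(n, p=0.8):
    """Probability that a strict majority of n independent planes is correct."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))

def min_planes(target, p=0.8, max_planes=100):
    """Smallest n <= max_planes whose majority conclusion is correct
    with probability >= target, or None if no such n exists."""
    for n in range(1, max_planes + 1):
        if p_correct_strict(n, p) >= target:
            return n
    return None  # target not reachable with the planes available

print(min_planes(0.99))
```

For a 99% target this scan stops at the same n as the table lookup, 13.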

While I didn't use any search strategy to create this answer (using my
own experience instead), the search of:

://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=probability+tutorial

will bring up several good tutorials on the subject of probability.  

If you need any further clarifications, please ask before you rate
this answer.

Thanks. 

websearcher-ga

Request for Answer Clarification by myxlplix-ga on 15 Aug 2002 21:04 PDT
Thank you, I believe I'm well on my way to understanding the
problem and solution much better now. I wish to understand one thing a
little better:

The assumption that you used for ties:
(We'll assume here that if there is a tie - say 2
planes return correct observations and 2 planes return incorrect
observations out of 4 planes sent - then we count that ambiguous
result as an *incorrect* conclusion.)

In this situation, to follow along with my illustration, let's say that
I receive two reports for "Yes, the bridge is destroyed" and two
reports for "No, the bridge is not destroyed."

If I understand your "incorrect conclusion" correctly, you are saying
that I am not able to determine the accuracy of any conclusion
simply because there is an equal number of reports for each possible
conclusion. But as long as the numbers are not equal, you can then
determine the probability of the conclusion.

Clarification of Answer by websearcher-ga on 16 Aug 2002 06:15 PDT
Hi myxlplix-ga:

Understanding what happens when there is a "tie" is quite subtle. I
will try to explain it as best I can.

The value of p_correct_n gives you the probability that *any*
conclusion reached by sending n planes is correct.

For example, for 3 planes, that value is p_correct_3=.896. This means
that if you were to perform 1,000 tests with 3 planes each, (on
average) 896 times out of 1,000 your conclusion would be correct. The
other 104 times (on average), your conclusion would be incorrect.

When we have an even number of planes, the same basic logic holds true.
For example, for 4 planes, p_correct_4=.8192. This means that for
10,000 tests, (on average) 8,192 times out of 10,000 your conclusion
would be correct. The other 1,808 times (on average) your conclusion
would be incorrect. However, when your test actually returns a tie (2
planes saying destroyed, 2 planes saying intact) then you *know* that
that particular conclusion is *one of* the incorrect conclusions. In
fact, you can also figure out the probability that this "tie"
situation will occur for any value of n.

p_tie_n = nC(n/2) * .8^(n/2) * .2^(n/2)

p_tie_4 = 4C2*(.8 * .8 * .2* .2) = .1536

So, if you wanted to you could break down the complete probability for
n=4 into:

* probability conclusion is correct   = .8192
* probability conclusion is incorrect = (1-.8192) = .1808
* probability you *know* an incorrect conclusion is incorrect = .1536
* probability you don't know an incorrect conclusion is incorrect =
(.1808-.1536) = .0272

None of this changes your overall probability of any single conclusion
being correct. It merely changes whether you know when you've come to
an incorrect conclusion.

If you wanted to figure out a *different* probability - the
probability of a conclusion being correct when you don't know from the
results themselves whether it is or not (that is, when you don't have
a tie) - then you would compute:

p_correct_no_tie_n = p_correct_n/(1 - p_tie_n)

For n=4: 

p_correct_no_tie_4 = .8192/(1-.1536) = .9678638941
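
If it helps, the n=4 breakdown can be reproduced with a short Python
sketch. The function names are my own, and the code assumes
independent planes with the same .8 accuracy used throughout.

```python
from math import comb

def p_correct_strict(n, p=0.8):
    """Probability of a correct strict majority; ties count as incorrect."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))

def p_tie(n, p=0.8):
    """Probability of an exact split; zero when n is odd."""
    if n % 2:
        return 0.0
    half = n // 2
    return comb(n, half) * p**half * (1 - p)**half

n = 4
correct = p_correct_strict(n)
tie = p_tie(n)
print("correct:             ", round(correct, 4))
print("known-incorrect tie: ", round(tie, 4))
print("wrong majority:      ", round(1 - correct - tie, 4))
print("correct given no tie:", round(correct / (1 - tie), 10))
```

The last line divides by 1 - p_tie_n because it conditions on the
event that the reports did not split evenly.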

BUT KEEP IN MIND that this is a different probability, a different
question than you originally asked. This is NOT the probability that
any individual conclusion will be correct. Also realize that this type
of computation is only applicable when n is even.

I hope this clears it up for you - it's a tricky distinction!

websearcher-ga

Clarification of Answer by websearcher-ga on 16 Aug 2002 06:33 PDT
Hi myxlplix-ga:

poormattie-ga makes an excellent point in his comment below. If we
treat a tie as a 50/50 guess instead of an automatic miss, the up-down
pattern in the figures I provided disappears. As poormattie-ga shows,
when n is even:

p_correct_n = p_correct_(n-1)

This means that there's no real point in sending an even number of
planes: just send the previous odd number and you'll get the same
result. (My analysis showed the same thing, except it mistakenly
concluded that sending the even number would actually *decrease* the
accuracy from the previous odd number, instead of keeping it equal.)

So, our new formulas would be:

If n is odd then: 
 
p_correct_n = sum( nC(n-i+1) * .8^(n-i+1) * .2^(i-1), i=1..(n+1)/2 )
 
If n is even then:
 
p_correct_n = p_correct_(n-1)
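
The even-equals-previous-odd rule can be checked numerically. The
sketch below is one way to do it in Python, with a function name of my
own choosing. It assumes independent planes and that an exact tie is
settled by a fair coin flip, which is how I read poormattie-ga's
"drop one plane at random" suggestion.

```python
from math import comb

def p_correct_fair(n, p=0.8):
    """Majority vote where an exact tie is settled by a fair coin flip."""
    total = 0.0
    for k in range(n + 1):  # k = number of correct observations
        prob = comb(n, k) * p**k * (1 - p)**(n - k)
        if 2 * k > n:
            total += prob        # a clear majority is correct
        elif 2 * k == n:
            total += 0.5 * prob  # tie: the coin flip is right half the time
    return total

# Each even n should match the odd n just before it.
for n in range(2, 11, 2):
    print(n, round(p_correct_fair(n), 9), round(p_correct_fair(n - 1), 9))
```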

And our new table of values would be:

                  "p_correct_1 =", .8
                  "p_correct_2 =", .8
                  "p_correct_3 =", .896
                  "p_correct_4 =", .896
                  "p_correct_5 =", .94208
                  "p_correct_6 =", .94208
                  "p_correct_7 =", .9666560
                  "p_correct_8 =", .9666560
                  "p_correct_9 =", .980418560
                  "p_correct_10 =", .980418560
                  "p_correct_11 =", .98834579456
                  "p_correct_12 =", .98834579456
                  "p_correct_13 =", .9929964388352
                  "p_correct_14 =", .9929964388352
                  "p_correct_15 =", .995760250290176
                  "p_correct_16 =", .995760250290176
                  "p_correct_17 =", .997418537163163
                  "p_correct_18 =", .997418537163163
                  "p_correct_19 =", .998420879450834
                  "p_correct_20 =", .998420879450834
                  "p_correct_21 =", .999030303561736
                  "p_correct_22 =", .999030303561736
                  "p_correct_23 =", .999402606291308
                  "p_correct_24 =", .999402606291308
                  "p_correct_25 =", .999630951965445
                  "p_correct_26 =", .999630951965445
                  "p_correct_27 =", .999771472380298
                  "p_correct_28 =", .999771472380298
                  "p_correct_29 =", .999858193550607
                  "p_correct_30 =", .999858193550607
                  "p_correct_31 =", .999911845047974
                  "p_correct_32 =", .999911845047974
                  "p_correct_33 =", .999945108976340
                  "p_correct_34 =", .999945108976340
                  "p_correct_35 =", .999965771745958
                  "p_correct_36 =", .999965771745958
                  "p_correct_37 =", .999978628580389
                  "p_correct_38 =", .999978628580389
                  "p_correct_39 =", .999986640418264
                  "p_correct_40 =", .999986640418264
                  "p_correct_41 =", .999991639805099
                  "p_correct_42 =", .999991639805099
                  "p_correct_43 =", .999994763231542
                  "p_correct_44 =", .999994763231542
                  "p_correct_45 =", .999996716792807
                  "p_correct_46 =", .999996716792807
                  "p_correct_47 =", .999997939892035
                  "p_correct_48 =", .999997939892035
                  "p_correct_49 =", .999998706367551
                  "p_correct_50 =", .999998706367551
                  "p_correct_51 =", .999999187100994
                  "p_correct_52 =", .999999187100994
                  "p_correct_53 =", .999999488853677
                  "p_correct_54 =", .999999488853677
                  "p_correct_55 =", .999999678399069
                  "p_correct_56 =", .999999678399069
                  "p_correct_57 =", .999999797541884
                  "p_correct_58 =", .999999797541884
                  "p_correct_59 =", .999999872478607
                  "p_correct_60 =", .999999872478607
                  "p_correct_61 =", .999999919638788
                  "p_correct_62 =", .999999919638788
                  "p_correct_63 =", .999999949334483
                  "p_correct_64 =", .999999949334483
                  "p_correct_65 =", .999999968042776
                  "p_correct_66 =", .999999968042776
                  "p_correct_67 =", .999999979834666
                  "p_correct_68 =", .999999979834666
                  "p_correct_69 =", .999999987270496
                  "p_correct_70 =", .999999987270496
                  "p_correct_71 =", .999999991961439
                  "p_correct_72 =", .999999991961439
                  "p_correct_73 =", .999999994921948
                  "p_correct_74 =", .999999994921948
                  "p_correct_75 =", .999999996791066
                  "p_correct_76 =", .999999996791066
                  "p_correct_77 =", .999999997971567
                  "p_correct_78 =", .999999997971567
                  "p_correct_79 =", .999999998717401
                  "p_correct_80 =", .999999998717401
                  "p_correct_81 =", .999999999188764
                  "p_correct_82 =", .999999999188764
                  "p_correct_83 =", .999999999486760
                  "p_correct_84 =", .999999999486760
                  "p_correct_85 =", .999999999675206
                  "p_correct_86 =", .999999999675206
                  "p_correct_87 =", .999999999794408
                  "p_correct_88 =", .999999999794408
                  "p_correct_89 =", .999999999869835
                  "p_correct_90 =", .999999999869835
                  "p_correct_91 =", .999999999917569
                  "p_correct_92 =", .999999999917569
                  "p_correct_93 =", .999999999947783
                  "p_correct_94 =", .999999999947783
                  "p_correct_95 =", .999999999966923
                  "p_correct_96 =", .999999999966923
                  "p_correct_97 =", .999999999979037
                  "p_correct_98 =", .999999999979037
                  "p_correct_99 =", .999999999986715
                 "p_correct_100 =", .999999999986715

Thanks to poormattie-ga for catching this subtle point! :-)

websearcher-ga
myxlplix-ga rated this answer:5 out of 5 stars
Thank you for the time and patience you put into the question. You had
assumed correctly when you thought I was a novice. Your answer took my
level of understanding into account and I will now be able to apply
this information to other similar applications. Thank you for putting
in the time to write the complete answer and spawning more debate :)

Comments  
Subject: Re: Increase the probability of gaining a correct answer by taking more data samples
From: poormattie-ga on 15 Aug 2002 22:40 PDT
 
I do think it's mostly to the letter of his question to state/assume
that you need to throw away an answer when no clear victor appears
(i.e. 2 planes say yay and 2 say nay). But I feel that it needs to be
mentioned that really you have the same chance of having the correct
information for any 'even' answer as you did the 'odd' answer before
it (IOW, the probability never really goes back down).

The idea is that instead of counting it as an "incorrect" answer, you
drop one of the planes at random when you have a tie, reducing those
few "tie" cases to a 50/50 shot. Thus the combined odds of any "even"
case really carry at best/worst the odds of n-1 planes. Thus it keeps
the probability stair-stepping upwards (instead of such wild swings).

For example, with two planes, you have one possibility that is
completely correct, two possibilities of a tie that are reduced to
50/50, and one possibility that leads you completely astray.

So instead of 
  correct_choice2 = 1*(.8 * .8) = .64

You have
  correct_choice2 = 1*(1.0)*(.8 * .8) + 2*(0.5)*(.8 * .2) +
                    1*(0.0)*(.2 * .2)
                  = .64 + .16 + .0
                  = .8
                  = correct_choice1

By dropping one random plane on a tie, you see that by sending more
planes, you can't ever go /back/ in odds (you just might not gain
anything by sending an even number of planes). Similarly, for four
planes you see:

  correct_choice4 = 1*(1.0)*(.8^4) + 4*(1.0)*(.8^3)*(.2) +
                    6*(0.5)*(.8^2)*(.2^2)
                  = .4096 + .4096 + .0768 
                  = .896
                  = correct_choice3

Again, I think your answer follows the letter of his question, but
since it's a Yes/No problem (the odds would be different if the info
coming back wasn't binary), you know that, at a minimum, a correct
answer can always be obtained 50% of the time.
Subject: Re: Increase the probability of gaining a correct answer by taking more data samples
From: rbnn-ga on 16 Aug 2002 00:02 PDT
 
The analysis is correct when the prior probability that the bridge is
destroyed is .5 . It is not correct otherwise.

Here is a simple example to see the problem. Suppose you believe that
there is a 1/1000 chance that the bridge is destroyed. (This is called
a prior.) Then, when you get the report back from one plane, you
should still say "The bridge is not destroyed" no matter what the plane
reports. You'll need a lot of planes even to consider reporting
"bridge destroyed".

One might be tempted to say that since no prior is given, we can
assume that the prior is 0.5, that is, we can assume that the
probability of a bridge being destroyed is 0.5. However, this is a
fallacy: in the absence of a prior one cannot assume a uniform
distribution.

See e.g. 

http://www.itl.nist.gov/div898/handbook/apr/section1/apr1a.htm

Before a numeric answer to your question can be given you must answer
the question: If 0 planes report, what is the probability of the
bridge being destroyed.
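
To see how much the prior can matter, here is a rough Python sketch
that applies Bayes' rule to a batch of reports. The function and
parameter names are hypothetical, and it assumes every plane is
equally accurate (.8 whether or not the bridge is destroyed) and that
the planes err independently.

```python
def posterior_destroyed(yes, no, prior, accuracy=0.8):
    """Posterior probability that the bridge is destroyed after `yes`
    reports of "destroyed" and `no` reports of "not destroyed"."""
    # Likelihood of exactly these reports under each state of the world.
    like_destroyed = accuracy**yes * (1 - accuracy)**no
    like_intact = (1 - accuracy)**yes * accuracy**no
    numerator = prior * like_destroyed
    return numerator / (numerator + (1 - prior) * like_intact)

# The asker's 8-versus-2 example with a prior of .5.
print(round(posterior_destroyed(8, 2, prior=0.5), 6))
# A 1/1000 prior after one, then five, unanimous "destroyed" reports.
print(round(posterior_destroyed(1, 0, prior=0.001), 6))
print(round(posterior_destroyed(5, 0, prior=0.001), 6))
```

With a 1/1000 prior, the posterior only crosses .5 once "destroyed"
reports outnumber "not destroyed" reports by five.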
Subject: Re: Increase the probability of gaining a correct answer by taking more data samples
From: poormattie-ga on 16 Aug 2002 06:35 PDT
 
rbnn, you come from a different probability world than I. Bayesian
thinking (in my studies at least) often relied on a uniform
distribution (all choices equally likely) when there is no prior
information. I would appreciate if you could provide a reference that
claims this as a fallacy, because there are other sources (in
beginning bayesian tutorials, no less) that use this approach.

But we're not concerned (in this problem, at least) with the specific
% chance that the bridge has been destroyed. All we want to know in
this case is whether the information (whatever it is) brought back by
the planes gives you a correct answer (or, in my alternative, whether
you can choose the correct answer from the information the planes
provide with no prior knowledge about their subject).
Subject: Re: Increase the probability of gaining a correct answer by taking more data samples
From: rbnn-ga on 16 Aug 2002 12:44 PDT
 
poormattie, here is one way to understand the difficulty: suppose that
the reports that come back from a plane are:
   1. Bridge destroyed
   2. Bridge not destroyed and car on bridge.
   3. Bridge not destroyed and car not on bridge.

Now there are three states, but if we use a uniform prior, we have a
prior on bridge destroyed of 1/3, even though the probability of the
bridge being destroyed is the same.
Subject: Re: Increase the probability of gaining a correct answer by taking more data samples
From: rbnn-ga on 16 Aug 2002 20:02 PDT
 
Oops, forgot to mention, the analysis above is correct if the prior is
assumed to be .5 AND if the errors reported by each plane are
independent; but not otherwise.
Subject: Re: Increase the probability of gaining a correct answer by taking more data samples
From: tne-ga on 16 Aug 2002 20:50 PDT
 
Hi websearcher,

I am a bit confused.

According to my understanding, you calculated the probability that my
answer will be correct if I go with the majority.

What if, in case of a tie, I just pick one side at random? Isn't my
probability of a correct answer then .5?

So shouldn't we add to the above:

.5 * probability of tie

Please let me know if I missed something.

tne
Subject: Re: Increase the probability of gaining a correct answer by taking more data samples
From: websearcher-ga on 17 Aug 2002 11:32 PDT
 
Hi tne-ga:

The first comment by poormattie-ga above speaks to your point. I
corrected my formulas given poormattie-ga's input. :-)

websearcher
Subject: Re: Increase the probability of gaining a correct answer by taking more data samples
From: alephnull-ga on 22 Aug 2002 01:29 PDT
 
There is something wrong with the Google-researcher's answer. 
It asks you to hypothesize over all possible ways the data 
could have come out.  That seems wrong to me, anyway.

Another issue with the analysis is that it does not tell you
what to believe about the world, given your data.  What it 
does is tell you that there are many more ways to get consistent
data than inconsistent data.  But that is only an argument
in favour of "going with the majority."  Let me say it again:
it does not tell you what to believe about the world, given 
your data.

You asked:

> 1: Will I increase my chances of being accurate if I send more than 
>  one plane out? 

No.  Your accuracy is given.  It will not increase with more
observations.

(Sorry, that's not a facetious answer, but a literal one.
A problem with questions is that you have to ask the right one,
to get the answer you are looking for.)

Your question may have been: how many planes do I have to send out
to be reasonably sure that the majority report is the correct one?
If this is the case, the Google-researcher has addressed that question.
Well, almost.  The answer should have assumed that n is odd, so that
the problem of a split vote does not arise.  Would have saved some
trouble.

But I don't think that's the right question.  Why should you
side with the majority?  The reason you _can_ side with the
majority is because of the information that the data tells you
about the world.  The previous analysis does not address this
issue directly, but rather forms an argument about how many
ways the majority can be right.  Better, I think, to understand 
what the data tells you, rather than to worry about what data 
you might have seen.

If you want to know how to compute how the probability
of the true state of the world changes with the number of 
data points you collect, read on. 

I will not be able to tell you how many planes to send out.
But neither did the Google-answer.  My analysis will tell you
how "rational belief about the world" changes with the collection
of data.  The Google-answer only tells you the probability that
the majority of data points are consistent with the world.

Suppose you send one plane out.  It comes back with a datum, d1.
The probability that the datum is correct is given in the problem:
4/5.  The datum is an observation about the true state s of the 
world, but it does not bring back the actual state of the world.  
In other words, the plane is like a noisy sensor.

However, the question is: now that I have the datum, what is the 
probability that the state of the world is "s"?  In notation, we
want a formula (and eventually a number) for the quantity
represented by:

p(s|d1)

What we are given in the problem is 

p(d1|s)

To relate the two, we use Bayes' Theorem.  And as pointed out by
a previous comment, we will need a prior on the probability of the
state of the world.  And to answer an issue raised in one of the 
comments, no we should not assume a uniform prior without
good reason.  But one "good reason" would be "I'm willing to
assume this for the sake of argument."  With sufficient 
data, priors become unimportant anyway, as we will see.

To use the example given more specifically, let 
d1 represent the statement "the bridge is observed to be out"
and -d1 (that is "not d1") its negation.
Furthermore, let s represent the statement about the 
true state of the world: "the bridge is out", and likewise
-s (not s) represents "the bridge is not out."

Given:
p(d1|s) = 4/5
p(-d1|s) = 1/5

and also

p(d1|-s) = 1/5
p(-d1|-s) = 4/5

This symmetry is a very special case.  In general, a sensor may have
an asymmetric response to the world.  That's why car alarms can be 
heard making a terrible noise for any reason at all, but rarely if ever
fail to notice an honest break-in attempt.

Okay, so we want to know p(s|d1), that is, the probability
that the bridge is out, given the datum.

We use Bayes' theorem as follows:

p(s|d1) = p(d1|s)*p(s)
          ------------
             p(d1)

p(s) is the prior on the true state of the world.  It's the sticky 
point, mentioned above, but we won't get stuck just yet.  Let's 
assume only that there is such a probability, call it "t".  We can 
argue about what its value is, later.

p(d1|s) is just the given information for the accuracy of the report.

Now what about p(d1)?  It's just the probability that the 
observation comes up as d1, regardless of the true state of the 
world.  Its purpose is to rule out all the possibilities that are 
not consistent with the actual datum.

How do we compute it?  Easy.  We "reason by cases."   Either the bridge
is out, or it isn't.  If it is out, that case contributes 
p(d1|s)*p(s) to the probability of the report; if the bridge is not 
out, that case contributes p(d1|-s)*p(-s).  There is no other case to 
worry about, and so there is no "probability" outside of these two cases.

So p(d1) = p(d1|s)*p(s) + p(d1|-s)*p(-s)

(This is actually more concisely explained by appealing to the 
axioms of probability, but I didn't want to have to prove that 
many theorems this evening).
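
For anyone who wants to check the "reason by cases" step with
numbers, here is a minimal sketch in Python.  The function name
prob_report_out and the prior of 1/2 are my own choices for
illustration; only the 4/5 accuracy comes from the problem.

```python
from fractions import Fraction

def prob_report_out(t, accuracy=Fraction(4, 5)):
    """p(d1): chance that one report says "out", summed over both cases.

    t is the prior p(s) that the bridge really is out (an assumption).
    """
    p_d1_given_s = accuracy            # bridge out, report correct
    p_d1_given_not_s = 1 - accuracy    # bridge not out, report wrong
    return p_d1_given_s * t + p_d1_given_not_s * (1 - t)

# With an even prior, a report of "out" is exactly as likely as not.
print(prob_report_out(Fraction(1, 2)))  # 1/2
```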

We end up with:

p(s|d1) =        p(d1|s)*p(s)
          -----------------------------
          p(d1|s)*p(s) + p(d1|-s)*p(-s)

Yes, the first term in the denominator is the same as the 
only term in the numerator.

This is as it should be.  This expression tells us the fraction of
"belief" that is consistent with the datum.  A Venn diagram usually
helps here; I hope you can draw one yourself for this situation.

Substituting numbers:

p(s|d1) =        4/5 * p(s)
          ------------------------
          4/5 * p(s) + 1/5 * p(-s)

Now we see the wisdom of using fractions: the 5s cancel, and we 
can use simple algebra to end up with:

p(s|d1) = 4 * t
          ---------
          3 * t + 1

where t = p(s), that prior we left as an open question.

Notice that if t is close to 1, then p(s|d1) is also close to 1;
likewise, if t is close to zero, p(s|d1) is also close to zero.  
But if t=1/2, then p(s|d1) = 4/5.  

This makes sense.  If you really believed that the bridge was out
prior to getting the datum, the datum wouldn't change your beliefs
very much.  However, if you thought the bridge was as likely out 
as not (t=1/2), then the report will change your belief a lot.
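
The single-report formula is easy to try out.  The sketch below
assumes a report of "out" and the 4/5 accuracy from the problem;
the three priors are hypothetical values picked only to show the
behavior described above.

```python
from fractions import Fraction

def posterior_one_report(t):
    """p(s|d1) = 4t / (3t + 1), where t = p(s) is the prior."""
    return 4 * t / (3 * t + 1)

# Hypothetical priors: nearly sure "not out", undecided, nearly sure "out".
for t in (Fraction(1, 100), Fraction(1, 2), Fraction(99, 100)):
    p = posterior_one_report(t)
    print(f"prior {float(t):.2f} -> posterior {float(p):.4f}")
```

Using exact fractions keeps the t=1/2 case at exactly 4/5 rather
than a rounded decimal.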

So much for one report.  What about the case where you have n
reports, n>1?

Well, if we can assume that the reports are independent of each other
given the true state of the bridge, then we can compute the probability
of the data d1, ..., dn (n reports) given s by multiplying the
individual report probabilities together.

But here we must remember that some of the "di" are going to 
represent "bridge out" and the rest are "bridge not out," according
to the data brought back by the planes.

Suppose that k of the n reports were "out", and so (n-k) of the reports 
were "not out."  The probability of the data, given that the bridge
is out, is just:

p(d1,...,dn|s) = p(d1|s)*p(d2|s)*...*p(dn|s)
               = (4/5)^k * (1/5)^(n-k)

This result uses our independence assumption, as well as the 
counting scheme for the reports (and "^" means exponentiation).
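
As a rough numerical check of this product, here is a small Python
sketch.  The function name likelihood and the 8-of-10 split (borrowed
from the example in the question) are illustrative assumptions.  Note
that it gives the probability of one particular sequence of reports,
so there is no binomial coefficient.

```python
from fractions import Fraction

def likelihood(k, n, bridge_out, accuracy=Fraction(4, 5)):
    """Probability of k "out" and n-k "not out" reports, given the state.

    Assumes the reports are independent given the true state.
    """
    correct = k if bridge_out else n - k   # reports that match the state
    wrong = n - correct
    return accuracy ** correct * (1 - accuracy) ** wrong

given_out = likelihood(8, 10, bridge_out=True)       # (4/5)^8 * (1/5)^2
given_not_out = likelihood(8, 10, bridge_out=False)  # (1/5)^8 * (4/5)^2
print(given_out / given_not_out)  # 4096
```

The ratio of the two is what ends up driving the answer: the same
data are 4096 times more probable if the bridge is out than if it
is not.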

Now, as before, we want to know the probability of the true state 
of the world, given the data:

p(s|d1,...,dn) = p(d1,...,dn|s) * p(s)
                 ---------------------
                     p(d1,...,dn)

by Bayes' theorem.

We already know how to compute p(d1,...,dn|s).  And p(d1,...,dn) is
just a bit more complicated:

p(d1,...,dn) = p(d1,...,dn|s) * p(s) + p(d1,...,dn|-s) * p(-s)

The last piece of the derivation is the second term in the above.
By independence, as before:

p(d1,...,dn|-s) = p(d1|-s) * p(d2|-s) * ... * p(dn|-s)

This time however (because of the weird symmetry of the given 
information), the roles of 4/5 and 1/5 are swapped:

p(d1,...,dn|-s) = (1/5)^k * (4/5)^(n-k)

Remember that this quantity is hypothetical: it is the 
probability that the reports would come back as they did,
given that the bridge was not out.  Again, this kind
of reasoning is natural: we don't know about the bridge.
We do know about the data.  Why should we hypothesize about
what data we might have seen?  On the other hand, not knowing
about the bridge, it is natural to think about all the 
possibilities.

So finally:

p(s|d1,...,dn) =                (4/5)^k * (1/5)^(n-k) * t
                 ---------------------------------------------------------
                 (4/5)^k * (1/5)^(n-k) * t + (1/5)^k * (4/5)^(n-k) * (1-t)

Again, the 5s cancel (recall that p(-s) = 1-t):

p(s|d1,...,dn) =         4^k * t
                 -------------------------
                 4^k * t + 4^(n-k) * (1-t)

This is less intuitive than the single datum case, but if you
squint a bit, you will see that as k increases towards its limit n, 
the evidence mounts towards the conclusion that the bridge is out.  
And vice versa.  Furthermore, it doesn't take too many data points 
to "wash out" the prior t: for any value of t, some number k of
reports will force the answer away from the prior, towards
a result that is based on observation, not prior biases.
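
To see the prior wash out, here is a minimal Python sketch of the
final formula.  The function name posterior_out and the list of
priors are my own assumptions for illustration; the 8-of-10 split
is the example from the original question.

```python
from fractions import Fraction

def posterior_out(k, n, t):
    """p(s|d1,...,dn) when k of n independent reports say "out".

    t is the prior p(s); each report is assumed 4/5 accurate.
    """
    num = 4 ** k * t
    return num / (num + 4 ** (n - k) * (1 - t))

# Even prior, 8 of 10 reports say "out".
print(posterior_out(8, 10, Fraction(1, 2)))  # 4096/4097

# Hypothetical priors, from very skeptical to very confident.
for t in (Fraction(1, 100), Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    print(f"prior {float(t):.2f} -> {float(posterior_out(8, 10, t)):.4f}")
```

With an even prior this comes to 4096/4097, or roughly 0.9998.
Even a prior as low as 1/100 still gives 4096/4195, about 0.976,
after the same ten reports.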

Have I answered your question?  Not really.  I have not
told you how many planes to send out.  I do not think
that "accuracy" is at issue here, either.

But now you have seen a correct analysis of the probability
of the state of the world, given your observations of it.
You can use this probability to make rational decisions
concerning the bridge.
