| Subject:
Increase the probability of gaining a correct answer by taking more data samples
Category: Science > Math Asked by: myxlplix-ga List Price: $20.00 |
Posted:
15 Aug 2002 16:47 PDT
Expires: 14 Sep 2002 16:47 PDT Question ID: 55038 |
Here is the problem: I have one hundred recon planes that each have an 80% chance of giving me an accurate report. I have designated a target and send one recon plane to give me a report. The report returned is a positive or negative answer (Yes\No). Examples would be: Yes, the bridge is destroyed, or No, the bridge is not destroyed.
This is what I want in the answer:
1: Will I increase my chances of being accurate if I send more than one plane out?
2: What is the reasoning behind the answer.
3: If the answer to question 1 is Yes, how would I apply the answer to question 2 so that I can figure my accuracy or probability of having a correct answer to the question the recon planes went to answer.
Example: After ten reports with 8 planes reporting the bridge destroyed and 2 reporting the bridge still intact, I have a 96% chance of being correct if I conclude the bridge is destroyed.
Please ask for clarification before answering if you have any doubts about what I am looking for. I want to be able to apply the information that you give, so you should expect me to ask clarification questions if I have any doubts on how to apply the formula. |
|
| Subject:
Re: Increase the probability of gaining a correct answer by taking more data samples
Answered By: websearcher-ga on 15 Aug 2002 20:21 PDT Rated: ![]() |
Hi myxlplix-ga :
This is an interesting probability question. I will assume that you
are a novice in this area, so please forgive me if I explain something
that's obvious to you. :-)
Let's start with what you know.
* You have 100 planes.
* Each plane has an 80% chance (or a .8 probability) of returning a
correct observation and a .2 probability of returning an incorrect
observation.
Figuring out the probability that n planes return a correct conclusion
overall is simply a matter of adding up the probabilities for each
possible combination of individual observations that leads to a
correct conclusion. (We'll assume here that if there is a tie - say 2
planes return correct observations and 2 planes return incorrect
observations out of 4 planes sent - then we count that ambiguous
result as an *incorrect* conclusion.)
We'll call our probability of a correct conclusion p_correct_n.
What happens for 1 plane? There's only one possible scenario for a
correct conclusion in this case - the one plane returns a correct
observation.
p_correct_1 = 1*(.8) = .8
What about for n=2? The only correct scenario is when both planes
return a correct observation. This can only happen in one way - both
plane_1 and plane_2 return correct observations.
p_correct_2 = 1*(.8 * .8) = .64
What about n=3? There are two correct scenarios here - either all
three planes return correct observations or two of three planes return
correct observations and the other plane returns an incorrect
observation. However, the second scenario (2 out of 3 correct) can
happen in 3 separate ways - plane_1 and plane_2 correct, plane_1 and
plane_3 correct, or plane_2 and plane_3 correct.
p_correct_3 = 1*(.8 * .8 * .8) + 3*(.8 * .8 * .2) = .896
What about n=4? There are two correct scenarios here as well - either
all four planes return correct observations or three out of four
planes return correct observations and the other returns an incorrect
observation. The second scenario can happen in 4 separate ways -
planes_1,2,3, planes_1,2,4, planes_1,3,4, or planes_2,3,4.
p_correct_4 = 1*(.8 * .8 * .8 * .8) + 4*(.8 * .8 * .8 * .2) = .8192
So, the probabilities seem to be going up and down, but in an overall
upward trend.
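The hand counts for n = 1 to 4 can be checked mechanically. What follows is a minimal Python sketch, not part of the original answer; the function name p_correct_bruteforce is made up for illustration. It lists every pattern of correct and incorrect reports, weights each pattern with .8 and .2, and counts a tie as an incorrect conclusion, as assumed above.

```python
from itertools import product

def p_correct_bruteforce(n, p=0.8):
    """Probability that a strict majority of n independent reports is correct.

    Enumerates all 2^n patterns of correct (True) / incorrect (False)
    reports. A tie counts as an incorrect conclusion.
    """
    total = 0.0
    for outcome in product([True, False], repeat=n):
        n_right = sum(outcome)          # number of correct reports
        n_wrong = n - n_right
        if n_right > n_wrong:           # strict majority is correct
            total += p ** n_right * (1 - p) ** n_wrong
    return total

# Compare with the values worked out by hand for n = 1..4.
for n in range(1, 5):
    print(n, round(p_correct_bruteforce(n), 4))
```

Enumeration doubles in cost with every extra plane, so it is only practical as a check for small n.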
What are the general formulas for p_correct_n?
If n is odd then:
p_correct_n = sum( nC(n-i+1) * .8^(n-i+1) * .2^(i-1), i=1..(n+1)/2 )
If n is even then:
p_correct_n = sum( nC(n-i+1) * .8^(n-i+1) * .2^(i-1), i=1..n/2 )
where:
* sum(...) is the Sigma (summation) function where all the appropriate
values over the range of i are added together
* nCi is "n choose i", or the number of separate ways that i items can
be chosen from n items. This can be computed with the formula:
nCi = n!/(i!*(n-i)!)
The ! (factorial) operator is calculated by multiplying all the
positive integers equal to or less than the given value together. For
example, 6! = 6*5*4*3*2*1 = 720. (Also, 0! = 1)
* The "^" character represents exponentiation. For example, 2^3 =
2*2*2 = 8.
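For readers without Maple, the two formulas can be sketched in Python. This is an illustration under the same assumptions as the answer (independent planes, ties counted as incorrect); the function name p_correct is chosen here and does not come from the original. Both the odd and the even formula amount to summing over every count of correct reports that forms a strict majority, so one loop covers both.

```python
from math import comb  # comb(n, k) is "n choose k" (Python 3.8+)

def p_correct(n, p=0.8):
    """Probability that a strict majority of n reports is correct.

    Sums nCk * p^k * (1-p)^(n-k) over every k that is a strict majority.
    For odd n the smallest such k is (n+1)/2; for even n it is n/2 + 1.
    Integer division gives both cases as n // 2 + 1.
    """
    smallest_majority = n // 2 + 1
    return sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(smallest_majority, n + 1)
    )

for n in (1, 2, 3, 4, 5, 10):
    print(n, p_correct(n))
```

Floating-point rounding means the printed digits may differ slightly in the last places from exact arithmetic.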
So, re-evaluating for n=1,2,3,4 gives:
p_correct_1 = sum(1C(2-i) * .8^(2-i) * .2^(i-1), i=1..1)
= 1!/(1!*0!) * .8^1 * .2^0
= 1 * .8 * 1
= .8
p_correct_2 = sum(2C(3-i) * .8^(3-i) * .2^(i-1), i=1..1)
= 2!/(2!*0!) * .8^2 * .2^0
= 1 * .64 * 1
= .64
p_correct_3 = sum(3C(4-i) * .8^(4-i) * .2^(i-1), i=1..2)
= 3!/(3!*0!) * .8^3 * .2^0 + 3!/(2!*1!) * .8^2 * .2^1
= 1 * .512 * 1 + 3 * .64 * .2
= .896
p_correct_4 = sum(4C(5-i) * .8^(5-i) * .2^(i-1), i=1..2)
= 4!/(4!*0!) * .8^4 * .2^0 + 4!/(3!*1!) * .8^3 * .2^1
= 1 * .4096 * 1 + 4 * .512 * .2
= .8192
These values match our earlier results.
Using a computer math program called Maple (http://www.maplesoft.com),
I computed the values for all values of n from 1 to 100.
"p_correct_1 =", .8
"p_correct_2 =", .64
"p_correct_3 =", .896
"p_correct_4 =", .8192
"p_correct_5 =", .94208
"p_correct_6 =", .901120
"p_correct_7 =", .9666560
"p_correct_8 =", .94371840
"p_correct_9 =", .980418560
"p_correct_10 =", .9672065024
"p_correct_11 =", .98834579456
"p_correct_12 =", .980594720768
"p_correct_13 =", .9929964388352
"p_correct_14 =", .98839008641024
"p_correct_15 =", .995760250290176
"p_correct_16 =", .992996438835200
"p_correct_17 =", .997418537163163
"p_correct_18 =", .995747966683708
"p_correct_19 =", .998420879450834
"p_correct_20 =", .997405172599325
"p_correct_21 =", .999030303561736
"p_correct_22 =", .998409799012455
"p_correct_23 =", .999402606291308
"p_correct_24 =", .999022030167748
"p_correct_25 =", .999630951965445
"p_correct_26 =", .999396751274023
"p_correct_27 =", .999771472380298
"p_correct_28 =", .999626937096448
"p_correct_29 =", .999858193550607
"p_correct_30 =", .999768774388334
"p_correct_31 =", .999911845047974
"p_correct_32 =", .999856405167363
"p_correct_33 =", .999945108976340
"p_correct_34 =", .999910671026972
"p_correct_35 =", .999965771745958
"p_correct_36 =", .999944343688577
"p_correct_37 =", .999978628580389
"p_correct_38 =", .999965275517262
"p_correct_39 =", .999986640418264
"p_correct_40 =", .999978308106872
"p_correct_41 =", .999991639805099
"p_correct_42 =", .999986434094364
"p_correct_43 =", .999994763231542
"p_correct_44 =", .999991507296101
"p_correct_45 =", .999996716792807
"p_correct_46 =", .999994678294096
"p_correct_47 =", .999997939892035
"p_correct_48 =", .999996662432841
"p_correct_49 =", .999998706367551
"p_correct_50 =", .999997905145145
"p_correct_51 =", .999999187100994
"p_correct_52 =", .999998684179852
"p_correct_53 =", .999999488853677
"p_correct_54 =", .999999172944694
"p_correct_55 =", .999999678399069
"p_correct_56 =", .999999479827707
"p_correct_57 =", .999999797541884
"p_correct_58 =", .999999672647345
"p_correct_59 =", .999999872478607
"p_correct_60 =", .999999793878312
"p_correct_61 =", .999999919638788
"p_correct_62 =", .999999870145956
"p_correct_63 =", .999999949334483
"p_correct_64 =", .999999918154003
"p_correct_65 =", .999999968042776
"p_correct_66 =", .999999948389620
"p_correct_67 =", .999999979834666
"p_correct_68 =", .999999967441619
"p_correct_69 =", .999999987270496
"p_correct_70 =", .999999979452252
"p_correct_71 =", .999999991961439
"p_correct_72 =", .999999987027258
"p_correct_73 =", .999999994921948
"p_correct_74 =", .999999991806745
"p_correct_75 =", .999999996791066
"p_correct_76 =", .999999994823574
"p_correct_77 =", .999999997971567
"p_correct_78 =", .999999996728511
"p_correct_79 =", .999999998717401
"p_correct_80 =", .999999997931787
"p_correct_81 =", .999999999188764
"p_correct_82 =", .999999998692106
"p_correct_83 =", .999999999486760
"p_correct_84 =", .999999999172684
"p_correct_85 =", .999999999675206
"p_correct_86 =", .999999999476536
"p_correct_87 =", .999999999794408
"p_correct_88 =", .999999999668702
"p_correct_89 =", .999999999869835
"p_correct_90 =", .999999999790276
"p_correct_91 =", .999999999917569
"p_correct_92 =", .999999999867204
"p_correct_93 =", .999999999947783
"p_correct_94 =", .999999999915896
"p_correct_95 =", .999999999966923
"p_correct_96 =", .999999999946724
"p_correct_97 =", .999999999979037
"p_correct_98 =", .999999999966244
"p_correct_99 =", .999999999986715
"p_correct_100 =", .999999999978608
Now to answer your original questions:
1: Will I increase my chances of being accurate if I send more than
one plane out?
Yes, as you can see, the trend of the results definitely gets more
accurate the more planes you send out. However, because of the assumed
rule for ties, p_correct_n is a little less accurate for each even
value of n than it is for the previous odd value of n.
2: What is the reasoning behind the answer.
I've explained the mathematical reasoning above. :-)
In plainer terms, because the planes individually are so accurate
(80%), the more you send out, the less likely you are to get enough
inaccurate observations to lead you to an incorrect conclusion.
3: If the answer to question 1 is Yes how would I apply the answer to
question 2 so that I can figure my accuracy or probability of having a
correct answer to the question the recon planes went to answer.
Example: After ten reports with 8 planes reporting the bridge
destroyed and 2 reporting the bridge still intact, I have a 96% chance
of being correct if I conclude the bridge is destroyed.
The above formulas allow you to apply the answer to any number of
planes. If you truly want to stay under 100 planes, then I've already
supplied all the results above.
In fact, if you have an accuracy target, you could look at the above
value table and pick out the exact number of planes you need to send.
For example, if you want to have at least a 99% chance of a correct
conclusion, you need to send out 13 planes
(p_correct_13 = .9929964388352).
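Picking the number of planes for an accuracy target can also be automated instead of read off the table. The sketch below is hypothetical (the names p_correct and min_planes are invented for this example) and keeps the assumptions used above: independent reports, 80% accuracy, and ties counted as incorrect.

```python
from math import comb

def p_correct(n, p=0.8):
    """Probability that a strict majority of n reports is correct."""
    return sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

def min_planes(target, p=0.8, max_planes=100):
    """Smallest n (up to max_planes) whose p_correct reaches the target.

    Returns None if no n in range is good enough.
    """
    for n in range(1, max_planes + 1):
        if p_correct(n, p) >= target:
            return n
    return None

print(min_planes(0.99))   # 13, matching the table above
```

Because even values of n dip below the preceding odd value under this tie rule, the search scans every n rather than assuming the probabilities only go up.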
While I didn't use any search strategy to create this answer (using my
own experience instead), the search of:
://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=probability+tutorial
will bring up several good tutorials on the subject of probability.
If you need any further clarifications, please ask before you rate
this answer.
Thanks.
websearcher-ga | |
myxlplix-ga
rated this answer:
Thank you for the time and patience you put into the question. You had assumed correctly when you thought I was a novice. Your answer took my level of understanding into account and I will now be able to apply this information to other similar applications. Thank you for putting in the time to write the complete answer and spawning more debate :) |
|
| Subject:
Re: Increase the probability of gaining a correct answer by taking more data samples
From: poormattie-ga on 15 Aug 2002 22:40 PDT |
I do think it's mostly to the letter of his question to state/assume
that you need to throw-away an answer when no clear victor appears
(i.e 2 planes say yay and 2 say nay). But I feel that it needs to be
mentioned that really you have the same chance of having the correct
information for any 'even' answer as you did the 'odd' answer before
it (IOW, the probability never really goes back down).
The idea is that instead of counting it as an "incorrect" answer, you
drop one of the planes at random when you have a tie, reducing those
few "tie" cases to a 50/50 shot. Thus the combined odds of any "even"
case really carry at best/worst the odds of n-1 planes. Thus it keeps
the probability stair-stepping upwards (instead of such wild swing).
For example, with two planes, you have one possibility that is
completely correct, two possibilities of a tie that are reduced to
50/50, and one possibility that leads you completely astray.
So instead of
correct_choice2 = 1*(.8 * .8) = .64
You have
correct_choice2 = 1*(1.0)*(.8 * .8) + 2*(0.5)*(.8 * .2) +
1*(0.0)*(.2 * .2)
= .64 + .16 + .0
= .8
= correct_choice1
By dropping one random plane on a tie, you see that by sending more
planes, you can't ever go /back/ in odds (you just might not gain
anything by sending an even number of planes). Similarly, for four
planes you see:
correct_choice4 = 1*(1.0)(.8^4) + 4*(1.0)(.8^3)(.2) +
6(0.5)(.8^2)(.2^2)
= .4096 + .4096 + .0768
= .896
= correct_choice3
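The same bookkeeping can be sketched in Python for any n. This is only an illustration; the name p_correct_coin_flip is invented here. A tie is given weight 0.5, which is how the two worked examples above treat dropping one plane at random.

```python
from math import comb

def p_correct_coin_flip(n, p=0.8):
    """Probability of a correct conclusion when ties are settled 50/50.

    k is the number of correct reports out of n.
    A strict majority counts fully, an exact tie counts half,
    and a strict minority counts for nothing.
    """
    total = 0.0
    for k in range(n + 1):
        prob = comb(n, k) * p ** k * (1 - p) ** (n - k)
        if 2 * k > n:
            total += prob
        elif 2 * k == n:
            total += 0.5 * prob
    return total

# Each even n should match the odd n just before it.
for n in range(1, 9):
    print(n, round(p_correct_coin_flip(n), 6))
```

Under this rule the values form a staircase: the even entries repeat the preceding odd entries instead of dropping below them.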
Again, I think your answer follows the letter of his question, but
since it's a Yes/No problem (the odds would be different if the info
coming back wasn't binary), you know that, at a minimum, a correct
answer can always be obtained 50% of the time. |
| Subject:
Re: Increase the probability of gaining a correct answer by taking more data samples
From: rbnn-ga on 16 Aug 2002 00:02 PDT |
The analysis is correct when the prior probability that the bridge is destroyed is .5. It is not correct otherwise.
Here is a simple example to see the problem. Suppose you believe that there is a 1/1000 chance that the bridge is destroyed. (This is called a prior.) Then, when you get the report back from one plane, you should still say "The bridge is not destroyed" no matter what the plane reports. You'll need a lot of planes even to consider reporting "bridge destroyed".
One might be tempted to say that since no prior is given, we can assume that the prior is 0.5, that is, we can assume that the probability of a bridge being destroyed is 0.5. However, this is a fallacy: in the absence of a prior one cannot assume a uniform distribution. See e.g. http://www.itl.nist.gov/div898/handbook/apr/section1/apr1a.htm
Before a numeric answer to your question can be given you must answer the question: If 0 planes report, what is the probability of the bridge being destroyed? |
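To put rough numbers on how much a prior such as 1/1000 matters, here is a small Python sketch. It is an illustration only and rests on assumptions not spelled out in the comment: Bayes' rule, reports that are independent given the true state, and the same 80% accuracy whether or not the bridge is destroyed. The function names are made up for the example.

```python
def posterior_destroyed(k, n, prior, p=0.8):
    """Probability the bridge is destroyed after k of n reports say so.

    Assumes independent reports, each correct with probability p
    regardless of the true state, and prior = P(destroyed) before any report.
    """
    like_destroyed = p ** k * (1 - p) ** (n - k)   # P(data | destroyed)
    like_intact = (1 - p) ** k * p ** (n - k)      # P(data | not destroyed)
    evidence = like_destroyed * prior + like_intact * (1 - prior)
    return like_destroyed * prior / evidence

def unanimous_reports_needed(prior, p=0.8, limit=100):
    """Fewest unanimous "destroyed" reports that push the posterior past 0.5."""
    for n in range(limit + 1):
        if posterior_destroyed(n, n, prior, p) > 0.5:
            return n
    return None

print(posterior_destroyed(1, 1, 0.001))   # one "destroyed" report, prior 1/1000
print(unanimous_reports_needed(0.001))    # reports needed to favour "destroyed"
```

With a prior of 1/1000, a single "destroyed" report leaves the posterior well under 1%, and it takes five unanimous "destroyed" reports before "destroyed" becomes the more likely state.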
| Subject:
Re: Increase the probability of gaining a correct answer by taking more data samples
From: poormattie-ga on 16 Aug 2002 06:35 PDT |
rbnn, you come from a different probability world than I. Bayesian thinking (in my studies at least) often relied on a uniform distribution (all choices equally likely) when there is no prior information. I would appreciate it if you could provide a reference that calls this a fallacy, because there are other sources (in beginning Bayesian tutorials, no less) that use this approach.
But we're not concerned (in this problem, at least) with the specific % chance that the bridge has been destroyed. All we want to know in this case is whether the information (whatever it is) brought back by the planes gives you a correct answer (or, in my alternative, whether you can choose the correct answer from the information the planes provide with no prior knowledge about their subject). |
| Subject:
Re: Increase the probability of gaining a correct answer by taking more data samples
From: rbnn-ga on 16 Aug 2002 12:44 PDT |
poormattie, here is one way to understand the difficulty. Suppose that the reports that come back from a plane are:
1. Bridge destroyed.
2. Bridge not destroyed and car on bridge.
3. Bridge not destroyed and car not on bridge.
Now there are three states, and if we use a uniform prior, we have a prior on bridge destroyed of 1/3 instead of 1/2, even though nothing about the bridge itself has changed. |
| Subject:
Re: Increase the probability of gaining a correct answer by taking more data samples
From: rbnn-ga on 16 Aug 2002 20:02 PDT |
Oops, forgot to mention, the analysis above is correct if the prior is assumed to be .5 AND if the errors reported by each plane are independent; but not otherwise. |
| Subject:
Re: Increase the probability of gaining a correct answer by taking more data samples
From: tne-ga on 16 Aug 2002 20:50 PDT |
Hi websearcher, I am a little confused. As I understand it, you calculated the probability that my answer will be correct if I go with the majority. But in the case of a tie, what if I just pick one side at random? My probability of a correct answer is then .5, so shouldn't we add .5 * (probability of a tie) to the above? Please let me know if I missed something. tne |
| Subject:
Re: Increase the probability of gaining a correct answer by taking more data samples
From: websearcher-ga on 17 Aug 2002 11:32 PDT |
Hi tne-ga: The first comment by poormattie-ga above speaks to your point. I corrected my formulas given poormattie-ga's input. :-) websearcher |
| Subject:
Re: Increase the probability of gaining a correct answer by taking more data samples
From: alephnull-ga on 22 Aug 2002 01:29 PDT |
There is something wrong with the Google-researcher's answer.
It asks you to hypothesize over all possible ways the data
could have come out. That seems wrong to me, anyway.
Another issue with the analysis is that it does not tell you
what to believe about the world, given your data. What it
does is tell you that there are many more ways to get consistent
data than inconsistent data. But that is only an argument
in favour of "going with the majority." Let me say it again:
it does not tell you what to believe about the world, given
your data.
You asked:
> 1: Will I increase my chances of being accurate if I send more than
> one plane out?
No. Your accuracy is given. It will not increase with more
observations.
(Sorry, that's not a facetious answer, but a literal one.
A problem with questions is that you have to ask the right one,
to get the answer you are looking for.)
Your question may have been: how many planes do I have to send out
to be reasonably sure that the majority report is the correct one?
If this is the case, the Google-researcher has addressed that question.
Well, almost. The answer should have assumed that n is odd, so that
the problem of a split vote does not arise. Would have saved some
trouble.
But I don't think that's the right question. Why should you
side with the majority? The reason you _can_ side with the
majority is because of the information that the data tells you
about the world. The previous analysis does not address this
issue directly, but rather forms an argument about how many
ways the majority can be right. Better, I think, to understand
what the data tells you, rather than to worry about what data
you might have seen.
If you want to know how to compute how the probability
of the true state of the world changes with the number of
data points you collect, read on.
I will not be able to tell you how many planes to send out.
But neither did the Google-answer. My analysis will tell you
how "rational belief about the world" changes with the collection
of data. The Google-answer only tells you the probability that
the majority of data points are consistent with the world.
Suppose you send one plane out. It comes back with a datum, d1.
The probability that the datum is correct is given in the problem:
4/5. The datum is an observation about the true state s of the
world, but it does not bring back the actual state of the world.
In other words, the plane is like a noisy sensor.
However, the question is: now that I have the datum, what is the
probability that the state of the world is "s"? In notation, we
want a formula (and eventually a number) for the quantity
represented by:
p(s|d1)
What we are given in the problem is
p(d1|s)
To relate the two, we use Bayes' Theorem. And as pointed out by
a previous comment, we will need a prior on the probability of the
state of the world. And to answer an issue raised in one of the
comments, no we should not assume a uniform prior without
good reason. But one "good reason" would be "I'm willing to
assume this for the sake of argument." With sufficient
data, priors become unimportant anyway, as we will see.
To use the example given more specifically, let
d1 represent the statement "the bridge is observed to be out"
and -d1 (that is "not d1") its negation.
Furthermore, let s represent the statement about the
true state of the world: "the bridge is out", and likewise
-s (not s) represents "the bridge is not out."
Given:
p(d1|s) = 4/5
p(-d1|s) = 1/5
and also
p(d1|-s) = 1/5
p(-d1|-s) = 4/5
This symmetry is a very special case. In general, a sensor may have
an asymmetric response to the world. That's why car alarms can be
heard making a terrible noise for any reason at all, but rarely if ever
fail to notice an honest break-in attempt.
Okay, so we want to know p(s|d1), that is, the probability
that the bridge is out, given the datum.
We use Bayes' theorem as follows:
p(s|d1) = p(d1|s)*p(s)
------------
p(d1)
p(s) is the prior on the true state of the world. It's the sticky
point, mentioned above, but we won't get stuck just yet. Let's
assume only that there is such a probability, call it "t". We can
argue about what its value is, later.
p(d1|s) is just the given information for the accuracy of the report.
Now what about p(d1)? It's just the probability, regardless of
the true state of the world, of the observation coming up as d1. Its
purpose is to rule out all the possible data that are not consistent
with the actual datum.
How do we compute it? Easy. We "reason by cases." Either the bridge
is out, or it isn't. If it is out, then the probability of the report
is p(d1|s)*p(s); if the bridge is not out, then the probability of the
report is p(d1|-s)*p(-s). There is no other case to worry about, and
so there is no "probability" outside of these two cases.
So p(d1) = p(d1|s)*p(s) + p(d1|-s)*p(-s)
(This is actually more concisely explained by appealing to the
axioms of probability, but I didn't want to have to prove that
many theorems this evening).
We end up with:
p(s|d1) = p(d1|s)*p(s)
-----------------------------
p(d1|s)*p(s) + p(d1|-s)*p(-s)
Yes, the first term in the denominator is the same as the
only term in the numerator.
This is as it should be. This expression tells us the fraction of
"belief" that is consistent with the datum. A Venn diagram usually
helps here; I hope you can draw one yourself for this situation.
Substituting numbers:
p(s|d1) = 4/5 * p(s)
------------------------
4/5 * p(s) + 1/5 * p(-s)
Now we see the wisdom of using fractions: the 5s cancel, and we
can use simple algebra to end up with:
p(s|d1) = 4 * t
---------
3 * t + 1
where t = p(s), that prior we left as an open question.
Notice that if t is close to 1, then p(s|d1) is also close to 1;
likewise, if t is close to zero, p(s|d1) is also close to zero.
But if t=1/2, then p(s|d1) = 4/5.
This makes sense. If you really believed that the bridge was out
prior to getting the datum, the datum wouldn't change your beliefs
very much. However, if you thought the bridge was as likely out
as not (t=1/2), then the report will change your belief a lot.
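The one-report case is easy to try numerically. Below is a minimal Python sketch of the formula, assuming the symmetric 4/5 accuracy given in the problem; the function name posterior_one_report is invented for this illustration.

```python
def posterior_one_report(t, p=0.8):
    """p(s|d1): probability the bridge is out after one "out" report.

    t is the prior p(s); p is the accuracy of a report, assumed to be
    the same whether the bridge is out or not.
    Numerator:   p(d1|s) * p(s)
    Denominator: p(d1|s) * p(s) + p(d1|-s) * p(-s)
    """
    return p * t / (p * t + (1 - p) * (1 - t))

# The posterior for a few priors, next to the simplified form 4t / (3t + 1).
for t in (0.01, 0.1, 0.5, 0.9, 0.99):
    print(t, posterior_one_report(t), 4 * t / (3 * t + 1))
```

The two printed columns should agree up to floating-point rounding, since 4t/(3t+1) is the same expression with the 5s cancelled.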
So much for one report. What about the case where you have n
reports, n>1.
Well, if we can assume that the reports are independent of each other
given the true state of the bridge, then we can compute the probability
of the data d1, ..., dn (n reports) given s by multiplying the
individual report probabilities together.
But here we must remember that some of the "di" are going to
represent "bridge out" and the rest are "bridge not out," according
to the data brought back by the planes.
Suppose that k of the n reports were "out", and so (n-k) of the reports
are "not out." The probability of the data is just:
p(d1,...,dn|s) = p(d1|s)*p(d2|s)*...*p(dn|s)
= (4/5)^k * (1/5)^(n-k)
This result uses our independence assumption, as well as the
counting scheme for the reports (and "^" means exponentiation).
Now, as before, we want to know the probability of the true state
of the world, given the data:
p(s|d1,...,dn) = p(d1,...,dn|s) * p(s)
---------------------
p(d1,...,dn)
by Bayes' theorem.
We already know how to compute p(d1,...,dn|s). And p(d1,...,dn) is
just a bit more complicated:
p(d1,...,dn) = p(d1,...,dn|s) * p(s) + p(d1,...,dn|-s) * p(-s)
The last piece of the derivation is the second term in the above:
p(d1,...,dn|-s) = p(d1|-s) * p(d2|-s) * ... * p(dn|-s)
as before. This time however (because of the weird symmetry
of the given information), we have the following result:
p(d1,...,dn|-s) = p(d1|-s) * p(d2|-s) * ... * p(dn|-s)
= (1/5)^k * (4/5)^(n-k)
Remember that this quantity is hypothetical: it is the
probability that the reports would come back as they did,
given that the bridge was not out. Again, this kind
of reasoning is natural: we don't know about the bridge.
We do know about the data. Why should we hypothesize about
what data we might have seen? On the other hand, not knowing
about the bridge, it is natural to think about all the
possibilities.
So finally:
p(s|d1,...,dn) = (4/5)^k * (1/5)^(n-k) * t
---------------------------------------------------------
(4/5)^k * (1/5)^(n-k) * t + (1/5)^k * (4/5)^(n-k) * (1-t)
Again, the 5s cancel:
p(s|d1,...,dn) = 4^k * t
------------------------
4^k * t + 4^(n-k) * (1-t)
This is less intuitive than the single datum case, but if you
squint a bit, you will see that as k increases towards its limit n,
the evidence mounts towards the conclusion that the bridge is out.
And vice versa. Furthermore, it doesn't take too many data points
to "wash out" the prior t: for any value of t, some number k of
reports will force the answer away from the prior, towards
a result that is based on observation, not prior biases.
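For anyone who wants to plug in numbers, the final formula can be sketched in Python with exact fractions. This is an illustration under the same assumptions as the derivation (independent reports given the true state, 4/5 accuracy either way, prior t = p(s)); the function name posterior_out is made up here.

```python
from fractions import Fraction

def posterior_out(k, n, t):
    """p(s|d1,...,dn) when k of the n reports say "out".

    Implements 4^k * t / (4^k * t + 4^(n-k) * (1 - t)),
    where t is the prior probability that the bridge is out.
    """
    t = Fraction(t)
    numerator = 4 ** k * t
    return numerator / (numerator + 4 ** (n - k) * (1 - t))

half = Fraction(1, 2)
print(posterior_out(1, 1, half))                       # a single "out" report
print(posterior_out(8, 10, half))                      # 8 "out", 2 "not out"
print(float(posterior_out(8, 10, Fraction(1, 1000))))  # same data, sceptical prior
print(posterior_out(5, 10, Fraction(1, 3)))            # split vote
```

With t = 1/2, the questioner's example of 8 "out" reports against 2 "not out" reports gives 4096/4097, about 0.99976. A split vote returns the prior unchanged, because only the difference between the two counts enters the ratio.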
Have I answered your question? Not really. I have not
told you how many planes to send out. I do not think
that "accuracy" is at issue here, either.
But now you have seen a correct analysis of the probability
of the state of the world, given your observations of it.
You can use this probability to make rational decisions
concerning the bridge. |