Q: probability question ( Answered 4 out of 5 stars,   9 Comments )
Question  
Subject: probability question
Category: Science
Asked by: chris2002micrometer-ga
List Price: $15.00
Posted: 15 May 2003 17:13 PDT
Expires: 14 Jun 2003 17:13 PDT
Question ID: 204347
I teach and recently graded a test with 30 multiple-choice questions (a
through d) on which two students not only had the same score, but also
the same answers. Both missed seven, with the same incorrect answers.
My gut feeling was odds of one in 16,384 (4 to the 7th) that this was
coincidental. How can I figure this out? What if the test had 100
questions? What if 5 incorrect answers matched but 2 were different?
Can I prove anything? I let it slide this time, but would like to know
if I should have said something.
Answer  
Subject: Re: probability question
Answered By: chis-ga on 15 May 2003 17:32 PDT
Rated:4 out of 5 stars
 
Hello chris2002micrometer,

It is impossible to find an exact probability of this occurring
because we don't know the likelihood of the students getting an answer
correct.  Here is how to find the probability of this occurring
naturally (in this approximation, I assume that the probability of a
correct choice is 80%, which you can modify):

[(30c7)(.8)^23(.2)^7]^2 = .02366 = 2.37%
Note that 30c7 = 30 choose 7 = 30!/((23!)*7!)

This is definitely very suspect as the probability is quite low.  Is
it possible that these seven questions were extremely hard?  Were
these kids sitting very close to each other?  Are they friends who
would likely cheat?

If the test had 100 questions, the probability would be:
[(100c7)(.8)^93(.2)^7]^2 = 3.961*10^(-8) = .000004%

To figure these probabilities, I use the following:
[(Questions c questions the same)(prob. of getting it
right)^(questions right)(probability of getting it wrong)^(questions
wrong)]^2 (squared because it occurs twice)
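A minimal Python sketch of the formula above (the function name `first_answer_formula` is hypothetical). It assumes, as the answer does, the same 80% chance of a right answer on every question. The bracketed term is the binomial probability that one student misses exactly 7 of 30 questions, whichever 7 they are; squaring it treats the two students as independent, so this value does not require the two sets of misses to coincide.

```python
from math import comb

def first_answer_formula(n_questions, n_missed, p_right=0.8):
    # Binomial probability that ONE student misses exactly n_missed questions
    # (any of them), assuming the same p_right on every question.
    one_student = (comb(n_questions, n_missed)
                   * p_right ** (n_questions - n_missed)
                   * (1 - p_right) ** n_missed)
    # Squared: two students, treated as independent.
    return one_student ** 2

print(f"{first_answer_formula(30, 7):.4f}")  # 0.0237
```

Changing `p_right` shows how sensitive the result is to the assumed 80%.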

For five incorrect answers, but two being different:
(30c7)(.8)^23(.2)^7 * (23c2)(.8)^21(.2)^2 * (7c5)(.8)^2(.2)^5 =
6.17496*10^(-5) = .0162%
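The three factors of this product can be checked the same way. This sketch (hypothetical function name, same 80% assumption) labels each factor with what the answer says it counts.

```python
from math import comb

def five_of_seven_formula(p_right=0.8):
    # First student misses some 7 of the 30 questions.
    first = comb(30, 7) * p_right ** 23 * (1 - p_right) ** 7
    # Second student misses 2 of the first student's 23 correct questions.
    second_extra = comb(23, 2) * p_right ** 21 * (1 - p_right) ** 2
    # Of the first student's 7 misses, the second misses 5 and gets 2 right.
    shared = comb(7, 5) * p_right ** 2 * (1 - p_right) ** 5
    return first * second_extra * shared

print(f"{five_of_seven_formula():.4e}")  # 6.1750e-05
```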

As a result, no, you cannot truly prove anything because it IS
possible, but it is also extremely unlikely.  This first time, it
was right to let it go, but if it occurs again, the probability would
be so low that it would be nearly impossible for it to be a random
event.

Let me know if I can help any more.

Thanks,
chis-ga

Request for Answer Clarification by chris2002micrometer-ga on 15 May 2003 18:11 PDT
"This is definitely very suspect as the probability is quite low.  Is
it possible that these seven questions were extremely hard?  Were
these kids sitting very close to each other?  Are they friends who
would likely cheat?"
Both were very intent on achieving the best grade possible and they
did sit together but I saw nothing obvious.

"For five incorrect answers, but two being different: 
(30c7)(.8)^23(.2)^7 * (23c2)(.8)^21(.2)^2 * (7c5)(.8)^2(.2)^5 =
6.17496*10^(-5) = .0162%"

Wouldn't this be more probable than 7 erroneous matches?:

[(30c7)(.8)^23(.2)^7]^2 = .02366 = 2.37% 
Note that 30c7 = 30 choose 7 = 30!/((23!)*7!) 

I don't believe that the questions missed were harder but I did assume
they hadn't a clue to the correct answer (.25 rather than the .8
likelihood of choosing correctly). What do you think?

Clarification of Answer by chis-ga on 15 May 2003 18:28 PDT
Aha.  Very sorry about that...I've found my mistake.  Let me repost
the whole answer with the corrections:

Hello chris2002micrometer, 
 
It is impossible to find an exact probability of this occurring
because we don't know the likelihood of the students getting an answer
correct.  Here is how to find the probability of this occurring
naturally (in this approximation, I assume that the probability of a
correct choice is 80%, which you can modify):
 
30c7[(.8)^23(.2)^7]^2 = 1.162*10^-8 
Note that 30c7 = 30 choose 7 = 30!/((23!)*7!) 
 
This is definitely very, very suspect, as the probability is EXTREMELY
low.  (Put another way, this would happen randomly about once in
1/(the probability) = roughly 86 million tries.)  However, it is
possible that these seven questions
were extremely hard, and this computation would not take that into
account. Since they were sitting together, it is almost certain that
there was some collusion here (though you'd think they could at least
get an 80% working TOGETHER!).
 
If the test had 100 questions, the probability would be: 
100c7[(.8)^93(.2)^7]^2 = 2.474*10^-18
 
To figure these probabilities, I use the following: 
(questions) c (questions the same) * [(prob. of getting it
right)^(questions right) * (prob. of getting it wrong)^(questions
wrong)]^2 (squared because it happens twice; the (questions) c
(questions the same) factor stays outside the square because we choose
the 23 that they got right only once, not twice, since both students
got the same ones right).
 
For five incorrect answers, but two being different: 
(30c7)(.8)^23(.2)^7 * (23c2)(.8)^21(.2)^2 * (7c5)(.8)^2(.2)^5 =
6.17496*10^-5
 
As a result, no, you cannot truly prove anything because it IS
possible, but it is also extremely unlikely.  This first time, it
may be right to let it go (and even better not to mention anything, so
you can catch them next time).  You can be nearly certain that they
cheated, especially because they were sitting near each other, and
if this happens again, it would be essentially certain.

As for always using 80% as the chance of a right answer, consider this
to be the average fraction of the material they would know.  (A better
number to use may be 23/30 ~= 77%, which would slightly raise the
probabilities, since p^23(1-p)^7 is largest at p = 23/30.)  You could
work out the probability more specifically by estimating how well they
would know each problem, but this would be quite tedious.  The
information I've given you already shows that the probability is
tremendously low, and that their cheating is nearly certain.
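To see how the choice of p feeds into these numbers, this sketch evaluates the per-student term p^23(1-p)^7 at a few values and searches a coarse grid for its maximum, which sits at the observed score 23/30. The grid step of 0.001 and the function name are arbitrary choices for illustration.

```python
def per_student_term(p_right, n_right=23, n_wrong=7):
    # p^(questions right) * (1-p)^(questions wrong), before squaring
    return p_right ** n_right * (1 - p_right) ** n_wrong

for p in (0.75, 23 / 30, 0.8):
    print(f"p = {p:.4f}: {per_student_term(p):.3e}")

# Coarse grid search over p = 0.001, 0.002, ..., 0.999
best = max((i / 1000 for i in range(1, 1000)), key=per_student_term)
print(best)  # 0.767, the grid point closest to 23/30
```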
 
Let me know if I can help any more. 
 
Thanks, 
chis-ga

Request for Answer Clarification by chris2002micrometer-ga on 16 May 2003 09:26 PDT
I am setting this up in Excel to play with. I still think an important
component is missing. The number of correct answers shouldn't matter.
The number of matched-and-missed answers should be a factor.

Clarification of Answer by chis-ga on 16 May 2003 10:58 PDT
Here is a better explanation:

For missing the exact same questions: 
30c7[(.8)^23(.2)^7]^2 = 1.162*10^-8  

30c7 selects the 7 that the students missed
.8^23 is the probability of getting 23 correct
.2^7 is the probability of getting 7 incorrect
squared to account for both students doing the same thing
This accounts for any 7 of the same that both students missed (which,
in turn, accounts for the other 23 that they got correct).

For missing some of the exact same and some different:
(30c7)(.8)^23(.2)^7 * (23c2)(.8)^21(.2)^2 * (7c5)(.8)^2(.2)^5 = 
6.17496*10^-5 

30c7 selects the 7 that the first student missed
23c2 selects the 2 of the 23 that the second student missed (23
because they cannot be from the same group as the 7 that the first
missed)
7c5 selects the 5 of the first student's 7 misses that the second student also missed
As above, the probabilities work the same way.

I believe that this second equation should clear up your confusion
over what goes into the equation.  This includes both the
matched-and-missed and the number correct, both important factors.

Also, my math instructor told me that 10^-150 is the universal
probability that represents something being impossible.  This is not
quite there, so it is POSSIBLE, but still EXTREMELY unlikely.

Request for Answer Clarification by chris2002micrometer-ga on 16 May 2003 17:05 PDT
I think what makes this problem non-trivial is the variety of possible
assumptions. Don't worry chis-ga, you have earned the 15 bucks. I
appreciate the other points of view as well. Based on how I teach and
put together tests, I would assume the following: given that the 2
students knew the answers to 23 (because they were present and
attentive when that info was divulged), I would omit the # correct
factor. The ones missed were missed because they hadn't a clue, due to
tardiness or whatever (sorta like rolling a 4-sided die). My initial
"gut feel" model was (1/4)^7. If I administered the same test to a
larger class, the likelihood of this happening between any pair of
students would, of course, go up. With the above assumptions about the
"evenness" of the questions, how likely would it be to choose the same
7 incorrect choices? It should also be unlikely to choose 7 incorrect
answers, none of which match. If they each miss 7 but 5 match
(question # and answer), the 7 choose 5 factor would raise the p of
this occurring. The model (7c5)(.25)^5 fails when 5 is replaced by 2
or less: P > 1. What would be the most basic thing needed to make p
peak around 3 or more matched misses?

Clarification of Answer by chis-ga on 17 May 2003 07:28 PDT
I'm not quite sure what you're asking, but I think you may be
inquiring about the (7c5)(.25)^5.  This alone is not a probability,
but rather part of one that needs the other parts to stay in line with
it.  Please clarify further because I'm not sure that I've answered
that satisfactorily.
chris2002micrometer-ga rated this answer:4 out of 5 stars
I got a lot of good insight into this from chis, et al. Worth every penny.

Comments  
Subject: Re: probability question
From: racecar-ga on 16 May 2003 12:20 PDT
 
I would tend to go about the calculation in a different way.
First, make a number of assumptions:

IF THERE WERE NO CHEATING:

1) The probability of getting each question correct is the same for
each question and each student and is 23/30.  This may be unfair to
the students, since, (a) as pointed out in the answer, some questions
may be more difficult than others, (b) the students may have studied
together, and so there may be a legitimate reason for correlation
between the answers they know.

2) The probability of choosing each of the three incorrect answers to
each question is the same, and is 7/90.  Thus if the correct answer is
B, the probabilities would be A: 7/90, B: 23/30, C: 7/90, and D: 7/90.
 Again, this may be unfair to the students, since some incorrect
answers may be more attractive than others.

3) You, the teacher, looked through all the exams and chose the two
most similar.

Having made these assumptions, the correct question to ask is:

GIVEN that one student chose the answers he did, what is the
probability IF NO CHEATING OCCURRED that the other student would choose
the same answers?

The answer is:

The probability that the same 23 correct answers are chosen by the
second student is (23/30)^23.

The probability that the same 7 incorrect answers are chosen by the
second student is (7/90)^7.

So the overall probability is the product of these, and is 3.8 E -11 (call
this P), or 1 chance in 26 billion.  However, if there were N students
in the class, there were [N choose 2] (call this M) possible pairs,
each of which has a 1 in 26 billion chance of turning in identical
tests.  So the final answer is approximately PROBABILITY = M*P.  The
exact formula would be
   PROBABILITY = 1 - (1-P)^M,
since (1-P)^M is the probability that all M pairs turn in different
tests.  But for any realistic class size, this is very nearly the same
as M*P.

So, let's say your class size is 30.  M is then (30 choose 2), or 435,
and

PROBABILITY = M*P = 1.66 E -8 = 1 chance in 60 million.
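The class-size adjustment can be sketched in a few lines under the comment's assumptions (23/30 chance of a right answer, 7/90 for each specific wrong choice) and its example class of 30. Using `log1p`/`expm1` keeps the exact form 1 - (1-P)^M from losing precision when P is this small; like the comment, the sketch treats the pairs as independent, which is only an approximation.

```python
import math

p_pair = (23 / 30) ** 23 * (7 / 90) ** 7   # one given pair matches exactly
n_students = 30
m_pairs = math.comb(n_students, 2)          # 435 possible pairs

approx = m_pairs * p_pair
# 1 - (1 - P)^M computed as -expm1(M * log1p(-P)) to avoid cancellation
exact = -math.expm1(m_pairs * math.log1p(-p_pair))

print(f"P = {p_pair:.2e}, M = {m_pairs}")
print(f"M*P = {approx:.3g}, 1-(1-P)^M = {exact:.3g}")  # both about 1.66e-08
```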

As pointed out earlier, in reality the probability may be somewhat
higher, since the students may have had legitimate reasons for missing
some of the same questions (were the same incorrect answers chosen by
many of the other students?).  However, regardless of these
considerations, the probability is vanishingly small, and as they sat
next to each other, the opportunity to cheat was there.  The verdict
is that cheating occurred. You may be as sure of it as you are of
anything else in your life, because from where I'm standing, the
probability that you're insane and dreamed the whole episode is more
than 1 in 60 million.  (Just a joke--but you get my point :) ).
Subject: Re: probability question
From: fstokens-ga on 16 May 2003 12:20 PDT
 
Did the other students who missed those 7 questions answer them in the
same incorrect way?

For probability analysis, you generally assume that each wrong answer
is equally likely, but on multiple choice tests there are often some
wrong answers that are closer to the right answer than others.  If
most of the people in the class who got these answers wrong answered
them the same way, then there is no evidence that these two students
cheated (though there may be evidence for cheating on a wider scale!).
 On the other hand, if these 2 students answered those questions wrong
in a different way than most other students who got them wrong, then
you have some strong evidence for collusion.
Subject: Re: probability question
From: racecar-ga on 16 May 2003 13:58 PDT
 
Just wanted to add what I think about the other cases you mentioned.

If the test had 100 questions, and the same 7 incorrect answers were
chosen by the students, P = (93/100)^93 * (7/300)^7 = 4.4 E -15 = 1 in
230 trillion.  M is the same as before, and for 30 students,
PROBABILITY = 1.9 E -12 = 1 in 520 billion.  This is far smaller than
in the 30 question case.

However, if you had indeed given a similar test, but with 100
questions, it is likely that the students in question would have
gotten about 77 right and 23 wrong.  If this had occurred, and the same
23 incorrect responses had been chosen, P = (77/100)^77 * (23/300)^23
= 4.03 E -35 = 1 in 2.5 E 34.  So (for 30 students) PROBABILITY = 1.8
E -32, or 1 in 5.7 E 31.  This is about 1000 times less likely than
buying 4 super lotto tickets, and winning 4 jackpots in a row.
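The same model generalizes to any test length. This sketch (hypothetical function name) assumes, as in the comment, that the chance of a right answer equals the observed score and that a wrong answer is equally likely to be any of the three incorrect choices; 435 is the pair count for the example class of 30.

```python
def p_identical_sheet(n_questions, n_right):
    # Chance that an independent second student reproduces a given sheet:
    # every right answer right, and the same wrong choice on every miss.
    p_right = n_right / n_questions
    p_each_wrong = (1 - p_right) / 3   # three equally likely wrong choices
    return p_right ** n_right * p_each_wrong ** (n_questions - n_right)

for n, r in ((30, 23), (100, 77)):
    p = p_identical_sheet(n, r)
    print(f"{n} questions, {r} right: P = {p:.2e}, 435 pairs: {435 * p:.2e}")
```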

Now for the case with 30 questions, the same 7 of which are answered
incorrectly, with 5 of those 7 answered the same.

P = (23/30)^23 * (7/90)^5 * (21/90)^2 * (7 choose 5)

The first factor is for the correct answers, the second for the
incorrect ones which are answered the same, and the third for the
incorrect ones which are answered differently.  This third factor
would be (14/90)^2 if you wanted the probability that EXACTLY 5 of
the incorrect answers matched, but I think it's more appropriate to
find the probability that AT LEAST 5 matched.  The fourth factor is to
account for the number of possible ways the second student could match
5 out of 7 of the other student's incorrect responses.

Thus P = 7.2 E -9 = 1 in 140 million, and
PROBABILITY (for 30 students) = 3.1 E -6 = 1 in 320,000.  
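A sketch of this calculation under the same assumptions, with each factor labeled as in the explanation above (using the "at least 5 match" reading, i.e. 21/90 for each of the two non-matching misses).

```python
from math import comb

p_right = 23 / 30        # chance of a right answer
p_each_wrong = 7 / 90    # chance of one specific wrong choice
p_any_wrong = 21 / 90    # chance of any wrong choice (= 7/30)

p_pair = (p_right ** 23        # same 23 right answers
          * p_each_wrong ** 5  # same wrong choice on 5 of the 7 misses
          * p_any_wrong ** 2   # some wrong choice on the other 2
          * comb(7, 5))        # which 5 of the 7 misses match
print(f"P = {p_pair:.2g}, for 435 pairs: {435 * p_pair:.2g}")  # 7.2e-09, 3.1e-06
```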

In this last case, it might almost be reasonable to chalk the
similarity up to freak coincidence and the fact that the students
studied together.  I'd still fail 'em though.  :)
Subject: Re: probability question
From: chris2002micrometer-ga on 16 May 2003 17:10 PDT
 
Racecar - Where did the 90 come from?
"2) The probability of choosing each of the three incorrect answers to
each question is the same, and is 7/90."
Subject: Re: probability question
From: racecar-ga on 16 May 2003 21:56 PDT
 
The probability 7/90 comes from the assumption that each of the three
incorrect answers has an equal probability of being chosen.  Given
that 23 of the 30 questions were answered correctly, I guessed that
the probability of a correct answer is 23/30.  The probability of an
incorrect answer is then 7/30.  There are three possible incorrect
answers to each question, so I assigned a probability of 7/90 to each
of them.
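The arithmetic behind 7/90, written with exact fractions. This is only a sketch that restates the assumption that the three wrong choices are equally likely.

```python
from fractions import Fraction

p_right = Fraction(23, 30)      # observed score used as P(correct)
p_wrong = 1 - p_right           # 7/30
p_each_wrong = p_wrong / 3      # split evenly over three wrong choices
print(p_wrong, p_each_wrong)    # 7/30 7/90
assert p_right + 3 * p_each_wrong == 1
```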
Subject: Re: probability question
From: racecar-ga on 16 May 2003 22:28 PDT
 
In regard to your clarification request:

It seems you want to know the answer to the following question: Given
that the students missed the same seven questions, and assuming the
choice on those seven was random, what is the probability that a given
number of answers will match?

This question can be answered fairly easily, but I don't think it
relates very well to the situation you described.  It is unlikely that
the students would know the answers to the same 23 questions.  I have
to go now, but I'll answer the above question later.
Subject: Re: probability question
From: racecar-ga on 16 May 2003 22:50 PDT
 
Ok, I'm back.

The probability that exactly N of the answers match is 

(7 choose N) * (1/4)^N * (3/4)^(7-N)

Here's a table of approximate values:

N         Prob.
0         .1335
1         .3115
2         .3115
3         .1730
4         .0577
5         .0115
6         .0013
7         .00006
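The table can be regenerated from the binomial formula above. This sketch assumes, as the formula does, that each of the 7 guessed answers matches independently with probability 1/4 (`p_exact_matches` is a hypothetical name).

```python
from math import comb

def p_exact_matches(n, k=7, p_match=0.25):
    # (k choose n) * p^n * (1-p)^(k-n): exactly n of k guesses agree
    return comb(k, n) * p_match ** n * (1 - p_match) ** (k - n)

print("N   Prob.")
for n in range(8):
    print(f"{n}   {p_exact_matches(n):.5f}")
```

Raising `p_match` shifts the peak toward more matches; for example, with `p_match=0.4` the most likely N is 3.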

Once again, I do not think this is a reasonable answer to the question
'given no cheating, what's the probability of 2 students choosing the
same 7 wrong answers out of 30 questions?'  As you say, some of the
answers will be known, and some guessed, but why should both students
know the answers to the same questions?  If the missing knowledge were
due to 'tardiness or whatever' why should both students have been
'present and attentive' at all the same times?  It would be suspicious
that both students answered the same 23 questions correctly even if
they left the other 7 blank.

By the way, how many students are in the class?
Subject: Re: probability question
From: chris2002micrometer-ga on 17 May 2003 04:53 PDT
 
Racecar - 
* (3/4)^(7-N) - Yep, that's what I left out. There were only 4
students and the two in question missed the same lecture. The other
two students answered the 7 questions differently and more were
correct. I guess with all the assumptions made here and inside
knowledge, it is a dice-roll problem. Thanks for your input.
Subject: Re: probability question
From: khane96-ga on 23 May 2003 14:51 PDT
 
I do not think you have a probability problem here; even though
probabilities can help you solve it, logic is the way to go.

1) Let us assume they both cheated:
Well, the least you could say is that they are not clever. If they did
work together by exchanging information, a smart way to do it would
be to pick a different answer for every question they were not sure
of. Furthermore, it is hard to believe that they agreed with one
another on every question they got right. They got 4/5 of the answers
right, meaning that, combining their information, they have way above
average knowledge for the test. It is rare to fail to give the right
answer while being absolutely sure you are right, especially 7 times
in a row. So if they were working together, we can assume that they
would have hesitated on at least one of the questions, and then the
smart thing to do is for one of the two to pick an answer randomly and
for the other to choose a different answer. So, working together and
being smart about cheating, all their wrong answers would have been
different.
So if they worked together:
-They were smart enough to get 4/5 of the test right.
-They were smart enough not to get caught while cheating.
-They were too dumb about cheating to choose different answers on
questions they weren't sure about.
That seems really unlikely.

2) Let us assume they did not cheat.

Questions answered right don't bring any information, no matter how
suspicious it looks. The reason is that knowing the answer gives you a
100% probability of answering right, no matter whether you know the
answer yourself, blindly copy it from someone who knows, take it from
the book, or otherwise.
So let us focus on answers that were wrong. Since they did not cheat,
each of them has a fair knowledge of the subjects the test is about.
This will most probably allow them to exclude 1 answer from the list
(unless the test requires no logic at all and is pure memorization).
So basically, let us say that on questions they do not know for sure
they have a 1/3 chance of getting it right, i.e. a 2/3 chance of going
wrong (quite an optimistic choice).
So, knowing that, statistically how many unsure questions do you need
to answer to get 7 wrong?
7 questions 7 wrongs : p=(2/3)^7 (=0.06)
8 questions 7 wrongs : p=((2/3)^7x1/3)x8 (=0.16)
9 questions 7 wrongs : p=((2/3)^7x(1/3)^2)x9x8/2 (=0.23)
10 questions 7 wrongs : p=((2/3)^7x(1/3)^3)x10x9x8/6 (=0.26)
11 questions 7 wrongs : p=0.24
12 questions 7 wrongs : p=0.19
13 questions 7 wrongs : p=0.14
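The list above follows a binomial model. This sketch recomputes it, assuming (as the formulas do) a 2/3 chance of a wrong answer on each unsure question, i.e. a blind guess among the three answers left after ruling one out; `p_seven_wrong` is a hypothetical name.

```python
from math import comb

def p_seven_wrong(n_unsure, p_wrong=2 / 3, n_wrong=7):
    # Binomial: exactly n_wrong wrong among n_unsure guessed questions
    return (comb(n_unsure, n_wrong) * p_wrong ** n_wrong
            * (1 - p_wrong) ** (n_unsure - n_wrong))

for n in range(7, 14):
    print(n, round(p_seven_wrong(n), 2))
```

The values peak at 10 unsure questions, which is why 9 to 11 is taken as the plausible range.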

So let us assume that someone who got 7 answers wrong was actually
hesitating on 9, 10 or 11 answers (a very optimistic choice).

So what is the probability that two students hesitating on 9, 10 or
11 answers give the same answers (still keeping in mind that one
answer is obviously wrong each time)?

for 9 answers there are 3^9 possibilities (19683)
for 10 answers : 59049
for 11 answers : 177147

So even while being really optimistic, there is still only about one
chance in 60,000 that they did not cheat. Very unlikely.

3) One of them cheated, and the other either didn't realize it and/or
did not interfere with his neighbor in any way.
Well, that is really, really probable, and it would explain everything:
the cheater, not having the smallest clue about what was wrong or what
was right, simply copied whatever he saw without question, hence the
similarities going as far as the mistakes.
