Hi probing!
Here are the answers to your questions.
Question 1
In order to answer this and some of the following questions, we'll
need to make use of the Central Limit Theorem:
"The central limit theorem states that given a distribution with a
mean m and variance s^2, the sampling distribution of the mean
approaches a normal distribution with a mean (m) and a variance s^2/N
as N, the sample size, increases."
Central Limit Theorem
http://davidmlane.com/hyperstat/A14043.html
Sample sizes larger than 30 can usually be considered large enough for
the sample mean to have an almost normal distribution. So the sample
size of 141 in this question is more than enough to use this theorem.
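To make the theorem concrete, here is a small simulation sketch. It is not part of the original question: the exponential population (mean 1, variance 1), the number of replications and the seed are all assumptions chosen only for illustration. It draws many samples of size 141 and compares the spread of their means with the s^2/N prediction.

```python
import random
import statistics

def simulate_sample_means(n, replications, seed=0):
    """Return the means of `replications` samples of size n drawn from a
    skewed population (exponential with mean 1 and variance 1).
    The population and seed are illustrative assumptions."""
    rng = random.Random(seed)
    means = []
    for _ in range(replications):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

means = simulate_sample_means(n=141, replications=2000)
print("mean of sample means:    ", round(statistics.fmean(means), 4))
print("variance of sample means:", round(statistics.variance(means), 5))
print("CLT prediction s^2/N:    ", round(1 / 141, 5))
```

Even though the population is far from normal, the sample means should cluster around 1 with a variance close to 1/141.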
Let's call the sample mean X, the population mean m, and the sample
variance s^2 (so the sample standard deviation is s). Let's also call n
the sample size. In order to build a 90% confidence interval around
the sample mean, we need to find a number 'a' such that:
Prob(m < X-a) = 0.05
and
Prob(m > X+a) = 0.05
The intuition behind this is that we want to find 'a' such that the
probability that m is outside the interval [X-a,X+a] is 0.10. Now,
since X follows a normal distribution (approximately), which is
symmetric around its mean, it turns out that you can use either
equation to calculate 'a' and you will get the same result. Let's use
the first one and rearrange it a little bit:
Prob(m < X-a)
=Prob(X-m>a)
=Prob( (X-m)/(s/sqrt(n)) > a/(s/sqrt(n)) )
where sqrt means "square root". Why write it like this? Because now
the left-hand side of the inequality is known to follow a
t-distribution, for which we have probability tables. Notice that it's
just X minus its mean (m) divided by its estimated standard deviation
(recall from the Central Limit Theorem that the sample mean has
variance s^2/n).
Student's t distribution
http://mathworld.wolfram.com/Studentst-Distribution.html
In particular, it follows a t distribution with n-1 (140) degrees of
freedom. Using that sqrt(n)=sqrt(141)=11.87, we have to solve the equation:
Prob( t(140) > a/(s/sqrt(n)) )
=Prob( t(140) > a/(68.2/11.87) ) = 0.05
Looking up in a t distribution table
T-Distribution table
http://www.stat.ucla.edu/~dinov/courses_students.dir/Applets.dir/T-table.html
we find that
Prob( t(140) > 1.645 ) = 0.05
[Please request clarification if you don't understand how to use this table]
So now all we have to do is solve:
a/(68.2/11.87) = 1.645
which yields
a = 9.45
Therefore, a 90% confidence interval for the population mean is the interval:
[106.9-9.45 , 106.9+9.45]
=[97.45 , 116.35]
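The steps above can be sketched in a few lines of Python. This is only an illustration under one assumption: instead of reading the critical value from the table, it takes it from the standard normal distribution (Python's statistics.NormalDist), which is about 1.645 for a 5% tail. The function name and its parameters are made up for this sketch.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(sample_mean, sample_sd, n, confidence=0.90):
    """Large-sample confidence interval for a population mean.
    Assumption: the normal quantile stands in for the t-table value."""
    tail = (1 - confidence) / 2                 # 0.05 in each tail for 90%
    crit = NormalDist().inv_cdf(1 - tail)       # about 1.645
    margin = crit * sample_sd / sqrt(n)         # this is 'a' in the text
    return sample_mean - margin, sample_mean + margin, margin

# Figures from Question 1: mean 106.9, std. dev. 68.2, sample size 141
low, high, a = confidence_interval(106.9, 68.2, 141)
print(f"a = {a:.2f}, interval = [{low:.2f}, {high:.2f}]")
```

Changing the confidence argument (for example to 0.99) shows how the interval widens as the required confidence grows.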
Question 2
Let's call 'm' the population mean of annual consumption of beer
per person in Washington. Here, the null and alternative hypotheses
can be written in the following way:
Ho : m=22
Ha : m>22
Call X the sample mean, n the sample size, and s the sample standard
deviation. In a similar fashion to the previous question, we need to
find a value 'a' such that:
Prob( X > 22+a ) = 0.01
Prob( X-22 > a ) = 0.01
Once we have obtained 'a' (we'll see how to do that next), we'll
compare it to the actual difference between Washington's sample mean
and the US mean (which is 27.52-22=5.52). If the actual difference is
greater than 'a', we'll reject the hypothesis that Washington's mean
is 22 in favor of the alternative hypothesis that it's larger than 22.
The intuition is that if 5.52 is larger than 'a', then 5.52 is "too
large" a difference from the US mean to assume that Washington has the
same population mean as the US.
Now, in order to find 'a', we follow the same steps we used in the
previous question. We rewrite:
Prob( X-22 > a ) = 0.01
Prob( (X-22)/(s/sqrt(n)) > a/(s/sqrt(n)) ) = 0.01
Again, the left hand side of this equation has a t distribution, for
the same reasons discussed in the previous question. In this case, it
has a t distribution with 299 degrees of freedom (the sample size here
is 300). So:
Prob( t(299) > a/(19.426/sqrt(300)) ) = 0.01
Again, we use the t table exactly as before, obtaining that:
a/(19.426/sqrt(300)) = 2.326
a = 2.608
Finally, using the reasoning explained above, since 5.52 is greater
than 2.608, we have evidence that the mean annual consumption of beer
per person in Washington is greater than the 22-gallon national
average.
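As a rough check of this one-sided test, the threshold 'a' can be computed directly. The sketch below assumes the normal quantile for a 1% tail (about 2.326, the value used above) in place of a table lookup; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_threshold(sample_sd, n, alpha):
    """Smallest excess over the hypothesized mean that leads to rejecting
    Ho at level alpha. Assumption: normal quantile instead of a t table."""
    crit = NormalDist().inv_cdf(1 - alpha)      # about 2.326 for alpha = 0.01
    return crit * sample_sd / sqrt(n)

# Figures from Question 2: std. dev. 19.426, sample size 300, alpha 0.01
a = one_sided_threshold(19.426, 300, 0.01)
observed = 27.52 - 22                           # sample mean minus Ho mean
reject = observed > a
print(f"a = {a:.3f}, observed difference = {observed:.2f}, reject Ho: {reject}")
```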
Questions 3 and 4
These questions can both be answered using the unpaired t-test for
mean equality. I will explain here the method only for question 3, but
it will be very easily applicable to question 4. Please do request
clarification if you have trouble using the following information for
question 4.
Let's call group A the group of accountants with a CPA and group B
the group of accountants without a CPA. Calling mA and mB the
population mean salaries of groups A and B respectively, we're
interested in testing the following hypothesis:
Ho : mA = mB
Ha : mA > mB
Thus we'll test the hypothesis that both means are equal versus the
hypothesis that the mean of group A is greater than the mean of group
B.
We solve this just the same as before. Given the 0.05 level of
significance, we want to find a value 'a' such that
Prob ( Xa - Xb > a ) = 0.05
So, if the observed value of (Xa - Xb) (which is 61936-49827=12109)
turns out to be greater than 'a', we'll reject the null hypothesis
(means are equal) in favor of the alternative one (mean of CPA
certified accountants is greater).
Dividing in the above equation by sqrt(SDa^2 + SDb^2), where SDa is
the sample std. dev of group A and SDb is the sample std. dev. of
group B, we get:
Prob( (Xa - Xb)/sqrt(SDa^2 + SDb^2) > a/sqrt(SDa^2 + SDb^2) ) = 0.05
Now, if both means were equal, the left hand side of the equation
would be a random variable with a Student's t distribution with
(Na+Nb-2) degrees of freedom, where Na is the sample size of
group A and Nb is the sample size of group B. Since Na+Nb-2=25, then
it's a t distribution with 25 df. So we have
Prob( t(25) > a/sqrt(SDa^2 + SDb^2) ) = 0.05
We use the table again to get that:
Prob( t(25) > 1.725 ) = 0.05
Therefore,
a/sqrt(SDa^2 + SDb^2) = 1.725
a/sqrt(500^2 + 500^2) = 1.725
a = 1219.75
Since the observed difference (12109) is greater than 'a'
(1219.75) we have evidence to conclude that CPA accountants have
higher salaries.
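The arithmetic for 'a' can be reproduced with a short sketch. It follows the formula exactly as written in this answer, that is, it assumes SDa and SDb can be plugged straight into sqrt(SDa^2 + SDb^2); the function name is hypothetical. The critical values are typed in by hand: 1.725 is the figure used above, and 1.708 is the figure the clarification further down gives for 25 df.

```python
from math import sqrt

def difference_threshold(t_crit, sd_a, sd_b):
    """Threshold 'a' for the difference of sample means, following the
    answer's formula a = t_crit * sqrt(SDa^2 + SDb^2)."""
    return t_crit * sqrt(sd_a ** 2 + sd_b ** 2)

observed = 61936 - 49827          # difference between the sample means

# Table values entered by hand (see the lead-in for where they come from)
for t_crit in (1.725, 1.708):
    a = difference_threshold(t_crit, 500, 500)
    print(f"t = {t_crit}: a = {a:.2f}, reject Ho: {observed > a}")
```

Either critical value leaves the observed difference far above the threshold, so the conclusion does not hinge on which one is used.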
Google search terms
hypothesis testing
://www.google.com/search?hl=en&q=hypothesis+testing
t distribution table
://www.google.com/search?hl=en&lr=&q=t+distribution+table
unpaired t test
://www.google.com/search?hl=es&q=unpaired+t+test&spell=1
mean equality test
://www.google.com/search?sourceid=navclient&q=mean+equality+test
I hope this helps! If you have any questions regarding my answer,
please don't hesitate to request a clarification. Otherwise I await
your rating and final comments.
Best wishes!
elmarto
Clarification of Answer by elmarto-ga on 28 Feb 2005 15:36 PST
Hello probing!
Question 4 can be answered using the very same reasoning as question
3, just changing the numbers. Calling A the group of Ford cars and B
the group of GM cars, we want to test:
Ho : mA = mB
Ha : mA > mB
(that is, that Ford cars take longer to complete the laps). Given the
0.05 level of significance, we want to find a value 'a' such that
Prob ( Xa - Xb > a ) = 0.05
So, if the observed value of (Xa - Xb) (which is 119.02-118.5=0.52)
turns out to be greater than 'a', we'll reject the null hypothesis
(means are equal) in favor of the alternative one (Fords take longer).
Dividing in the above equation by sqrt(SDa^2 + SDb^2), where SDa is
the sample std. dev of group A and SDb is the sample std. dev. of
group B, we get:
Prob( (Xa - Xb)/sqrt(SDa^2 + SDb^2) > a/sqrt(SDa^2 + SDb^2) ) = 0.05
Now, if both means were equal, the left hand side of the equation
would be a random variable with a Student's t distribution with
(Na+Nb-2) degrees of freedom, where Na is the sample size of group A
and Nb is the sample size of group B. Since Na+Nb-2=20, then it's a t
distribution with 20 df. So we have
Prob( t(20) > a/sqrt(SDa^2 + SDb^2) ) = 0.05
We use the table again to get that:
Prob( t(20) > 1.725 ) = 0.05
Therefore,
a/sqrt(SDa^2 + SDb^2) = 1.725
a/sqrt(1.76^2 + 1.24^2) = 1.725
a = 3.71
Since the observed difference (0.52) is smaller than 'a'
(3.71), we can't reject the null hypothesis that Ford and GM are
equally fast. So we don't find any evidence that GM cars are faster.
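The same decision can be reached by computing the test statistic itself and comparing it with the table value, rather than solving for 'a'. The sketch below uses the divisor sqrt(SDa^2 + SDb^2) exactly as in the reasoning above (an assumption carried over from this answer, not a general recipe), with the critical value 1.725 for 20 df entered by hand. The function name is hypothetical.

```python
from math import sqrt

def t_statistic(mean_a, mean_b, sd_a, sd_b):
    """Left-hand side of the inequality in the text:
    (Xa - Xb) / sqrt(SDa^2 + SDb^2)."""
    return (mean_a - mean_b) / sqrt(sd_a ** 2 + sd_b ** 2)

T_CRIT_20_DF = 1.725              # table value for 20 df, 0.05 in one tail

# Figures from Question 4: Ford (A) vs GM (B) lap times
t = t_statistic(119.02, 118.5, 1.76, 1.24)
reject = t > T_CRIT_20_DF
print(f"t = {t:.3f}, critical value = {T_CRIT_20_DF}, reject Ho: {reject}")
```

Comparing t with the critical value is equivalent to comparing the observed difference with 'a', since both sides are simply divided by the same positive number.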
Incidentally, I found that I made a small mistake in question 3,
which fortunately does not change the final conclusion. I wrote in a
line that:
Prob( t(25) > 1.725 ) = 0.05
but this is wrong. I've just re-checked the t distribution table, and
instead of 1.725, that number should be 1.708. So the value of 'a'
actually turns out to be 1207.72 instead of 1219.75. Of course, the
difference between the sample means is still greater than this
corrected 'a' value, so we still conclude that CPA certified
accountants earn higher salaries.
Best wishes!
elmarto