Hi again makbool!
These are the answers to each of your questions. I will answer them in
a different order, since it's easier to understand the method by first
analyzing the second question.
- In a random sample of 166 individuals height, the average height is
1388. SD is 400. With 99% confidence level what would be the
population mean?
In this case, I assume you want to build a 99% confidence interval
around the sample mean of 1388. That is, there will be a 99% chance
that the population mean will be within this interval. I will also
assume that the SD you give is the sample SD. Anyway, the method with
the population SD will be given for the last question.
In order to answer this and the other questions, we have to use the
Central Limit Theorem:
"The central limit theorem states that given a distribution with a
mean m and variance s^2, the sampling distribution of the mean
approaches a normal distribution with a mean (m) and a variance s^2/N
as N, the sample size, increases."
Central Limit Theorem
http://davidmlane.com/hyperstat/A14043.html
Here the sample size of 166 can be considered "large". Therefore, we
can say that the sample mean comes from an approximately normal
distribution.
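If you'd like to see the theorem at work, here is a small simulation sketch. The exponential distribution (mean 1, variance 1), the number of trials, and the function name are my own choices for illustration; they are not part of your question.

```python
import random
import statistics

def simulate_sample_means(n, trials, seed=0):
    """Draw `trials` samples of size n from an exponential distribution
    (mean 1, variance 1) and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]

# Same sample size as in the question (166); 2000 repetitions is arbitrary.
means = simulate_sample_means(n=166, trials=2000)

# The sample means should centre on m = 1 ...
print("mean of sample means:", round(statistics.fmean(means), 3))
# ... with a variance close to s^2/N = 1/166.
print("variance of sample means:", round(statistics.variance(means), 5))
print("theoretical variance s^2/N:", round(1 / 166, 5))
```

Even though the exponential distribution is quite skewed, the means of samples of size 166 should cluster around 1 with a variance near 1/166.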
Let's call the sample mean X, the population mean m, and the sample
variance s^2 (so the sample standard deviation is s). Let's also call n
the sample size. In order to build a 99% confidence interval around
the sample mean, we need to find a number 'a' such that:
Prob(m < X-a) = 0.005
and
Prob(m > X+a) = 0.005
The intuition behind this is that we want to find 'a' such that the
probability that m is outside the interval [X-a,X+a] is 0.01. Now,
since X follows a normal distribution (approximately), which is
symmetric around its mean, it turns out that you can use either
equation to calculate 'a' and you will get the same result. Let's use
the first one and rearrange it a little bit:
Prob(m < X-a)
=Prob(X-m>a)
=Prob( (X-m)/(s/sqrt(n)) > a/(s/sqrt(n)) )
where sqrt means "square root". Why write it like this? Because now
the left-hand side of the inequality is known to follow a
t-distribution, for which we have probability tables. Notice that it's
just X minus its mean (m) divided by its estimated standard deviation
(recall from the Central Limit Theorem that the sample mean has
variance s^2/n).
Student's t distribution
http://mathworld.wolfram.com/Studentst-Distribution.html
In particular, it follows a t distribution with n-1 (165) degrees of
freedom. Using that sqrt(166)=12.88, we have to solve the equation:
Prob( t(165) > a/(s/sqrt(n)) )
=Prob( t(165) > a/(400/12.88) ) = 0.005
Looking up in a t distribution table
T-Distribution table
http://www.stat.ucla.edu/~dinov/courses_students.dir/Applets.dir/T-table.html
we find that
Prob( t(165) > 2.576 ) = 0.005
I found this value in the table given above using the fact that df
(degrees of freedom) is greater than the maximum shown value (120) so
it's assumed to be infinity; and looking in the column which reads
"0.005". Now all we have to do is solve:
a/(400/12.88) = 2.576
which gives, after straightforward algebra, a=80 (approx.). Therefore,
a 99% confidence interval for the population mean is the interval:
[1388-80 , 1388+80]
=[1308 , 1468]
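The same calculation can be sketched in a few lines of Python. This assumes, as I did with the table, that 165 degrees of freedom is close enough to infinity to use the standard normal value; the function name is just something I made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sample_sd, n, confidence):
    """Confidence interval for the population mean, large-sample version.

    Uses the standard normal critical value (the "infinity" row of a
    t table), which is an approximation for finite degrees of freedom.
    """
    tail = (1 - confidence) / 2                  # 0.005 for 99% confidence
    critical = NormalDist().inv_cdf(1 - tail)    # about 2.576
    a = critical * sample_sd / sqrt(n)           # half-width of the interval
    return sample_mean - a, sample_mean + a

low, high = mean_confidence_interval(1388, 400, 166, 0.99)
print(round(low), round(high))  # 1308 1468
```

Changing the last argument (for example to 0.95) gives the interval for another confidence level.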
- In a random sample of 400 people 80 are females. What proportion of
people are females with a 99% confidence level
This is very similar to the previous question, but with a small twist.
The experiment here is binomial: the individual will be either male or
female, with probability given by the population proportion. Let's
call X the number of females in the sample, p the proportion of
females and n the number of experiments (which is the sample size). We
know then that X follows a binomial distribution, with parameters
(n,p), and that it has mean equal to n*p and variance equal to
n*p*(1-p). Also, since the number of experiments is large, we can use
the normal approximation to the binomial: we can say that X follows an
approximately normal distribution, with mean n*p and variance
n*p*(1-p).
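To get a feeling for how good this approximation is, here is a rough sketch that compares exact binomial probabilities with the normal ones. The value p=0.2 is only an assumed value for illustration (it happens to be the sample proportion in this question), and the +0.5 continuity correction is my addition; it is not used anywhere else in this answer.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact Prob(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p = 400, 0.2   # p is an assumed value, chosen only for illustration
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

for k in (70, 80, 90):
    exact = binomial_cdf(k, n, p)
    # k + 0.5 is a continuity correction (an addition, not in the answer).
    print(k, round(exact, 4), round(approx.cdf(k + 0.5), 4))
```

With 400 experiments the two columns should agree closely, which is why the normal distribution is a reasonable stand-in here.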
Now, the sample proportion of females is then X/n. Since X follows a
(approximately) normal distribution, then X/n does also, with mean
(n*p)/n (=p), and variance n*p*(1-p)/(n^2) (=p*(1-p)/n). Knowing this,
we can proceed exactly as we did for the previous question. Let's
call X/n "P" (not to be confused with the small p: the latter is the
population proportion, while P is the observed sample proportion). We
want to find 'a' such that:
Prob( p<P-a ) = 0.005
and
Prob( p>P+a ) = 0.005
Rewriting as before:
Prob( p < P-a )
=Prob( P-p > a)
=Prob( (P-p)/sqrt(p*(1-p)/n) > a/sqrt(p*(1-p)/n) )
And again, we're taking a normally distributed variable (P),
subtracting its mean (p) and dividing it by its standard deviation.
Notice that since we don't know p we can't compute sqrt(p*(1-p)/n);
therefore, we approximate it with sqrt(P*(1-P)/n) (we know P already:
it's 80/400=0.2). Since we're using the observed (sample) P, again we
will get the t distribution with n-1 degrees of freedom:
Prob( t(399) > a/sqrt(P*(1-P)/n) )
=Prob( t(399) > a/sqrt(0.2*(1-0.2)/400) )
=Prob( t(399) > a/0.02 ) = 0.005
Again looking up the table, we find that
a/0.02 = 2.576
a = 0.0515 (approx.)
Therefore, a 99% confidence interval for the population proportion is
[0.2-0.0515 , 0.2+0.0515]
=[0.1485 , 0.2515]
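Here is the same computation as a short Python sketch. As before, it assumes the standard normal value can stand in for the t table (399 degrees of freedom is effectively infinity), and the function name is only illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence):
    """Large-sample confidence interval for a population proportion.

    The unknown p in the standard deviation is replaced by the observed
    sample proportion P, as described in the text.
    """
    p_hat = successes / n                        # observed proportion P
    se = sqrt(p_hat * (1 - p_hat) / n)           # sqrt(P*(1-P)/n)
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    a = critical * se
    return p_hat - a, p_hat + a

low, high = proportion_confidence_interval(80, 400, 0.99)
print(round(low, 4), round(high, 4))
```

The result agrees with the interval above up to rounding.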
- For a film company to make profit on its launch it will require 10%
to breakeven and more than 10% audience to make profit. At random 400
people who saw the movie were asked of their opinion. 54 said they
thought the movie was brilliant. With a level of significance 0.5 is
it wise for the film company to make decision that the movie will make
profit.
This one is very similar to the previous one, with a small change.
Here we're testing the hypotheses:
Ho : p=0.1
Ha : p>0.1
I don't have any table which has the 0.5 value for significance;
therefore I will use a 0.05 significance level (that is, a 95%
confidence level) instead. You can then apply the same method for
another level.
In this case, we have to find 'a' such that:
Prob( P-p > a ) = 0.05 (the 0.05 comes from 1-0.95)
The idea here is that we want to find a limit value 'a', in order to
compare it with the actual difference between the observed P and
the hypothesized p. For example, P is 54/400=0.135. The null
hypothesis is p=0.1. We thus see that P-p=0.035. Is this difference
"large enough" to conclude that the real p can't be 0.1? That's what
we want to find with 'a'. This value will be such that, assuming
p=0.1, the probability of observing a difference between P and p
greater than 'a' will be unlikely (0.05 probability); which would make
us conclude, if the actual difference is greater than 'a', that the
initial hypothesis (p=0.1) is wrong, so this hypothesis will be
rejected.
The method to find a is exactly the same as before:
Prob( (P-p) > a )
=Prob( (P-p)/sqrt(P*(1-P)/n) > a/sqrt(P*(1-P)/n) )
=Prob( t(399) > a/sqrt(0.135*0.865/400) ) = 0.05
Looking up the table in the same fashion as before, we find that
a/sqrt(0.135*0.865/400) = 1.645
Solving for a, gives a=0.028
Now, the actual P-p is 0.135-0.1 = 0.035. Since this number is greater
than 'a', we reject the null hypothesis that p=0.1 in favor of the
alternative hypothesis that p>0.1. That is, we have evidence that more
than 10% of the people find the movie brilliant, so it would be wise
for the film company to produce it.
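The test above can be sketched in Python as follows. It follows the method in this answer (the standard deviation is computed from the observed P, and the standard normal value replaces the t table); the function name and the returned tuple are my own choices for illustration.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_proportion_test(successes, n, p0, alpha):
    """One-sided test of Ho: p = p0 against Ha: p > p0.

    Returns (observed difference P - p0, limit value a, reject Ho?).
    The standard deviation uses the observed P, as in the text above.
    """
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    critical = NormalDist().inv_cdf(1 - alpha)   # about 1.645 for alpha = 0.05
    a = critical * se
    diff = p_hat - p0
    return diff, a, diff > a

diff, a, reject = upper_tail_proportion_test(54, 400, 0.10, 0.05)
print(round(diff, 3), round(a, 3), reject)
```

Many textbooks compute the standard deviation from the hypothesized p instead, that is sqrt(0.1*0.9/400)=0.015. That gives a limit of roughly 0.025 and the same conclusion in this case.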
- Mean = 387 and SD = 60 and population mean < 400. Can this be
concluded at the 5% significance level?
In this case, the null and alternative hypotheses can be stated as:
Ho : m=400
Ha : m<400
In this case the sample size (20) is not that large, but we can still
use the Central Limit Theorem. With fewer than 15-20 observations, the
normal approximation becomes worse.
Just as before, let's call X the sample mean, m the population mean, n
the sample size (20) and s the SD. As in the previous question, we
have to find an 'a' such that:
Prob( X-m < a ) = 0.05
In this case, we will reject the null hypothesis whenever the observed
X-m is LESS than 'a'.
Knowing the population SD makes things a little bit easier. It won't
be necessary to use the t-distribution now. We rewrite the above
equation as:
Prob( (X-m)/(s/sqrt(n)) < a/(s/sqrt(n)) ) = 0.05
Now, since the population SD was given, the left-hand side of the
inequality no longer follows a t distribution. In this case, you're
taking an (approx.) normally distributed variable, X; subtracting its
mean m, and dividing it by its actual SD (s/sqrt(n)). This gives the
standard normal distribution, which is a normal distribution with mean
0 and SD 1.
Looking up a normal distribution table
Normal distribution table
http://www.math.jhu.edu/~js/Math107/NormTable.htm
gives
a/(s/sqrt(n)) = -1.64
I found this number by looking up approximately 0.95 inside the table
and then seeing which number corresponds to it. Although there is no
0.95, there is a 0.9495 and a 0.9505 (one is 1.64 and the other is
1.65). Calling Z a standard normal variable, this means that
P(Z<1.64)=0.95 (approx.). Using the symmetry property of the standard
normal distribution, we get that P(Z<-1.64)=1-0.95=0.05.
Now, solving for 'a' gives:
a/(60/sqrt(20)) = -1.64
a = -22
Now, the actual X-m is 387-400 = -13, which is not less than -22. So
it's not "unlikely" to observe 387 when the population mean is 400.
Therefore, we can't reject the null hypothesis that the population
mean is 400. We don't have evidence at this level of significance that
the population mean is less than 400.
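As a sketch, the same lower-tail test in Python looks like this. It assumes, as I did above, that 60 is the population SD and that the sample size is 20; the function name is just for illustration.

```python
from math import sqrt
from statistics import NormalDist

def lower_tail_mean_test(sample_mean, population_sd, n, m0, alpha):
    """One-sided test of Ho: m = m0 against Ha: m < m0, with known SD.

    Returns (observed difference X - m0, limit value a, reject Ho?).
    """
    critical = NormalDist().inv_cdf(alpha)       # negative, about -1.645
    a = critical * population_sd / sqrt(n)
    diff = sample_mean - m0
    return diff, a, diff < a

diff, a, reject = lower_tail_mean_test(387, 60, 20, 400, 0.05)
print(diff, round(a, 1), reject)
```

The observed difference of -13 is not below the limit of about -22, so the null hypothesis is not rejected.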
In order to learn more about hypothesis testing, you might want to
visit the following link:
Steps in Hypothesis Testing
http://davidmlane.com/hyperstat/logic_hypothesis.html
Google search strategy
hypothesis testing
://www.google.com.ar/search?q=hypothesis+testing&ie=UTF-8&oe=UTF-8&hl=es&meta=
normal distribution table
://www.google.com.ar/search?hl=es&ie=UTF-8&oe=UTF-8&q=normal+distribution+table&meta=
t distribution table
://www.google.com.ar/search?hl=es&lr=&ie=UTF-8&oe=UTF-8&q=t+distribution+table&spell=1
binomial distribution
://www.google.com.ar/search?hl=es&ie=UTF-8&oe=UTF-8&q=binomial+distribution&btnG=B%C3%BAsqueda+en+Google&meta=
I hope this helps! If you have any questions regarding my answer,
please don't hesitate to request a clarification. Otherwise I await
your rating and final comments.
Best wishes!
elmarto