Google Answers: statistics

Question

Subject: statistics
Category: Business and Money > Economics
Asked by: makbool-ga
List Price: $30.00

Posted: 14 Oct 2003 20:07 PDT
Expires: 13 Nov 2003 19:07 PST
Question ID: 266363

Can I get references and a step-by-step guide to follow to solve these questions?

- In a random sample of 400 people 80 are females. What proportion of people are females with a 99% confidence level?

- In a random sample of 166 individuals height, the average height is 1388. SD is $400. With 99% confidence level what would be the population mean?

- For a film company to make profit on its launch it will require 10% to breakeven and more than 10% audience to make profit. At random 400 people who saw the movie were asked of their opinion. 54 said they thought the movie was brilliant. With a level of significance 0.5 is it wise for the film company to make decision that the movie will make profit?

- Mean = 387 and SD = 60 and population mean < 400. Can this be concluded at the 5% significance level?

regards makbool
Clarification of Question by makbool-ga on 14 Oct 2003 21:38 PDT OPPS A MISTAKE second question should read SD = 400 not $400 makbool
Request for Question Clarification by elmarto-ga on 15 Oct 2003 11:01 PDT Hi makbool! In your last question, you state: Mean = 387 and SD = 60 and population mean < 400. Can this be concluded at the 5% significance level? I assume you want to test the null hypothesis that the population mean is smaller than 400, given that the sample mean is 387. Could you please clarify the following? 1) The SD you provide is the sample SD or the population SD? 2) How many observations are in the sample? Best regards, elmarto
Clarification of Question by makbool-ga on 15 Oct 2003 15:36 PDT Hi elmarto Yes it is to test the null hypothesis and the population mean is less than 400, sample mean is 387 1) The SD you provide is the sample SD or the population SD? it is population mean 2) How many observations are in the sample? 20 cheers makbool

Answer

Subject: Re: statistics
Answered By: elmarto-ga on 16 Oct 2003 09:14 PDT
Rated: 5 out of 5 stars

Hi again makbool!
These are the answers to each of your questions. I will answer them in
a different order, since it's easier to understand the method by first
analyzing the second question.


In a random sample of 166 individuals height, the average height is
1388.  SD is 400.  With 99% confidence level what would be the
population mean?

In this case, I assume you want to build a 99% confidence interval
around the sample mean of 1388. That is, an interval built with this
method will contain the population mean 99% of the time. I will also
assume that the SD you give is the sample SD. Anyway, the method with
the population SD will be given for the last question.

In order to answer this and the other questions, we have to use the
Central Limit Theorem:

"The central limit theorem states that given a distribution with a
mean m and variance s^2, the sampling distribution of the mean
approaches a normal distribution with a mean (m) and a variance s^2/N
as N, the sample size, increases."
 
Central Limit Theorem
http://davidmlane.com/hyperstat/A14043.html

Here the sample size of 166 can be considered "large". Therefore, we
can say that the sample mean comes from an approximately normal
distribution.
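
As a rough illustration of the theorem quoted above, the following
Python sketch draws many samples and compares the mean and variance of
their sample means with m and s^2/N. The exponential population, the
seed, and the values of N and REPS are my own assumptions for the
illustration; they do not come from the question.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable (arbitrary choice)

N = 50          # size of each sample (assumed for the illustration)
REPS = 2000     # number of samples drawn (assumed)

# Population: exponential with rate 1, so mean m = 1 and variance s^2 = 1.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(N))
    for _ in range(REPS)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.variance(sample_means)

print("mean of the sample means:", round(mean_of_means, 3), "(theory: 1)")
print("variance of the sample means:", round(var_of_means, 4), "(theory: 1/50 = 0.02)")
```

Even though the exponential population is far from normal, the sample
means should cluster around 1 with a variance close to 0.02.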

Let's call the sample mean X, the population mean m, and the sample
variance s^2 (so the sample standard deviation is s). Let's also call n
the sample size. In order to build a 99% confidence interval around
the sample mean, we need to find a number 'a' such that:

Prob(m < X-a) = 0.005
and
Prob(m > X+a) = 0.005

The intuition behind this is that we want to find 'a' such that the
probability that m is outside the interval [X-a,X+a] is 0.01. Now,
since X follows a normal distribution (approximately), which is
symmetric around its mean, it turns out that you can use either
equation to calculate 'a' and you will get the same result. Let's use
the first one and rearrange it a little bit:

 Prob(m < X-a)
=Prob(X-m>a)
=Prob( (X-m)/(s/sqrt(n)) > a/(s/sqrt(n)) )

where sqrt means "square root". Why write it like this? Because now
the left-hand side of the inequality is known to follow a
t-distribution, for which we have probability tables. Notice that it's
just X minus its mean (m) divided by its estimated standard deviation
(recall from the Central Limit Theorem that the sample mean has
variance s^2/n).

Student's t distribution
http://mathworld.wolfram.com/Studentst-Distribution.html

In particular, it follows a t distribution with n-1 (165) degrees of
freedom. Using that sqrt(166)=12.88, we have to solve the equation:

 Prob( t(165) > a/(s/sqrt(n)) )
=Prob( t(165) > a/(400/12.88) ) = 0.005

Looking up in a t distribution table

T-Distribution table
http://www.stat.ucla.edu/~dinov/courses_students.dir/Applets.dir/T-table.html

we find that

Prob( t(165) > 2.576 ) = 0.005

I found this value in the table given above using the fact that df
(degrees of freedom) is greater than the maximum value shown (120), so
it's treated as infinity; and looking in the column which reads
"0.005". Now all we have to do is solve:

a/(400/12.88) = 2.576

which gives, after straightforward algebra, a=80 (approx.). Therefore,
a 99% confidence interval for the population mean is the interval:

 [1388-80 , 1388+80]
=[1308 , 1468]
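
The arithmetic above can be checked with a short Python sketch. It
assumes, as the table lookup does, that with 165 degrees of freedom the
t value can be taken from the "infinity" row, which is the standard
normal; the variable names are mine.

```python
from math import sqrt
from statistics import NormalDist

n = 166            # sample size
x_bar = 1388       # sample mean
s = 400            # sample SD
confidence = 0.99

# Critical value leaving 0.005 in the upper tail. Assumption: the
# standard normal is used in place of t(165), as in the table lookup.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

a = z * s / sqrt(n)                  # half-width of the interval
interval = (x_bar - a, x_bar + a)

print(round(z, 3), round(a, 1), [round(v) for v in interval])
```

This should reproduce a critical value of about 2.576 and a half-width
of about 80.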


- In a random sample of 400 people 80 are females.  What proportion of
people are females with a 99% confidence level

This is very similar to the previous question, but with a small twist.
The experiment here is binomial: the individual will be either male or
female, with probability given by the population proportion. Let's
call X the number of females in the sample, p the proportion of
females and n the number of experiments (which is the sample size). We
know then that X follows a binomial distribution, with parameters
(n,p), and that it has mean equal to n*p and variance equal to
n*p*(1-p). Also, since the number of experiments is large, we can use
the normal approximation to the binomial: we can say that X follows an
approximately normal distribution, with mean n*p and variance
n*p*(1-p).
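
To get a feeling for how good this approximation is, here is a small
Python sketch comparing an exact binomial probability with its normal
approximation. The value p=0.2 and the cutoff of 80 are assumptions
chosen only for the illustration (the true p is unknown), and the
continuity correction of 0.5 is a standard refinement that the rest of
this answer does not use.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 400, 0.2   # p = 0.2 is an assumed value for the illustration
k = 80            # compare Prob(X <= 80) under both distributions

# Exact binomial probability: sum of C(n, i) * p^i * (1-p)^(n-i) for i <= k.
exact = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Normal approximation with mean n*p and variance n*p*(1-p).
# The +0.5 is the usual continuity correction.
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(k + 0.5)

print("exact:", round(exact, 4), "normal approximation:", round(approx, 4))
```

With n this large the two numbers should agree closely.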

Now, the sample proportion of females is then X/n. Since X follows a
(approximately) normal distribution, then X/n does also, with mean
(n*p)/n (=p), and variance n*p*(1-p)/(n^2) (=p*(1-p)/n). Knowing this,
we can proceed exactly as we did for the previous question. Let's
call X/n P (don't confuse it with the small p: the latter is the
population proportion, while P is the observed sample proportion). We
want to find 'a' such that:

Prob( p<P-a ) = 0.005
and
Prob( p>P+a ) = 0.005

Rewriting as before:

 Prob( p < P-a )
=Prob( P-p > a)
=Prob( (P-p)/sqrt(p*(1-p)/n) > a/sqrt(p*(1-p)/n) )

And again, we're taking a normally distributed variable (P),
subtracting its mean (p) and dividing it by its standard deviation.
Notice that since we don't know p we can't compute sqrt(p*(1-p)/n);
therefore, we approximate it with sqrt(P*(1-P)/n) (we know P already:
it's 80/400=0.2). Since we're using the observed (sample) P, we will
again use the t distribution with n-1 degrees of freedom (with 399
degrees of freedom it is practically identical to the standard normal):

 Prob( t(399) > a/sqrt(P*(1-P)/n) )
=Prob( t(399) > a/sqrt(0.2*(1-0.2)/400) )
=Prob( t(399) > a/0.02 ) = 0.005

Again looking up the table, we find that

a/0.02 = 2.576
a = 0.051

Therefore, a 99% confidence interval for the population proportion is

 [0.2-0.051 , 0.2+0.051]
=[0.149 , 0.251]
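
The same calculation as a Python sketch, again assuming the standard
normal value stands in for t(399), as the table lookup does. The
variable names are mine.

```python
from math import sqrt
from statistics import NormalDist

n = 400
females = 80
P = females / n                      # observed sample proportion, 0.2

se = sqrt(P * (1 - P) / n)           # estimated SD of P, 0.02
z = NormalDist().inv_cdf(0.995)      # about 2.576, as read from the table

a = z * se                           # half-width of the interval
interval = (P - a, P + a)

print(round(a, 4), [round(v, 4) for v in interval])
```

Unrounded, 'a' comes out near 0.0515, so the interval limits differ
from the ones above only in the third decimal place.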


- For a film company to make profit on its launch it will require 10%
to breakeven and more than 10% audience to make profit.  At random 400
people who saw the movie were asked of their opinion.  54 said they
thought the movie was brilliant.  With a level of significance 0.5 is
it wise for the film company to make decision that the movie will make
profit.

This one is very similar to the previous one, with a small change.
Here we're testing the hypotheses:

Ho : p=0.1
Ha : p>0.1

I don't have any t table which has the 0.5 value for significance;
therefore I will use a 0.05 significance level (that is, a 0.95
confidence level) instead. You can then apply the same method for
another level.

In this case, we have to find 'a' such that:

Prob( P-p > a ) = 0.05   (the 0.05 comes from 1-0.95)

The idea here is that we want to find a limit value 'a' and then
compare it with the actual difference between the observed P and
the hypothesized p. For example, P is 54/400=0.135. The null
hypothesis is p=0.1. We thus see that P-p=0.035. Is this difference
"large enough" to conclude that the real p can't be 0.1? That's what
we want to find with 'a'. This value will be such that, assuming
p=0.1, the probability of observing a difference between P and p
greater than 'a' will be unlikely (0.05 probability); which would make
us conclude, if the actual difference is greater than 'a', that the
initial hypothesis (p=0.1) is wrong, so this hypothesis will be
rejected.

The method to find a is exactly the same as before:

 Prob( (P-p) > a )
=Prob( (P-p)/sqrt(P*(1-P)/n) > a/sqrt(P*(1-P)/n) )
=Prob( t(399) > a/sqrt(0.135*0.865/400) ) = 0.05

Looking up the table in the same fashion as before, we find that

a/sqrt(0.135*0.865/400) = 1.645

Solving for a, gives a=0.028

Now, the actual P-p is 0.135-0.1 = 0.035. Since this number is greater
than 'a', we reject the null hypothesis that p=0.1 in favor of the
alternative hypothesis that p>0.1. That is, we have evidence that more
than 10% of the people find the movie brilliant, so it would be
reasonable for the film company to conclude that the movie will make a
profit.
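
Here is the test as a Python sketch. It follows the method above (SD
estimated from the observed P, 0.05 significance level, normal value in
place of t(399)). The last quantity, z_stat_null, is an extra check of
my own that computes the SD under Ho (p=0.1) instead; it is not part of
the method described here.

```python
from math import sqrt
from statistics import NormalDist

n = 400
P = 54 / n          # observed proportion, 0.135
p0 = 0.10           # break-even proportion under Ho
alpha = 0.05        # significance level used in this answer

z = NormalDist().inv_cdf(1 - alpha)        # about 1.645
a = z * sqrt(P * (1 - P) / n)              # threshold, SD estimated from P
reject = (P - p0) > a                      # reject Ho if the gap exceeds 'a'

# Variant (assumption, not the method above): SD computed under Ho.
z_stat_null = (P - p0) / sqrt(p0 * (1 - p0) / n)

print(round(a, 3), reject, round(z_stat_null, 2))
```

Both versions point the same way here: the observed gap of 0.035 is
larger than the threshold, and z_stat_null is also above 1.645.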


- Mean = 387 and SD = 60 and population mean < 400.  Can this be
concluded at the 5% significance level?

In this case, the null and alternative hypotheses can be stated as:

Ho : m=400
Ha : m<400

In this case the sample size (20) is not that large, but we can still
use the Central Limit Theorem. With fewer than 15-20 observations, the
normal approximation becomes worse.

Just as before, let's call X the sample mean, m the population mean, n
the sample size (20) and s the population SD. As in the previous
question, we have to find an 'a' such that:

Prob( X-m < a ) = 0.05

In this case, we will reject the null hypothesis whenever the observed
X-m is LESS than 'a'.

Knowing the population SD makes things a little bit easier. It won't
be necessary to use the t-distribution now. We rewrite the above
equation as:

Prob( (X-m)/(s/sqrt(n)) < a/(s/sqrt(n)) ) = 0.05

Now, since the population SD was given, the left-hand side of the
inequality no longer follows a t distribution. In this case, you're
taking an (approx.) normally distributed variable, X; subtracting its
mean m, and dividing it by its actual SD (s/sqrt(n)). This gives the
standard normal distribution, which is a normal distribution with mean
0 and SD 1.

Looking up a normal distribution table

Normal distribution table
http://www.math.jhu.edu/~js/Math107/NormTable.htm

gives

a/(s/sqrt(n)) = -1.64

I found this number by looking up approximately 0.95 inside the table
and then seeing which number corresponds to it. Although there is no
0.95, there is a 0.9495 and a 0.9505 (one is 1.64 and the other is
1.65). This means that P(Z<1.64)=0.95 approximately, where Z is a
standard normal variable. Using the symmetry property of the standard
normal distribution, we get that P(Z<-1.64)=1-0.95=0.05.
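
If you don't have a table at hand, the lookup described above can be
reproduced with Python's standard library. This is only a sketch; the
name Z is mine.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, SD 1

# The two table entries on either side of 0.95
print(round(Z.cdf(1.64), 4), round(Z.cdf(1.65), 4))

# The lower-tail 5% point, found directly instead of by symmetry
print(round(Z.inv_cdf(0.05), 3))
```

The exact 5% point lies between -1.65 and -1.64, so using -1.64 is a
reasonable rounding.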

Now, solving for 'a' gives:

a/(60/sqrt(20)) = -1.64
a = -22

Now, the actual X-m is 387-400 = -13, which is not less than -22. So
it's not "unlikely" to observe 387 when the population mean is 400.
Therefore, we can't reject the null hypothesis that the population
mean is 400. We don't have evidence at this level of significance that
the population mean is less than 400.
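
The whole test can be sketched in Python as follows, assuming (as
clarified in the question) that 60 is the population SD and n=20. The
p-value at the end is an extra of mine, not part of the method above.

```python
from math import sqrt
from statistics import NormalDist

n = 20
x_bar = 387     # sample mean
m0 = 400        # population mean under Ho
sigma = 60      # population SD
alpha = 0.05

se = sigma / sqrt(n)                       # SD of the sample mean
a = NormalDist().inv_cdf(alpha) * se       # threshold, about -22
diff = x_bar - m0                          # observed X-m, -13
reject = diff < a                          # reject Ho only if diff is below 'a'

# Extra (not in the answer): probability of a gap this low or lower under Ho.
p_value = NormalDist().cdf(diff / se)

print(round(a, 1), reject, round(p_value, 3))
```

Since the p-value is well above 0.05, it agrees with the conclusion
that Ho can't be rejected.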


In order to learn more about hypothesis testing, you might want to
visit the following link:

Steps in Hypothesis Testing
http://davidmlane.com/hyperstat/logic_hypothesis.html


Google search strategy
hypothesis testing
://www.google.com.ar/search?q=hypothesis+testing&ie=UTF-8&oe=UTF-8&hl=es&meta=
normal distribution table
://www.google.com.ar/search?hl=es&ie=UTF-8&oe=UTF-8&q=normal+distribution+table&meta=
t distribution table
://www.google.com.ar/search?hl=es&lr=&ie=UTF-8&oe=UTF-8&q=t+distribution+table&spell=1
binomial distribution
://www.google.com.ar/search?hl=es&ie=UTF-8&oe=UTF-8&q=binomial+distribution&btnG=B%C3%BAsqueda+en+Google&meta=


I hope this helps! If you have any questions regarding my answer,
please don't hesitate to request a clarification. Otherwise I await
your rating and final comments.

Best wishes!
elmarto

Request for Answer Clarification by makbool-ga on 17 Oct 2003 04:45 PDT
Hi elmarto

can you pls verify how did you arrive at a = 0.051 as stated below
(Question 1 answered as ques 2)
 
a/0.02 = 2.576 
a = 0.051 

regards
makbool

Clarification of Answer by elmarto-ga on 17 Oct 2003 06:18 PDT

Hi makbool!
Sure, the way to do it is just to multiply both sides of the equation by 0.02:

a/0.02 = 2.576
a = 2.576*0.02
a = 0.051

Please let me know if you have any other difficulties with my answer.

Best wishes!
elmarto

makbool-ga rated this answer: 5 out of 5 stars

Thank you in helping me to understand the concept
makbool

