Q: quants (Answered, 0 Comments)
Question  
Subject: quants
Category: Business and Money > Economics
Asked by: k9queen-ga
List Price: $30.00
Posted: 18 Nov 2003 17:36 PST
Expires: 18 Dec 2003 17:36 PST
Question ID: 278034
a) What should be made the dependent variable, and which variables should be independent? Explain.
b) Run the regression and interpret the parameter estimates.
c) Do you think there is multicollinearity in this data?
d) How would you test for the presence of multicollinearity? Explain.
e) Explain why the presence of multicollinearity has no negative effect
on overall prediction but does affect the parameter estimates.
 


District              Average Class Size   Combined SAT Score   % Attend 4-Year College
Blue Springs, MO      25                   1083                 74
Garden City, NY       18                    997                 77
Indianapolis, IN      30                    716                 40
Newport Beach, CA     26                    977                 51
Novi, MI              20                    980                 53
Piedmont, CA          28                   1042                 75
Pittsburgh, PA        21                    983                 66
Scarsdale, NY         20                   1110                 87
Wayne, PA             22                   1040                 85
Weston, MA            21                   1031                 89
Farmingdale, NY       22                    947                 81
Mamaroneck, NY        20                   1000                 69
Mayfield, OH          24                   1003                 48
Morristown, NJ        22                    972                 64
New Rochelle, NY      23                   1039                 55
Newtown Square, PA    17                    963                 79
Omaha, NE             23                   1059                 81
Shaker Heights, OH    23                    940                 82

Clarification of Question by k9queen-ga on 18 Nov 2003 22:29 PST
I am supposed to clarify that these are for studying purposes.
Answer  
Subject: Re: quants
Answered By: hibiscus-ga on 19 Nov 2003 02:46 PST
 
Hi again k9queen, 

The dependent variable of the regression is ATTENDANCE.  We are hoping
to establish a link between class sizes, SAT scores, and college
attendance.  With this in mind, ATTENDANCE should be made the
dependent variable, with CLASS_SIZE and SCORE as the independent
(explanatory) variables in the regression.

The results of the regression are as follows:

Dependent variable: ATTENDANCE
 Current sample:  1 to 18
 Number of observations:  18

        Mean of dep. var. = 69.7778      LM het. test = .153765 [.695]
   Std. dev. of dep. var. = 14.8386     Durbin-Watson = 1.32248 [<.099]
 Sum of squared residuals = 2353.88  Jarque-Bera test = .995509 [.608]
    Variance of residuals = 147.118   Ramsey's RESET2 = .366230E-02 [.953]
 Std. error of regression = 12.1292   F (zero slopes) = 9.44297 [.007]
                R-squared = .373102    Schwarz B.I.C. = 72.2923
       Adjusted R-squared = .333921    Log likelihood = -69.4019

              Estimated    Standard
 Variable    Coefficient     Error       t-statistic   P-value
 CLASS_SIZE  -1.06184      .676443       -1.56974      [.136]
 SCORE       .094201       .015422       6.10826       [.000]
 T(16)  Critical Value: 2.119905, Two-tailed area: .05000

The R-squared is 0.37, so about 37% of the variation in ATTENDANCE is
explained by the variables in the model.

The CLASS_SIZE coefficient is -1.06, so larger class sizes are
associated with a lower rate of college attendance (though, with a
P-value of .136, this estimate is not statistically significant at the
5% level).  The coefficient on SCORE is 0.094, so higher SAT scores
are associated with higher college attendance.
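
If you want to reproduce these numbers yourself, here is a rough
sketch in Python using pandas and statsmodels (my own code, not the
package that produced the output above).  Since the output reports
only the two slope coefficients and 16 residual degrees of freedom, it
appears no intercept was included, so the sketch below does the same;
variable names are mine, and summary statistics other than the
coefficients (for example R-squared without an intercept) may be
computed slightly differently across packages.

import pandas as pd
import statsmodels.api as sm

# Data from the table above.
data = pd.DataFrame({
    "CLASS_SIZE": [25, 18, 30, 26, 20, 28, 21, 20, 22, 21,
                   22, 20, 24, 22, 23, 17, 23, 23],
    "SCORE":      [1083, 997, 716, 977, 980, 1042, 983, 1110, 1040, 1031,
                   947, 1000, 1003, 972, 1039, 963, 1059, 940],
    "ATTENDANCE": [74, 77, 40, 51, 53, 75, 66, 87, 85, 89,
                   81, 69, 48, 64, 55, 79, 81, 82],
})

# ATTENDANCE is the dependent variable; CLASS_SIZE and SCORE are the
# regressors.  No constant is added, matching the two-coefficient
# output shown above.
model = sm.OLS(data["ATTENDANCE"], data[["CLASS_SIZE", "SCORE"]]).fit()
print(model.summary())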

I don't feel I could describe multicollinearity nearly as eloquently
as this description I found at graphpad.com:
http://www.graphpad.com/articles/Multicollinearity.htm

"In some cases, multiple regression results may seem paradoxical. Even
though the overall P value is very low, all of the individual P values
are high. This means that the model fits the data well, even though
none of the X variables has a statistically significant impact on
predicting Y. How is this possible? When two X variables are highly
correlated, they both convey essentially the same information. In this
case, neither may contribute significantly to the model after the
other one is included. But together they contribute a lot. If you
removed both variables from the model, the fit would be much worse. So
the overall model fits the data well, but neither X variable makes a
significant contribution when it is added to your model last. When
this happens, the X variables are collinear and the results show
multicollinearity."

It seems unlikely that there is multicollinearity in this data.  A
number of methods for detecting multicollinearity are described at
this site: http://www.xycoon.com/detection.htm .  We can stick to a
simple one: if the t-statistics for both variables, CLASS_SIZE and
SCORE, were not statistically significant while the predictive value
of the two together was high, multicollinearity would be likely.
However, in this case the critical value of the t-statistic is about
2.12, so the t-statistic of SCORE is significant while that of
CLASS_SIZE is not.  Alternatively, if the P-values for both variables
were high we might suspect multicollinearity, but in this case the
P-value for SCORE is effectively zero.  Two quick checks along these
lines are sketched below.
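
As a hedged sketch of those checks, reusing the data DataFrame from
the earlier code: the pairwise correlation between the two regressors,
and their variance inflation factors (VIFs), which are among the
standard diagnostics discussed at the xycoon page.

import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

regressors = ["CLASS_SIZE", "SCORE"]

# Pairwise correlation: values near +1 or -1 suggest collinearity.
print(data[regressors].corr())

# Variance inflation factors: a common rule of thumb flags trouble
# when a VIF is well above 5-10.
X = sm.add_constant(data[regressors])
for i, name in enumerate(regressors, start=1):  # index 0 is the constant
    print(name, variance_inflation_factor(X.values, i))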

For the last part of the question, I refer you to another article
lifted from the graphpad site here
http://www.graphpad.com/instatman/Ismulticollinearityaproblem_.htm

"The term multicollinearity is as hard to understand as it is to say.
But it is important to understand, as multicollinearity can interfere
with proper interpretation of multiple regression results. To
understand multicollinearity, first consider an absurd example.
Imagine that you are running multiple regression to predict blood
pressure from age and weight. Now imagine that you've entered
weight-in-pounds and weight-in-kilograms as two separate X variables.
The two X variables measure exactly the same thing - the only
difference is that the two variables have different units. The P value
for the overall fit is likely to be low, telling you that blood
pressure is linearly related to age and weight. Then you'd look at the
individual P values. The P value for weight-in-pounds would be very
high - after including the other variables in the equation, this one
adds no new information. Since the equation has already taken into
account the effect of weight-in-kilograms on blood pressure, adding
the variable weight-in-pounds to the equation adds nothing. But the P
value for weight-in-kilograms would also be high for the same reason.
After you include weight-in-pounds to the model, the goodness-of-fit
is not improved by including the variable weight-in-kilograms. When
you see these results, you might mistakenly conclude that weight does
not influence blood pressure at all since both weight variables have
very high P values. The problem is that the P values only assess the
incremental effect of each variable. In this example, neither variable
has any incremental effect on the model. The two variables are
collinear.

That example is a bit absurd, since the two variables are identical
except for units. The blood pressure example -- model blood pressure
as a function of age, weight and gender - is more typical. It is hard
to separate the effects of age and weight, if the older subjects tend
to weigh more than the younger subjects. It is hard to separate the
effects of weight and gender if the men weigh more than the women.
Since the X variables are intertwined, multicollinearity will make it
difficult to interpret the multiple regression results."

I think that pretty much sums up why overall prediction remains
okay when multicollinearity is present, even though the parameter
estimates may be distorted.
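
To make the point concrete, here is a small simulation of my own (not
from either article), in the spirit of the blood-pressure example:
adding a second, almost perfectly collinear weight variable leaves the
overall fit essentially unchanged, but the individual weight
coefficients and their P-values become unreliable.  All names and
numbers below are made up for illustration.

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
age = rng.uniform(20, 70, n)
weight_kg = rng.normal(75, 12, n)
# Weight in pounds, rounded to the nearest pound, so it is almost
# (but not exactly) collinear with weight in kilograms.
weight_lb = np.round(weight_kg * 2.20462)
blood_pressure = 90 + 0.5 * age + 0.3 * weight_kg + rng.normal(0, 5, n)

X1 = sm.add_constant(pd.DataFrame({"age": age, "weight_kg": weight_kg}))
X2 = sm.add_constant(pd.DataFrame({"age": age, "weight_kg": weight_kg,
                                   "weight_lb": weight_lb}))

fit1 = sm.OLS(blood_pressure, X1).fit()
fit2 = sm.OLS(blood_pressure, X2).fit()

# Overall prediction is essentially the same...
print(fit1.rsquared, fit2.rsquared)
# ...but the two weight coefficients split the same information between
# them, so their individual estimates and P-values are hard to interpret.
print(fit2.params)
print(fit2.pvalues)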

This has been a bit of a long answer, and I hope it is clear.  As
always, if anything is unclear, please ask for clarification.

Hibiscus
Comments  
There are no comments at this time.
