Q: quants (8) ( Answered 5 out of 5 stars,   0 Comments )
Question  
Subject: quants (8)
Category: Business and Money > Economics
Asked by: k9queen-ga
List Price: $30.00
Posted: 12 Nov 2003 21:23 PST
Expires: 12 Dec 2003 21:23 PST
Question ID: 275333
Assume you have a scatter of points.  You fit a line through the
points.  It is a regression line, and the equation for it looks like the
example given in graph 1:

estimated Y=$265,725 + 27,500t (where "t" is the number of quarters,
with the first quarter in our data set having a value of one and the
last quarter having a value of 24).  Assume the R-squared value is
equal to .87

 Y $ |          
     |          *
     |        *
     |  * * *
     |  * *
     |*_______________
                      t




Graph 2
a) Explain how the regression line is chosen (do this nontechnically).
Will the parameter generated by a regression equation that takes all
the points on the chart into account be greater than, less than, or
equal to the parameter generated if point G is dropped from our data
set?  Explain.

b) Interpret the parameter on the trend variable.

c) What is R-squared attempting to measure?  Explain in a nontechnical way.


  Y |
  $ |        *
    |     *   *
    |   *  *     G
    |*   *      /
    |_*______ /________
                       t




Graph 3
a) What is the value of R-squared likely to be equal to? Explain.


  Y |
  $ |
    |**********
    |**********
    |______________
                  t

Clarification of Question by k9queen-ga on 13 Nov 2003 11:52 PST
Is this question being actively worked on?
It's been in this mode for over 5 hours.
Answer  
Subject: Re: quants (8)
Answered By: elmarto-ga on 13 Nov 2003 12:03 PST
Rated:5 out of 5 stars
 
Hi k9queen!
Here are the answers to your questions.

a) The idea of this regression procedure is to find the line that best
approximates the actual values. That is, find a line such that for
each value of t (i.e., at each period) the difference between the
value of the line and the actual value is as small as possible. Of
course, there are many data points that need to be fit, so the
regression line is such that the *sum* of these errors is as small as
possible. Furthermore, each "error" is usually defined to be the
square of the difference between the line value and the actual value.
The reason for the square is twofold. First, to make the errors always
positive. For example, if you defined the error to be

(line value - actual value)

then you could have that at data point 1 the error is 10 (line value
is greater than actual value), and at data point 2 the error is -10.
Well, when summing these errors you would get 0, implying that the
line perfectly fits the data. Of course, this is wrong: there are in
fact errors of 10. So the square would make the -10 positive,
eliminating this problem. The second reason for the square is to
exaggerate the importance of large errors. In this way, when
minimizing the sum of squared errors in order to find the regression
line, we are trying to avoid large differences between the line and
the actual data.
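
The idea above can be sketched in a few lines of Python. This is only an illustration: the data values are made up (they are not from the question), and the helper names fit_line and sum_squared_errors are hypothetical. It shows that the signed errors of the best-fit line cancel out to zero, while the sum of squared errors does not, and that a different line has a larger sum of squared errors.

```python
# Hypothetical data: five quarters with made-up Y values (not from the question).
t = [1, 2, 3, 4, 5]
y = [10.0, 30.0, 20.0, 40.0, 50.0]

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def sum_squared_errors(intercept, slope, xs, ys):
    """Sum of (line value - actual value)^2 over all points."""
    return sum((intercept + slope * x - v) ** 2 for x, v in zip(xs, ys))

intercept, slope = fit_line(t, y)

# Signed errors (line value - actual value) cancel each other out.
signed_total = sum(intercept + slope * x - v for x, v in zip(t, y))

# Squared errors do not cancel; the fitted line makes their sum as small as possible.
best = sum_squared_errors(intercept, slope, t, y)

# Any other line, for example one with a steeper slope, does worse.
worse = sum_squared_errors(intercept, slope + 1.0, t, y)

print("intercept:", intercept, "slope:", slope)
print("sum of signed errors:", signed_total)
print("sum of squared errors (best line):", best)
print("sum of squared errors (steeper line):", worse)
```

Trying other slopes or intercepts in place of the steeper line should always give a sum of squared errors at least as large as the one for the fitted line.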

Check the following links for more information on this subject and a
deeper insight on the following question.

Linear Relationships
http://illuminations.nctm.org/imath/912/LinearRelationships/

http://standards.nctm.org/document/eexamples/chap7/7.4/

Regarding the inclusion of point G in the regression of graph 2, it's
clear that the time parameter (the one that multiplies t) will be
substantially lower if we include G than if we don't. Recall that the
regression line tries to minimize the sum of squared errors between
the line and the actual data. If you draw the line that best fits the
points, ignoring point G, you'll notice that there will be a large
difference between G and the line. Thus the line will have to become
flatter in order to lower the error that comes from the difference
between point G and the line. Mathematically, the line becoming
flatter means that the slope of the line (the parameter that
multiplies t) becomes closer to zero.

You might want to try the applet from the first link above to check
the line that best fits the points without G, and what happens when
you add it. Notice that the inclusion of point G will move the line in
such a way that the error for all the other values becomes larger, so
that the fit of the regression line becomes worse. Data points like G
are called "outliers": values that are too "different" from the rest
of the data.
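
If you don't have the applet at hand, the same experiment can be sketched in Python. The numbers below are hypothetical (graph 2 gives no actual values): six points on a rising line, plus a point G placed far to the right with a low Y value. The helper names fit_line and sse are also just illustrative.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def sse(intercept, slope, xs, ys):
    """Sum of squared differences between the line and the actual values."""
    return sum((intercept + slope * x - v) ** 2 for x, v in zip(xs, ys))

# Hypothetical data: six points that rise steadily over time.
t_main = [1, 2, 3, 4, 5, 6]
y_main = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]

# Hypothetical outlier G: late in time, but with a low Y value.
g_t, g_y = 10, 30.0

intercept_without, slope_without = fit_line(t_main, y_main)
intercept_with, slope_with = fit_line(t_main + [g_t], y_main + [g_y])

# How well does each line fit the six "ordinary" points?
fit_without = sse(intercept_without, slope_without, t_main, y_main)
fit_with = sse(intercept_with, slope_with, t_main, y_main)

print("slope without G:", slope_without)
print("slope with G:   ", slope_with)
print("squared error on the other points, line fitted without G:", fit_without)
print("squared error on the other points, line fitted with G:   ", fit_with)
```

With these made-up numbers the slope drops sharply once G is included, and the line fits the other six points worse than before.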

b) The parameter that multiplies t is the slope of the regression
line. Mathematically, the slope shows by how much Y changes when X
changes by one unit. In this case, X represents time. Therefore, this
parameter shows how Y (the dependent variable) changes as time passes.
If the parameter is positive, it means that Y is increasing through
time; if it's negative it means that it's decreasing; if it's close to
zero, it means that Y is not changing much through time. In the
equation from graph 1 the parameter is 27,500, so the estimated value
of Y rises by $27,500 with each additional quarter.
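
As a quick check of this interpretation, here is a small Python sketch that plugs values of t into the equation given in the question. The function name estimated_y is just an illustrative choice.

```python
# Equation from the question: estimated Y = 265,725 + 27,500 t
INTERCEPT = 265_725
SLOPE = 27_500

def estimated_y(t):
    """Estimated Y (in dollars) for quarter number t, where t runs from 1 to 24."""
    return INTERCEPT + SLOPE * t

first_quarter = estimated_y(1)
last_quarter = estimated_y(24)

# The change from any quarter to the next is always the slope.
change_per_quarter = estimated_y(11) - estimated_y(10)

print("estimated Y in quarter 1: ", first_quarter)
print("estimated Y in quarter 24:", last_quarter)
print("change per quarter:       ", change_per_quarter)
```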

c) The R-squared is a measure of the "goodness of the fit"; that is,
how well the regression line fits the actual data. The formula for it
(which is based on the ratio of the variance of the errors to the
variance of the data) implies that it is actually measuring how much
of the variance in the data is explained by the regression line.
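
For reference, one common way of writing the R-squared formula is shown below. The symbols are my own notation, not taken from the question: y_i is the actual value at point i, \hat{y}_i is the value of the regression line at that point, and \bar{y} is the average of the actual values.

```latex
R^2 = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2}
```

The numerator adds up the squared errors of the line and the denominator adds up the squared deviations of the data from its average. If the line fits perfectly the numerator is zero and R-squared is 1; if the line does no better than the average, R-squared is 0. In graph 1, an R-squared of .87 means that roughly 87% of the variation in Y is accounted for by the trend line.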

Check the following link for more information on R-squared and
trendlines in general.

Introduction to Mathematical Models
http://qrc.depaul.edu/jcasey/Thursday704/Class4Notes.htm


Graph 3
a) In this case, it's likely that the R-squared will be equal to or
near zero. I assume here that the data points are actually something
like this (please request a clarification if I'm wrong in this
assumption):

|
|
|
| * * * * * *
|
|* * * * * *
|
|
---------------------
                    t

(I'm making this correction because according to the graph in the
question, it would appear that each time t has two different values
for Y, which cannot happen when there is a single observation per
period.) It's clear that the trend line will be something like this:

|
|
|
| * * * * * *
|---------------
|* * * * * *
|
|
---------------------
                    t

However, how much of the data variance is being explained by this
line? Nothing! The line actually predicts that the values of Y never
change, therefore it has zero variance. However, we can also see that
the actual values of Y do have positive variance (Y is not constant).
Therefore, the variance that is explained by this best fit trend line
will be zero. This implies that the value of the R-squared of this
regression line is also 0.
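
Here is a small Python sketch of this case. The Y values (1 and 2) and the number of quarters are made up, since graph 3 gives no numbers, and the helper names fit_line and r_squared are hypothetical. It tries both readings of the graph: points that alternate between a low and a high value (as in my drawing), and two points stacked at every t (as in the original drawing).

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(xs, ys):
    """R-squared: 1 - (squared errors of the line) / (squared deviations from the mean)."""
    intercept, slope = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((v - (intercept + slope * x)) ** 2 for x, v in zip(xs, ys))
    ss_tot = sum((v - mean_y) ** 2 for v in ys)
    return 1 - ss_res / ss_tot

# Reading 1 (hypothetical values): Y alternates, low in odd quarters, high in even ones.
t_alt = list(range(1, 13))
y_alt = [1.0 if q % 2 else 2.0 for q in t_alt]
_, slope_alt = fit_line(t_alt, y_alt)
r2_alt = r_squared(t_alt, y_alt)

# Reading 2 (hypothetical values): both a low and a high value at every t.
t_stack = [q for q in range(1, 11) for _ in range(2)]
y_stack = [1.0, 2.0] * 10
_, slope_stack = fit_line(t_stack, y_stack)
r2_stack = r_squared(t_stack, y_stack)

print("alternating points: slope =", slope_alt, " R-squared =", r2_alt)
print("stacked points:     slope =", slope_stack, " R-squared =", r2_stack)
```

With these made-up values the stacked case gives a perfectly flat line and an R-squared of zero, while the alternating case gives a slope and an R-squared that are very small but not exactly zero, which is why I said "equal to or near zero".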


Google search strategy
R-squared "regression line"
://www.google.com.ar/search?q=R-squared+%22regression+line%22&ie=UTF-8&oe=UTF-8&hl=es&meta=

"regression line" outliers
://www.google.com.ar/search?hl=es&ie=UTF-8&oe=UTF-8&q=%22regression+line%22+outliers&btnG=B%C3%BAsqueda+en+Google&meta=


I hope this helps! If you have any doubts regarding my answer, please
don't hesitate to request a clarification.
 
Best wishes! 
elmarto
k9queen-ga rated this answer:5 out of 5 stars and gave an additional tip of: $5.00
always able to understand the process

Comments  
There are no comments at this time.
