Q: Multiple Linear Regression "r" values & forecasting assumptions ( Answered,   0 Comments )
Question  
Subject: Multiple Linear Regression "r" values & forecasting assumptions
Category: Business and Money > Finance
Asked by: keithp-ga
List Price: $5.00
Posted: 15 Aug 2003 14:41 PDT
Expires: 14 Sep 2003 14:41 PDT
Question ID: 245201
When calculating a linear regression...

-what is the difference between the "r squared" value and "multiple r"
value?

-If the "r" value is .30, and accounts for 30% of the variation in
data, is it safe to assume that the regression model divided by "r"
yields 100%? In other words, when using regression to forecast the
value of a widget, if the model only accurately accounts for 30% of
the widget's value, would dividing the resulting "y" value by the "r"
value result in a full accounting of the value of the widget? To put
it another way, assume y=1000 and r=.30. Is the value of the widget
1000 or can I accurately say 1000/.3 = 3333 as the actual value?
Answer  
Subject: Re: Multiple Linear Regression "r" values & forecasting assumptions
Answered By: maniac-ga on 15 Aug 2003 18:47 PDT
 
Hello Keithp,

I am assuming you are using a tool such as Excel (Tools - Data
Analysis - Regression) or SPSS to generate these factors. I will try
to keep the answer in terms you can understand based on that
assumption. If that is incorrect (or the answer is otherwise not
understandable), please let me know in a clarification request so I
can correct the answer.

Multiple R is the correlation coefficient between the actual Y values
and the Y values predicted by the regression. With a single X variable
it has the same size as the ordinary correlation coefficient (r)
between X and Y. In simple terms, for r between two variables,
  r greater than zero and close to one
means that the two variables are closely related. An increase in X
tends to go with a corresponding increase in Y.
  r less than zero and close to minus one
means that the two variables are closely (but inversely) related. An
increase in X tends to go with a corresponding decrease in Y.
  r near zero
means that the two variables have no linear relationship. A change in
X tells you little about the change in Y.
Multiple R itself is never negative, so with several X variables you
look at the sign of each coefficient to see the direction.

R squared is a goodness of fit for a formula (usually a straight line)
to the data. It is the fraction of the variation in Y that the model
accounts for, and it is simply Multiple R multiplied by itself. So an
R of .30 corresponds to an R squared of .09, or 9% of the variation,
not 30%. Since R squared always falls in the range zero to one, you
don't get an indication if the relationship is direct or inverse.
Another factor you may see is "Adjusted R Squared", which takes into
account the number of independent variables relative to the number of
data points. Adding variables never lowers plain R squared, so the
adjusted value is the fairer one when comparing models of different
sizes.
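
As a rough illustration of how these three figures relate, here is a minimal Python sketch using NumPy. The function name fit_summary and the five data points are made up for the example; they are not from your data or from Excel's output. Note that a plain correlation between one X and Y can be negative, while the Multiple R computed here (actual Y against predicted Y) is not, so the direction shows up in the slope instead.

```python
import numpy as np

def fit_summary(X, y):
    """Least-squares fit of y on the columns of X, with an intercept added.

    Returns (coefficients, multiple_r, r_squared, adjusted_r_squared).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)          # a single X variable
    n, k = X.shape                     # n data points, k independent variables
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    predicted = design @ coef
    ss_res = np.sum((y - predicted) ** 2)   # variation left unexplained
    ss_tot = np.sum((y - y.mean()) ** 2)    # total variation in y
    r_squared = 1.0 - ss_res / ss_tot
    multiple_r = np.sqrt(max(r_squared, 0.0))
    adjusted = 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)
    return coef, multiple_r, r_squared, adjusted

# Hypothetical data: one rising and one falling relationship.
x = [1, 2, 3, 4, 5]
y_up = [2, 1, 4, 3, 5]
y_down = [5, 3, 4, 1, 2]

coef_up, r_up, r2_up, adj_up = fit_summary(x, y_up)
coef_down, r_down, r2_down, adj_down = fit_summary(x, y_down)

print("rising : slope", coef_up[1], "R", r_up, "R^2", r2_up, "adj", adj_up)
print("falling: slope", coef_down[1], "R", r_down, "R^2", r2_down, "adj", adj_down)
```

Both fits report the same Multiple R and R squared; only the sign of the slope tells them apart.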

To answer your second question, let me provide an example. Let's say
there is a company making a product where each unit takes 3 units of A
and 7 units of B to produce. The cost to produce the product based on
the cost of A and B is something like...

 A  B  Cost
 1  1   10
 2  1   13
 3  2   23
 1  2   17
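and so on. Here the cost is exactly 3*A + 7*B, so a model that uses
only A misses everything contributed by B. An R or R squared value
measures how closely the predictions track the data; it is not a
fraction by which each prediction falls short.

To see this, suppose a model on A alone reported a value of 0.3 and
you divided by it as you propose:
  1/.3 = 3.33...
  2/.3 = 6.66...
  3/.3 = 10
and so on. None of these is close to the actual costs of 10, 13 and
23. So the formula you included at the end of your question is
incorrect: if y=1000, the model's best estimate of the widget's value
is 1000, not 3333. A low R squared means the estimate carries a lot of
uncertainty, not that it is too small by a known factor.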

There is a technique (multiple linear regression) which is more
appropriate to use with this kind of example.
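
To make that concrete, here is a small Python sketch (NumPy assumed) that fits the four rows of the A / B / Cost table twice: once on A alone and once on A and B together. The helper name least_squares is my own, and four rows is far too few for real work, so treat it only as an illustration. It also tries the "divide by R" idea on the A-only predictions.

```python
import numpy as np

# The four rows from the table above, where Cost = 3*A + 7*B.
A = np.array([1.0, 2.0, 3.0, 1.0])
B = np.array([1.0, 1.0, 2.0, 2.0])
cost = np.array([10.0, 13.0, 23.0, 17.0])

def least_squares(columns, y):
    """Fit y on the given columns plus an intercept; return coef, predictions, R^2."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    predicted = design @ coef
    r_squared = 1.0 - np.sum((y - predicted) ** 2) / np.sum((y - y.mean()) ** 2)
    return coef, predicted, r_squared

# Simple regression: A only (B is the "missing" variable).
coef_a, pred_a, r2_a = least_squares([A], cost)

# Multiple regression: A and B together.
coef_ab, pred_ab, r2_ab = least_squares([A, B], cost)

# The proposal from the question: divide each prediction by R.
rescaled = pred_a / np.sqrt(r2_a)

print("A only   : R^2 =", round(r2_a, 3), "predictions", pred_a.round(2))
print("A and B  : R^2 =", round(r2_ab, 3), "coefficients", coef_ab.round(2))
print("divided  :", rescaled.round(2), "vs actual", cost)
```

With both variables the fit recovers the 3 and 7 and reproduces the costs. Dividing the A-only predictions by R moves them further from the actual costs rather than closer.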

For more information, I suggest a few sites including:

  http://helios.bto.ed.ac.uk/bto/statistics/tress11.html
  A pretty good set of examples and includes descriptions of using the
tools in Excel.

  http://www.uvm.edu/~dhowell/gradstat/psych340/Lectures/CorrelReg/correl1.html
  Another site with good examples, perhaps a little more technical. I
am including it because it talks about the difference between
correlation and regression.

  http://www-micro.msb.le.ac.uk/2060/2060-4.html
  Another good explanation of how correlation and regression are
different. Good illustrations as well.

  http://mathworld.wolfram.com/CorrelationCoefficient.html
  A number of equations, but the charts showing different R squared
values are helpful.

  http://www.csuchico.edu/psy/Examples.htm
  Examples using SPSS (data and output provided).

  http://bmj.com/statsbk/11.shtml
  More examples, this time showing +1 / -1 correlation and non-linear
regression; it also defines a number of other statistical measures.

Helpful search phrases included:
  compare correlation regression
  define "r squared" "multiple r"
  define "multiple r" correlation

  --Maniac

Request for Answer Clarification by keithp-ga on 15 Aug 2003 23:02 PDT
I am using Excel, with the analyse-it.com plugins.

Analyse-it returns an adjusted r squared value for my multiple linear
regression.

Generally, your explanation is a little more technical than what I am
looking for, but does answer my question I think.

Specifically, my multiple regression model uses about 6 variables and
yields an adjusted r of .30. Would I be right in saying that the
resulting Y in my model is just a weak model? What about the remaining
.70 of variation in the model? Is there a way to project or compensate
for the remaining unknown variation/value of .70 without
experimentation and adding more x variables?

thanks.

Clarification of Answer by maniac-ga on 16 Aug 2003 05:24 PDT
Hello Keithp,

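Based on what you described, a six variable model with an Adjusted R
of 0.3, the model is weak. The remaining 0.7 cannot be made up by
scaling the result; it has to be explained by better variables or a
better form of relationship. Selecting the right variables and
relationships can be done on an iterative basis. There are a few
interesting reports online including: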

  http://149.43.80.142/cs100homeS01/EPearson/Project/drunk%20driving%20factors.html
  A project on Drunk Driving looking at four factors that contribute
to drunk driving deaths. It steps through the analysis of each
variable and then the combination to come to the conclusions
presented.

  http://www.eeb.uconn.edu/Courses/EEB200/eeb200s01/stobutzki.htm
  A study of fish / swimming speeds. It talks about the poor results
achieved and suggests further tests to correct the model. For example,
some of the fish may have been swimming a greater [or lesser] distance
prior to capture, resulting in fatigue and poor performance.

Another aspect that may be considered is that the relationship between
two variables is not linear. It is often helpful to chart the result
against each independent variable. For example if Y is your result and
X1 through X6 are the independent variables, make six plots similar to
those in
  http://bmj.com/statsbk/11.shtml
to see if the relationship is not linear. There are a number of
relationships in the real world that are not linear including:
 - weight vs. height
 - aerodynamic drag vs. speed [can be extremely non-linear]
If you see one of these, there are techniques to perform regression
with non-linear formulas.
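
One common technique of this kind is to fit a curve instead of a line. The Python sketch below (NumPy assumed) compares a straight-line fit with a quadratic fit. The speed and drag numbers are invented so that drag grows with the square of speed; they are not measurements.

```python
import numpy as np

# Made-up data: drag proportional to speed squared.
speed = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
drag = 0.5 * speed ** 2

def r_squared(y, predicted):
    """Fraction of the variation in y accounted for by the predictions."""
    return 1.0 - np.sum((y - predicted) ** 2) / np.sum((y - y.mean()) ** 2)

# Degree 1 is a straight line, degree 2 adds a squared term.
line = np.polyfit(speed, drag, 1)
curve = np.polyfit(speed, drag, 2)

r2_line = r_squared(drag, np.polyval(line, speed))
r2_curve = r_squared(drag, np.polyval(curve, speed))

print("straight line R^2:", round(r2_line, 4))
print("quadratic R^2    :", round(r2_curve, 4))
```

The straight line leaves a systematic pattern in the errors that the quadratic removes, which is the kind of thing the plots against each X variable help you spot.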

Another method that may be helpful is to select subsets of data and
determine the correlation of these subsets. If a chart shows bands of
data (the model I described would do that), then you can compute the
coefficients for each subset. The problem is now to determine what
factor (independent variable) is causing the differences. Additional
experiments may help, but you may also have the data you need if you
can determine the other factor (and assign values to each data point)
to do further analysis.
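
Here is a minimal Python sketch of the subset idea (NumPy assumed). The data are invented: Y depends on X and on a hidden two-level factor that I have called band, which produces two parallel bands on a chart. The pooled fit looks weak, while each band on its own fits well.

```python
import numpy as np

# Made-up data: two bands with the same slope, offset by a hidden factor.
x = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
band = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # hypothetical hidden factor
y = 2.0 * x + 10.0 * band

def r_squared_simple(x, y):
    """R squared for a straight-line fit of y on a single x."""
    return np.corrcoef(x, y)[0, 1] ** 2

# All points together, ignoring the hidden factor.
pooled = r_squared_simple(x, y)

# Each band on its own.
per_band = {
    int(b): r_squared_simple(x[band == b], y[band == b])
    for b in np.unique(band)
}

print("pooled R^2  :", round(pooled, 3))
print("per-band R^2:", {b: round(v, 3) for b, v in per_band.items()})
```

Once the factor behind the bands is identified, it can be added to the model as another independent variable.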

  --Maniac
