Hello Keithp,
I am assuming you are using a tool such as Excel (Tools - Data
Analysis - Regression) or SPSS to generate these factors. I will try
to keep the answer in terms you can understand based on that
assumption. If that is incorrect (or the answer is otherwise not
understandable), please let me know in a clarification request so I
can correct the answer.
Multiple R (or the correlation coefficient) is an indication of the
strength of the linear relationship between two variables. In simple
terms,
R greater than zero and close to one
means that the two variables are closely related. An increase in X
tends to go with a corresponding increase in Y.
R less than zero and close to minus one
means that the two variables are closely (but inversely) related. An
increase in X tends to go with a corresponding decrease in Y.
R near zero
means that the two variables have little or no linear relationship. A
change in X tells you little about the change in Y.
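If it helps to see the three cases as numbers, here is a minimal sketch in Python that computes the correlation coefficient by hand. The data values and the name pearson_r are made up for illustration; they are not from your spreadsheet.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up illustration data (an assumption, not taken from the question).
x = [1, 2, 3, 4, 5, 6]
rising = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1]    # goes up as x goes up
falling = [11.8, 10.1, 8.0, 6.1, 3.9, 2.0]  # goes down as x goes up
unrelated = [5, 1, 4, 4, 1, 5]              # no straight-line trend in x

print("rising:   ", round(pearson_r(x, rising), 3))     # close to +1
print("falling:  ", round(pearson_r(x, falling), 3))    # close to -1
print("unrelated:", round(pearson_r(x, unrelated), 3))  # close to 0
```

Excel's CORREL worksheet function should give the same values for the same columns.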
R squared is a measure of the goodness of fit of a formula (usually a
straight line) to the data: the fraction of the variation in Y that
the formula explains. In many ways it is similar to the correlation
coefficient. However, since R squared is always in the range zero to
one, you don't get an indication of whether the relationship is direct
(R > 0) or inverse (R < 0). Another factor you may see is "Adjusted R
Squared", which takes into account the number of data points and the
number of independent variables. With few data points relative to the
number of variables, the plain R squared tends to look better than the
model really is, and the adjustment lowers it to compensate.
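As a rough sketch of how the adjustment behaves, the usual textbook formula is 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of data points and p the number of independent variables. The function name and the sample numbers below are my own, chosen only for illustration.

```python
def adjusted_r_squared(r2, n_points, n_predictors):
    """Textbook adjustment of R squared for sample size and model size."""
    return 1 - (1 - r2) * (n_points - 1) / (n_points - n_predictors - 1)

# Hypothetical case: the same R squared of 0.40 from a six-variable model,
# once with 30 data points and once with 300.
small_sample = adjusted_r_squared(0.40, n_points=30, n_predictors=6)
large_sample = adjusted_r_squared(0.40, n_points=300, n_predictors=6)

print(round(small_sample, 3))  # noticeably below 0.40
print(round(large_sample, 3))  # only slightly below 0.40
```

The fewer data points you have per variable, the larger the penalty.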
To answer your second question, let me provide an example. Let's say
there is a company making a product where each unit takes 3 units of A
and 7 units of B to produce. The cost to produce the product based on
the cost of A and B is something like...
A    B    Cost
1    1    10
2    1    13
3    2    23
1    2    17
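and so on. As you can see, a change in the cost of A contributes only
part of the change in the cost of the product: with weights of 3 and
7, A's share is 0.3 and B's is 0.7. Suppose the factors reported were
0.3 for A and 0.7 for B. Dividing A by its factor, you get predictions
such as
1/.3 = 3.33...
2/.3 = 6.66...
3/.3 = 10
and so on, which are nowhere near the costs in the table (10, 13, 23).
So the formula you included at the end of your question is incorrect.
Note also that the R value for A alone measures how closely the cost
tracks A, not A's share of the cost, so it will not generally come out
as 0.3.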
There is a technique (multiple linear regression) which is more
appropriate to use with this kind of example.
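To show what that technique does with the four rows in the table, here is a small sketch in plain Python that fits Cost = b0 + b1*A + b2*B by least squares. The helper names solve and fit_linear are my own; in practice Excel's Regression tool or any statistics package does this for you.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_linear(rows, ys):
    """Least-squares fit of y = b0 + b1*x1 + ... via the normal equations."""
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

# The four rows from the table: (A, B) pairs and the matching Cost.
inputs = [(1, 1), (2, 1), (3, 2), (1, 2)]
cost = [10, 13, 23, 17]

intercept, coef_a, coef_b = fit_linear(inputs, cost)
print(f"Cost = {intercept:.2f} + {coef_a:.2f}*A + {coef_b:.2f}*B")
```

On this table the fit should recover coefficients very close to 3 for A and 7 for B, with an intercept very close to zero, because the costs were built from exactly that formula.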
For more information, I suggest a few sites including:
http://helios.bto.ed.ac.uk/bto/statistics/tress11.html
A pretty good set of examples and includes descriptions of using the
tools in Excel.
http://www.uvm.edu/~dhowell/gradstat/psych340/Lectures/CorrelReg/correl1.html
Another site with good examples, perhaps a little more technical. I
am including it because it talks about the difference between
correlation and regression.
http://www-micro.msb.le.ac.uk/2060/2060-4.html
Another good explanation of how correlation and regression are
different. Good illustrations as well.
http://mathworld.wolfram.com/CorrelationCoefficient.html
A number of equations, but the charts showing different R squared
values are helpful.
http://www.csuchico.edu/psy/Examples.htm
Examples using SPSS (data and output provided).
http://bmj.com/statsbk/11.shtml
More examples, this time showing +1 / -1 correlation and non-linear
regression. It also defines a number of other statistical measures.
Helpful search phrases included:
compare correlation regression
define "r squared" "multiple r"
define "multiple r" correlation
--Maniac
Clarification of Answer by maniac-ga on 16 Aug 2003 05:24 PDT
Hello Keithp,
Based on what you described (a six-variable model with an Adjusted R
of 0.3), the model is weak. Selecting the right variables and
relationships can be done on an iterative basis. There are a few
interesting reports online, including:
http://149.43.80.142/cs100homeS01/EPearson/Project/drunk%20driving%20factors.html
A project on Drunk Driving looking at four factors that contribute
to drunk driving deaths. It steps through the analysis of each
variable and then the combination to come to the conclusions
presented.
http://www.eeb.uconn.edu/Courses/EEB200/eeb200s01/stobutzki.htm
A study of fish / swimming speeds. It talks about the poor results
achieved and suggests further tests to correct the model. For example,
some of the fish may have been swimming a greater [or lesser] distance
prior to capture, resulting in fatigue and poor performance.
Another aspect that may be considered is that the relationship between
two variables is not linear. It is often helpful to chart the result
against each independent variable. For example if Y is your result and
X1 through X6 are the independent variables, make six plots similar to
those in
http://bmj.com/statsbk/11.shtml
to see if the relationship is not linear. There are a number of
relationships in the real world that are not linear including:
- weight vs. height
- aerodynamic drag vs. speed [can be extremely non-linear]
If you see one of these, there are techniques to perform regression
with non-linear formulas.
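One common such technique, sketched here under my own assumptions, is to transform the data so that a straight-line fit applies. A power law y = c * x^k becomes a straight line after taking logs of both sides. The drag figures below are invented to follow drag = 0.5 * speed^2 and are not measurements.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Made-up readings (an assumption): drag grows with the square of speed.
speed = [10, 20, 30, 40, 50]
drag = [50, 200, 450, 800, 1250]

# log(drag) = log(c) + k * log(speed), so fit a line to the logged values.
exponent, log_c = fit_line([math.log(s) for s in speed],
                           [math.log(d) for d in drag])
scale = math.exp(log_c)
print(f"drag ~ {scale:.3f} * speed^{exponent:.3f}")
```

A straight line fitted to the raw speed and drag columns would miss the curve; the log-log fit recovers an exponent close to 2.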
Another method that may be helpful is to select subsets of data and
determine the correlation of these subsets. If a chart shows bands of
data (the model I described would do that), then you can compute the
coefficients for each subset. The problem is then to determine what
factor (independent variable) is causing the differences. Additional
experiments may help, but you may also have the data you need if you
can determine the other factor (and assign values to each data point)
to do further analysis.
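Here is a small sketch of the banding effect, using invented data: three bands share the same slope but sit at different levels because of a hidden factor. The band names and values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mean_x) ** 2 for x in xs)
                           * sum((y - mean_y) ** 2 for y in ys))

# Made-up data: within each band y = x + offset, with offsets 0, 10 and 20.
bands = {
    "band 1": ([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]),
    "band 2": ([1, 2, 3, 4, 5], [11, 12, 13, 14, 15]),
    "band 3": ([1, 2, 3, 4, 5], [21, 22, 23, 24, 25]),
}

# Correlation over all points pooled together.
all_x = [x for xs, _ in bands.values() for x in xs]
all_y = [y for _, ys in bands.values() for y in ys]
overall = pearson_r(all_x, all_y)
print("all data:", round(overall, 3))

# Correlation within each band on its own.
per_band = {name: pearson_r(xs, ys) for name, (xs, ys) in bands.items()}
for name, r in per_band.items():
    print(name + ":", round(r, 3))
```

The pooled correlation is weak while each band on its own is close to one, which is the sign that a missing variable separates the bands.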
--Maniac