An interesting question. I enjoyed researching it because I did not
find the answer I expected.
The answer to your question is no. The prediction error is not a
linear function of the correlation coefficient. In fact, because each
correlation coefficient is estimated from a sample and therefore
carries sampling error, one has to perform a statistical test just to
determine whether two correlation coefficients differ significantly.
So the correlation coefficient describes the linear relationship
between two variables in a sample from a given population, but it is
not usually useful for comparing samples from two different
populations.
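To illustrate why the prediction error is not linear in r, here is a
minimal sketch. It assumes a standard result for simple linear
regression that is not stated in the sources quoted below: the standard
deviation of the prediction errors equals the standard deviation of y
multiplied by sqrt(1 - r^2), ignoring small-sample corrections. The
function name error_ratio is hypothetical.

```python
import math


def error_ratio(r: float) -> float:
    """Ratio of the regression prediction-error SD to the SD you get by
    always predicting the mean of y, i.e. sqrt(1 - r^2)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    return math.sqrt(1.0 - r * r)


if __name__ == "__main__":
    # Equal steps in r do not give equal reductions in prediction error.
    for r in (0.0, 0.2, 0.4, 0.6, 0.8, 0.9, 0.99):
        print(f"r = {r:4.2f}  error ratio = {error_ratio(r):.3f}")
```

With r = 0.6 the prediction error is still 80 percent of what you would
get by ignoring the other variable entirely, and even r = 0.99 leaves
about 14 percent, which fits Dallal's remarks quoted below.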
I hope you find my answer satisfactory. Relevant links and quotations
appear below.
Wonko
http://www.tufts.edu/~gdallal/corr.htm
This is a great link showing how wildly different data sets can have
the same correlation coefficient.
"How do values of the correlation coefficient correspond to different
data sets? As the correlation coefficient increases in magnitude, the
points become more tightly concentrated about a straight line through
the data. Two things should be noted. First, correlations even as high
as 0.6 don't look that different from correlations of 0. I want to say
that correlations of 0.6 and less don't mean much if the goal is to
predict individual values of one variable from the other. The
prediction error is nearly as great as we'd get by ignoring the second
variable and saying that everyone had a value of the first variable
equal to the overall mean! However, I'm afraid that this might be
misinterpreted as suggesting that all such associations are worthless.
They have important uses that we will discuss in detail when we
consider linear regression. Second, although the correlation can't
exceed 1 in magnitude, there is still a lot of variability left when
the correlation is as high as 0.99."
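The point that very different data sets can share one correlation
coefficient can be checked with Anscombe's quartet, a well-known set of
four small data sets. It is swapped in here as an example and is not
taken from the pages quoted in this answer. The sketch computes
Pearson's r directly from its definition; pearson_r and QUARTET are
hypothetical names.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Anscombe's quartet: sets I to III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

if __name__ == "__main__":
    for name, (x, y) in QUARTET.items():
        print(f"set {name}: r = {pearson_r(x, y):.3f}")
```

All four sets give r of about 0.82 (0.816 or 0.817 to three decimals),
even though set II follows a smooth curve, set III is a straight line
with one outlier, and set IV's correlation comes almost entirely from a
single point.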
http://www.mega.nu:8080/ampp/rummel/uc.htm#C2
Understanding Correlation by R.J. Rummel, Honolulu: Department of
Political Science, University of Hawaii, 1976
"Seldom, indeed, will a correlation be zero or perfect. Usually, the
covariation between things will be something like .43 or -.16. How are
we to interpret such correlations? Clearly .43 is positive, indicating
positive covariation; -.16 is negative, indicating some negative
covariation. Moreover, we can say that the positive correlation is
greater than the negative. But we require more than that. If we have a
correlation of .56 between two variables, for example, what precisely
can we say other than the correlation is positive and .56?
From my derivation of the correlation coefficient in the last chapter,
we know that the squared correlation (Definition 3.3) describes the
proportion of variance in common between the two variables. If we
multiply this by 100 we then get the percent of variance in common
between two variables. That is:
$r_{jk}^2 \times 100$ = percent of variance in common between $X_j$ and $X_k$.
For example, we found that the correlation between a nation's power
and its defense budget was .66. This correlation squared is .45, which
means that across the fourteen nations constituting the sample, 45
percent of their variance on the two variables is in common (or 55
percent is not in common). In thus squaring correlations and
transforming covariance to percentage terms we have an
easy-to-understand meaning of correlation. And we are then in a position to
evaluate a particular correlation.
As a matter of routine it is the squared correlations that should be
interpreted. This is because the correlation coefficient is misleading
in suggesting the existence of more covariation than exists, and this
problem gets worse as the correlation approaches zero....
Note that as the correlation $r$ decreases by tenths, $r^2$ decreases by
much more. A correlation of .50 only shows that 25 percent of the variance is
in common; a correlation of .20 shows 4 percent in common; and a
correlation of .10 shows 1 percent in common (or 99 percent not in
common). Thus, squaring should be a healthy corrective to the tendency
to consider low correlations, such as .20 and .30, as indicating a
meaningful or practical covariation."
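Rummel's rule of squaring r and multiplying by 100 can be tabulated in
a few lines. This sketch (the function name percent_shared is
hypothetical) reproduces his figures for .50, .20 and .10 and shows the
shared variance shrinking much faster than r as r drops by tenths.

```python
def percent_shared(r: float) -> float:
    """Percent of variance in common between two variables: r^2 * 100."""
    return r * r * 100.0


if __name__ == "__main__":
    # Step r down by tenths, as in Rummel's discussion.
    for tenths in range(9, 0, -1):
        r = tenths / 10
        print(f"r = {r:.1f}  variance in common = {percent_shared(r):5.1f}%")
```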
http://www.medcalc.be/manual/mpage08-06.html
"In the example a correlation coefficient of 0.86 (sample size = 42)
is compared with a correlation coefficient of 0.62 (sample size = 42).
The resulting z-statistic is 2.5097, which is associated with a
P-value of 0.0140. Since this P-value is less than 0.05, it is
concluded that the two correlation coefficients differ significantly."
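The MedCalc example can be reproduced with Fisher's z transformation,
a standard way to compare correlations from two independent samples.
The quoted page does not spell out its formula, so treating this as
MedCalc's exact method is an assumption, and compare_correlations is a
hypothetical name.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Two-sided test of H0: rho1 == rho2 for two independent samples,
    using Fisher's z transformation. Returns (z, p), with p taken from
    the standard normal distribution."""
    if min(n1, n2) <= 3:
        raise ValueError("each sample needs more than 3 observations")
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return z, p


if __name__ == "__main__":
    z, p = compare_correlations(0.86, 42, 0.62, 42)
    print(f"z = {z:.4f}, p = {p:.4f}")
```

This gives the same z-statistic as MedCalc (2.5097). The normal
two-sided P-value comes out at about 0.012 rather than the quoted
0.0140, so MedCalc may use a slightly different reference distribution
for the P-value; either way it is below 0.05.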
http://fonsg3.let.uva.nl/Service/Statistics/Two_Correlations.html
"Check whether you really want to know whether the correlation
coefficients are different. Only rarely is this a useful question."
Here is why I think the correlation coefficient is relevant to your
question:
http://www.medcalc.be/manual/mpage06-03a.html
"Correlation coefficient with P-value. The correlation coefficient is
a number between -1 and 1. In general, the correlation expresses the
degree that, on an average, two variables change correspondingly.
If one variable increases when the second one increases, then there is
a positive correlation. In this case the correlation coefficient will
be closer to 1. For instance the height and age of children are
positively correlated.
If one variable decreases when the other variable increases, then
there is a negative correlation and the correlation coefficient will
be closer to -1.
The P-value is the probability that you would have found the current
result if the correlation coefficient were in fact zero (null
hypothesis). If this probability is lower than the conventional 5%
(P<0.05) the correlation coefficient is called statistically
significant.
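MedCalc's description of the P-value can be made concrete with a
permutation test: shuffle one variable many times so that any real
association is destroyed, and count how often the shuffled data give a
correlation at least as large in magnitude as the observed one. This is
a swapped-in technique, not necessarily how MedCalc computes its
P-value, and strictly it tests independence rather than only zero
correlation. The function names and the example data are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation P-value for the null hypothesis of no association.

    Shuffling y breaks its pairing with x, simulating data whose true
    correlation is zero; the P-value is the share of shuffles whose |r|
    is at least as large as the observed |r|.
    """
    rng = random.Random(seed)
    r_obs = abs(pearson_r(x, y))
    y_shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)
        # Small tolerance so ties with the observed value survive rounding.
        if abs(pearson_r(x, y_shuffled)) >= r_obs - 1e-12:
            extreme += 1
    # Add-one correction: the observed pairing counts as one permutation.
    return (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical illustrative data, not taken from any quoted source.
    x = list(range(1, 21))
    y = [2 * v + (v * 7) % 5 for v in x]
    print(f"r = {pearson_r(x, y):.3f}, P = {permutation_p_value(x, y):.4f}")
```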
It is, however, important not to confuse correlation with causation.
When two variables are correlated, there may or may not be a causative
connection, and this connection may moreover be indirect. Correlation
can only be interpreted in terms of causation if the variables under
investigation provide a logical (biological) basis for such
interpretation.
95% confidence interval (CI) for the correlation coefficient: this is
the range of values that contains with a 95% confidence the 'true'
correlation coefficient."
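A common way to compute such an interval uses Fisher's z
transformation; whether MedCalc uses exactly this formula is an
assumption, and correlation_ci is a hypothetical name. The example
reuses r = 0.86 with n = 42 from the MedCalc comparison above purely
for illustration.

```python
import math


def correlation_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a correlation coefficient:
    tanh(atanh(r) -/+ z_crit / sqrt(n - 3))."""
    if n <= 3:
        raise ValueError("n must exceed 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


if __name__ == "__main__":
    lo, hi = correlation_ci(0.86, 42)
    print(f"95% CI for r = 0.86, n = 42: ({lo:.3f}, {hi:.3f})")
```

The interval runs from roughly 0.75 to 0.92 and is not symmetric around
0.86, which is one reason two sample correlations cannot simply be
compared by subtraction.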