Q: Standard deviations of correlated random variables ( Answered 5 out of 5 stars,   2 Comments )
Question  
Subject: Standard deviations of correlated random variables
Category: Science > Math
Asked by: djlewis2-ga
List Price: $15.00
Posted: 29 Jan 2004 18:09 PST
Expires: 28 Feb 2004 18:09 PST
Question ID: 301650
I know the formula for combining the standard deviation (sd) of two
normally distributed random variables added together when they are (a) perfectly
independent (sqrt(sd1^2 + sd2^2)); (b) perfectly correlated (sd1 +
sd2).

More generally, is there a simple formula for the combination of sd1 and sd2
based on their correlation coefficient (which is zero for independent
random variables and +-1 for perfectly correlated random variables,
but could be anywhere in between -1 and +1)?
Answer  
Subject: Re: Standard deviations of correlated random variables
Answered By: livioflores-ga on 30 Jan 2004 00:10 PST
Rated:5 out of 5 stars
 
Hi djlewis2!!

VARIANCE:

The variance of a random variable x is defined as:

var(x) = E[(x - E(x))^2]  ,

where E(x) is the expectation value of the random variable x.
For a definition of expectation value see the following page:
http://planetmath.org/encyclopedia/ExpectedValue.html

A useful formula that follows immediately from the definition is:

var(x) = E[x^2] - (E[x])^2
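
As a quick numerical check of the two variance formulas above, here is a minimal Python sketch. It assumes E[.] is the plain average over a list of equally likely values; the function names and the sample data are illustrative only.

```python
def mean(values):
    """Plain average, standing in for the expectation E[.]."""
    return sum(values) / len(values)

def var_by_definition(values):
    """var(x) = E[(x - E(x))^2]"""
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])

def var_by_shortcut(values):
    """var(x) = E[x^2] - (E[x])^2"""
    return mean([v * v for v in values]) - mean(values) ** 2

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(var_by_definition(data), var_by_shortcut(data))  # 4.0 4.0
```

Both routes give the same variance, so the standard deviation of this sample is 2.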

"The variance of a random variable determines a level of variation of
the possible values of x around its mean. However, as this measure is
squared, the standard deviation is used instead when one wants to talk
about how much a random variable varies around its expected value."
http://planetmath.org/encyclopedia/Variance.html


STANDARD DEVIATION (std):

The standard deviation is a measure of the variation of x around the
expected value; it is defined as the square root of the variance:

std(x) = sqrt(var(x))


COVARIANCE and Statistical Correlation of two variables:

Given two random variables x, y with means mx and my respectively, the
covariance cov(x,y) is defined by:

cov(x,y) = E[(x - mx).(y - my)] 
         = E(x.y) - E(x).E(y)

As you know, if x and y are independent, then:
cov(x,y) = 0  (because E(x.y) = E(x).E(y) )      [1]

On the other hand, if x and y are perfectly positively correlated
(that is, y = a.x + b with a > 0), we have:
cov(x,y) = sqrt (var(x).var(y)) = std(x).std(y)  [2]  

From [1] and [2] we can derive a measure of the variables' linear
dependence, which we will call the Statistical Correlation of x and y;
it is given by:

cor(x,y) = cov(x,y) / (std(x).std(y))  

This means that the covariance of two variables x and y, once divided
by the product of their standard deviations, measures how strongly
(linearly) correlated these variables are, on a scale from -1 to +1.
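
To make the definitions of cov(x,y) and cor(x,y) concrete, here is a small Python sketch. It assumes the expectation is the average over paired sample values; the helper names and the data are made up for illustration.

```python
import math

def mean(values):
    return sum(values) / len(values)

def cov(xs, ys):
    """cov(x,y) = E[(x - mx).(y - my)] over paired samples."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def std(xs):
    """std(x) = sqrt(var(x)), with var(x) = cov(x,x)."""
    return math.sqrt(cov(xs, xs))

def cor(xs, ys):
    """cor(x,y) = cov(x,y) / (std(x).std(y))"""
    return cov(xs, ys) / (std(xs) * std(ys))

xs = [1.0, 2.0, 3.0, 4.0]
ys_linear = [2 * x + 1 for x in xs]       # y = 2x + 1: cor is +1 (up to rounding)
ys_flipped = [-x for x in xs]             # y = -x:     cor is -1 (up to rounding)
ys_uncorrelated = [1.0, -1.0, -1.0, 1.0]  # cov is 0, so cor is 0

for ys in (ys_linear, ys_flipped, ys_uncorrelated):
    print(round(cor(xs, ys), 6))
```

Note that the last pair is uncorrelated without being independent in any obvious sense, which is why cor = 0 is the weaker of the two conditions.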

----------------------------------------------------------

For two variables, the covariance is related to the variance by:

var(x+y) = var(x) + var(y) + 2.cov(x,y)
         
Indeed:
var(x+y) = E[(x+y)^2] - (E(x+y))^2
         = E(x^2+y^2+2xy) - [E(x)+E(y)]^2
         = E(x^2+y^2+2x.y) - E(x)^2 - E(y)^2 - 2.E(x).E(y)
         = var(x) + var(y) +2.E(x.y) - 2.E(x).E(y)
         = var(x) + var(y) + 2.cov(x,y)

Then we have:

var(x+y) = var(x) + var(y) + 2.cor(x,y).std(x).std(y) [3]

Now we can easily relate cor(x,y), std(x+y), std(x) and std(y) by
using the definition:
std(x) = sqrt(var(x)) 

or the equivalent 

var(x) = std(x)^2

Now from [3] we have:

std(x+y)^2 = std(x)^2 + std(y)^2 + 2.cor(x,y).std(x).std(y)

or

std(x+y) = sqrt[std(x)^2 + std(y)^2 + 2.cor(x,y).std(x).std(y)]

This is the formula that you want!!

In the particular case that cor(x,y) = 0 (x,y are uncorrelated, which
is always true when they are independent) we have that:

std(x+y) = sqrt[std(x)^2 + std(y)^2]

If cor(x,y) = 1 (x,y are perfectly correlated), then:

std(x+y) = sqrt[std(x)^2 + std(y)^2 + 2.std(x).std(y)]
         = sqrt[(std(x) + std(y))^2]
         = std(x) + std(y)  
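
The formula for std(x+y) can be sketched in Python as follows. The function name combined_sd and the argument names are my own, chosen for illustration; the clamp to zero only guards against tiny negative values from floating-point rounding.

```python
import math

def combined_sd(sd_x, sd_y, rho):
    """std(x+y) = sqrt[std(x)^2 + std(y)^2 + 2.cor(x,y).std(x).std(y)]"""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("correlation must lie between -1 and +1")
    variance = sd_x ** 2 + sd_y ** 2 + 2.0 * rho * sd_x * sd_y
    return math.sqrt(max(variance, 0.0))

print(combined_sd(3.0, 4.0, 0.0))   # uncorrelated: sqrt(9 + 16) = 5.0
print(combined_sd(3.0, 4.0, 1.0))   # perfectly correlated: 3 + 4 = 7.0
print(combined_sd(3.0, 4.0, -1.0))  # perfectly anti-correlated: |3 - 4| = 1.0
print(combined_sd(3.0, 4.0, 0.5))   # in between: sqrt(37)
```

The third call shows the remaining special case: with cor(x,y) = -1 the result is |std(x) - std(y)|.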
         
------------------------------------------------------------

For additional reference see the following pages:
"Expectation, (co-)variance, and correlation":
http://www.met.rdg.ac.uk/cag/courses/Stats/course/node34.html

"Variance":
http://mathworld.wolfram.com/Variance.html

"Covariance":
http://mathworld.wolfram.com/Covariance.html

"Statistical Correlation":
http://mathworld.wolfram.com/StatisticalCorrelation.html

"STANDARD DEVIATION AND CORRELATION EXAMPLE":
http://www.bus.duq.edu/faculty/lundberg/281/Standev.doc  

-------------------------------------------------------------

I hope this helps; if you find something unclear or missing, please
use the clarification request feature to let me know and I will
gladly respond as soon as possible.

Best regards.
livioflores-ga

Clarification of Answer by livioflores-ga on 30 Jan 2004 20:23 PST
Hi!!

Thank you for the good rating and the generous tip!!!

Regarding the first part of your rating comment, I suggest you look at
the following page, at the paragraph that starts with:
"For multiple variables, the variance is given using the definition of
covariance...", which then shows a general formula for var(X1+X2+...+Xn):
http://mathworld.wolfram.com/Variance.html


The second part of your comment is unclear to me; if you can clarify
it I will be glad to offer assistance.

Regards,
livioflores-ga

Request for Answer Clarification by djlewis2-ga on 31 Jan 2004 05:15 PST
Thanks for the formula for adding multiple, correlated variances/std
devs. However, that does not work very well in a spreadsheet.  I think
one only has to show that var(x+y+z) = var((x+y)+z). And that does
follow from the formula, since it basically says

var(x+y+z) = var(x)+var(y)+var(z)+2*(cov(x,y)+cov(x,z)+cov(y,z))

or similarly with squares and a square root for stddev(assuming the
mysterious m in the formula is really N).

And that's symmetric in x, y, z, so the variance is associative. So I
can simply design a spreadsheet to calculate the variance (or stddev)
of the sum of a column of variances (or stddevs), cumulatively in a
single column. The formula with the triangular double sum would be much
trickier in a spreadsheet.
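
For reference, the general n-variable formula can be sketched in Python as a double sum over a correlation matrix. This is only an illustration: the function name sd_of_sum and the example numbers are assumptions, not something from the referenced page.

```python
import math

def sd_of_sum(sds, corr):
    """Sd of x1 + ... + xn from the individual sds and a correlation matrix.

    var(sum) = sum over all i, j of corr[i][j] * sds[i] * sds[j],
    where corr[i][i] = 1 gives the var(xi) terms and each off-diagonal
    pair appears twice, which gives the 2*cov(xi, xj) terms.
    """
    n = len(sds)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += corr[i][j] * sds[i] * sds[j]
    return math.sqrt(total)

sds = [1.0, 2.0, 3.0]
corr = [[1.0, 0.5, 0.5],
        [0.5, 1.0, 0.5],
        [0.5, 0.5, 1.0]]
print(sd_of_sum(sds, corr))  # 5.0, since 1 + 4 + 9 + 2*0.5*(2 + 3 + 6) = 25

# Grouping as (x+y)+z gives the same variance, but the correlation between
# (x+y) and z is cov(x,z) + cov(y,z) over sd(x+y)*sd(z), not the pairwise 0.5.
sd_xy = math.sqrt(1.0 + 4.0 + 2 * 0.5 * 1.0 * 2.0)   # sqrt(7)
cov_xy_z = 0.5 * 1.0 * 3.0 + 0.5 * 2.0 * 3.0         # 4.5
print(cov_xy_z / (sd_xy * 3.0))                      # about 0.567
```

So the grouping is indeed associative at the level of variances, provided the correlation used for the grouped term is recomputed from the covariances.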

As for the comment about trigonometry, it seems that cor(x,y) plays
the role of -cos(x,y) in the law of cosines,

c^2 = a^2 + b^2 - 2ab * cos(angle(a,b))

which gives the length of the opposite side of any triangle given two
sides a, b and the angle(a,b) between them.  In other words, the
"angle" between two random variables is determined by their
correlation, which acts as minus the cosine of the angle: a right angle
is zero correlation (orthogonality), an angle of 180 is perfect
correlation/linearity, where the stddevs "stack", and an angle of 0 is
perfect anti-correlation/linear cancellation.

See, for example:

http://sep.stanford.edu/sep/prof/waves/rnd/paper_html/node22.html
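
One way to see the geometry, sketched in Python under the assumption that each variable is represented by its list of sample values: subtract the mean from each list, treat the result as a vector, and the sample correlation is exactly the cosine of the angle between the two vectors. The vector length is then sqrt(n) times the sd. Helper names and data are illustrative.

```python
import math

def centered(values):
    """Subtract the mean, so only the variation around it remains."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
cx, cy = centered(xs), centered(ys)

# Cosine of the angle between the centered vectors = sample correlation.
cos_angle = dot(cx, cy) / (norm(cx) * norm(cy))

# Length of the vector sum, computed directly and via the cosine rule for vectors.
lhs = norm([p + q for p, q in zip(cx, cy)]) ** 2
rhs = norm(cx) ** 2 + norm(cy) ** 2 + 2.0 * norm(cx) * norm(cy) * cos_angle

print(round(cos_angle, 6))           # 0.8
print(round(lhs, 6), round(rhs, 6))  # 36.0 36.0
```

The sign is + here because the angle is measured between the two vectors placed tail to tail; in the triangle form of the law of cosines the interior angle is the supplement, which is where the minus sign comes from.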

I'd love to see a good reference which explains in detail why
trigonometry and normally distributed random variables are so closely
related.

Clarification of Answer by livioflores-ga on 02 Feb 2004 08:20 PST
Hi again!!

I am not sure about the trigonometric relationship; it seems to me
that the 2cov(x,y) term plays the role of the 2ab term in the expansion
of (a + b)^2:
(a + b)^2 = a^2 + b^2 + 2ab

Compare that with var(x+y):
var(x+y) = var(x) + var(y) + 2.cov(x,y)

If the above formula is written in terms of the std, we have a more
suggestive result:
std(x+y)^2 = std(x)^2 + std(y)^2 + 2.cov(x,y)


Proceeding this way we have for x+y+z :

(a + b + c)^2 =  a^2 + b^2 +c^2 + 2ab + 2ac + 2bc

then, after replacing as I suggest, we have:

var(X+Y+Z) = var(X) + var(Y) +Var(Z)+ 2cov(X,Y)+  2cov(X,Z)+ 2cov(Y,Z)

that is the correct formula!!!
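
The three-variable formula can be checked numerically. The sketch below is Python with made-up sample data, taking E[.] as the average over the samples; with that convention the identity holds exactly, up to rounding.

```python
def mean(values):
    return sum(values) / len(values)

def cov(xs, ys):
    """cov(x,y) = E[(x - mx).(y - my)]; cov(x,x) is var(x)."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

xs = [1.0, 3.0, 2.0, 5.0, 4.0]
ys = [2.0, 2.0, 4.0, 3.0, 6.0]
zs = [5.0, 1.0, 3.0, 2.0, 2.0]

sums = [x + y + z for x, y, z in zip(xs, ys, zs)]

lhs = cov(sums, sums)  # var(X+Y+Z)
rhs = (cov(xs, xs) + cov(ys, ys) + cov(zs, zs)
       + 2 * cov(xs, ys) + 2 * cov(xs, zs) + 2 * cov(ys, zs))

print(abs(lhs - rhs) < 1e-9)  # True
```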

I hope this helps you.

Request for Answer Clarification by djlewis2-ga on 02 Feb 2004 09:13 PST
Thanks again.  I think it's not so much a trig issue as that a
normally distributed random variable apparently can be represented as
a vector. The length of the vector seems to be the sd, but I'm not
sure where the mean comes in. The angle between two such vectors is
the correlation coefficient.  I don't have a good intuitive grasp of
what's going on, but it's not important right now. Thanks for your
insights.

As for var & sd of a sum of random variables, here is the VBA code I
came up with to do it in MS Access.  So, you see I've really put this
to use, and it's giving a very nice, intuitive result (for the level
of confidence in estimates of programming work for a software
development project, using a correlation of 0.8 among estimates)

Thanks.  --David.

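Function CorSD(X As Single, Group As String, Cor As Single) As Single
    ' Running sd of a sum, where X is the sd of the next item and every
    ' pair of items in a group is assumed to have the same correlation Cor.
    Static VTot As Single       ' variance of the running sum
    Static SumSD As Single      ' sum of the sds added so far
    Static LastGroup As String
    
    If Group <> LastGroup Then  ' new group: restart the accumulation
        VTot = 0
        SumSD = 0
        LastGroup = Group
    End If
    ' cov(new item, running sum) = Cor * X * (sum of the previous sds)
    VTot = VTot + (X * X) + (2 * Cor * X * SumSD)
    SumSD = SumSD + X
    CorSD = VTot ^ 0.5
End Function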
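
For comparison, here is a Python sketch of a running, single-column accumulation. It assumes every pair of items shares the same correlation rho. Under that assumption the covariance of a new item with the running sum is rho times its sd times the sum of the previous sds, so the sketch tracks that sum alongside the running variance. The function name and the example numbers are illustrative.

```python
import math

def running_sds(sds, rho):
    """Cumulative sd of x1, x1+x2, x1+x2+x3, ... with equal pairwise correlation rho."""
    var_total = 0.0  # variance of the running sum
    sd_sum = 0.0     # sum of the sds already included
    results = []
    for sd in sds:
        # var(S + x) = var(S) + var(x) + 2*cov(S, x), with cov(S, x) = rho * sd * sd_sum
        var_total += sd * sd + 2.0 * rho * sd * sd_sum
        sd_sum += sd
        results.append(math.sqrt(var_total))
    return results

sds = [1.0, 2.0, 3.0]
print(running_sds(sds, 0.5)[-1])  # 5.0

# Cross-check against the closed form for equal pairwise correlation:
# var = rho * (sum of sds)^2 + (1 - rho) * (sum of squared sds)
closed = math.sqrt(0.5 * sum(sds) ** 2 + 0.5 * sum(s * s for s in sds))
print(closed)                     # 5.0
```

With rho = 0 this reduces to the root-sum-of-squares rule, and with rho = 1 the sds simply add.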

Clarification of Answer by livioflores-ga on 02 Feb 2004 19:27 PST
It sounds interesting; I promise to look around for something
related to your ideas.

Best regards and good luck with your project!!!
livioflores-ga

djlewis2-ga rated this answer: 5 out of 5 stars and gave an additional tip of: $5.00
Great answer, livioflores... perfect.  I'll use it.  (I do have to
work out the case for n variables rather than two, but I assume that
is easy to do in a spreadsheet, using associativity, correct? Like
sd(x+y+z) = sd((x+y)+z).)

I'm adding a tip not only because it is a good answer with good
references, but because I'd love a pointer to someplace that explains
the geometric/trigonometric connection with all this.  After all, we
are certainly looking at hypotenuses, cosines and such here, right?

Thanks. --David Lewis

Comments  
Subject: Re: Standard deviations of correlated random variables
From: emcy-ga on 29 Jan 2004 20:24 PST
 
covariance (a,b) = std dev a x std dev b x correlation of a and b.

Subject: Re: Standard deviations of correlated random variables
From: emcy-ga on 29 Jan 2004 20:39 PST
 
ps.  It might be a bit difficult to just accept the formula I gave
you, so at this page:
http://www.sportsci.org/resource/stats/correl.html, you'll find a
variation of the very same formula, except that it defines the
correlation coefficient instead of the covariance.  If you flip the
std deviation terms to the left side of the equation, you'll get the
same equation I gave you above. :)
