Hi djlewis2!!
VARIANCE:
The variance of a random variable x is defined as:
var(x) = E[(x - E(x))^2] ,
where E(x) is the expectation value of the random variable x.
For a definition of expectation value see the following page:
http://planetmath.org/encyclopedia/ExpectedValue.html
A useful formula that follows immediately from the definition is:
var(x) = E[x^2] - (E[x])^2
"The variance of a random variable determines a level of variation of
the possible values of around its mean. However, as this measure is
squared, the standard deviation is used instead when one wants to talk
about how much a random variable varies around its expected value."
http://planetmath.org/encyclopedia/Variance.html
STANDARD DEVIATION (std):
The standard deviation is a measure of the variation of x around the
expected value; it is defined as the square root of the variance:
std(x) = sqrt(var(x))
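To make the two equivalent variance formulas concrete, here is a minimal Python sketch (the function names are my own) that computes the population variance both from the definition and from the shortcut formula, plus the standard deviation as its square root:

```python
def mean(xs):
    return sum(xs) / len(xs)

def var_definition(xs):
    # var(x) = E[(x - E(x))^2]
    m = mean(xs)
    return mean([(x - m) ** 2 for x in xs])

def var_shortcut(xs):
    # var(x) = E[x^2] - (E[x])^2
    return mean([x * x for x in xs]) - mean(xs) ** 2

def std(xs):
    # std(x) = sqrt(var(x))
    return var_definition(xs) ** 0.5

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(var_definition(xs), std(xs))  # -> 4.0 2.0 for this sample
```

Both routes give the same number (up to rounding), which is the point of the shortcut formula.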
COVARIANCE and Statistical Correlation of two variables:
Given two random variables x,y with means mx and my respectively, the
covariance cov(x,y) is defined by:
cov(x,y) = E[(x - mx).(y -my)]
= E(x.y) - E(x).E(y)
As you know, if x and y are independent, then:
cov(x,y) = 0 (because E(x.y) = E(x).E(y) ) [1]
On the other hand, if x and y are perfectly (positively) linearly
correlated we have:
cov(x,y) = sqrt(var(x).var(y)) = std(x).std(y) [2]
From [1] and [2] we can construct a measure of the variables' dependence,
that we will call the Statistical Correlation of x and y, and it is
given by:
cor(x,y) = cov(x,y) / (std(x).std(y))
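As a quick illustration, here is a small Python sketch (helper names are my own, using the population convention of dividing by N) of the covariance and correlation of two samples; with y a perfect linear function of x, the correlation comes out as 1:

```python
def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return mean([(x - m) ** 2 for x in xs])

def std(xs):
    return var(xs) ** 0.5

def cov(xs, ys):
    # cov(x,y) = E[(x - mx).(y - my)]
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def cor(xs, ys):
    # cor(x,y) = cov(x,y) / (std(x).std(y))
    return cov(xs, ys) / (std(xs) * std(ys))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # ys = 2*xs + 1, a perfect linear relation
print(cor(xs, ys))          # -> 1.0 (up to rounding)
```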
This means that the covariance of two variates x and y provides a
measure of how strongly correlated these variables are.
----------------------------------------------------------
For two variables, the covariance is related to the variance by:
var(x+y) = var(x) + var(y) + 2.cov(x,y)
Indeed:
var(x+y) = E[(x+y)^2] - (E(x+y))^2
= E(x^2+y^2+2xy) - [E(x)+E(y)]^2
= E(x^2+y^2+2x.y) - E(x)^2 - E(y)^2 - 2.E(x).E(y)
= var(x) + var(y) +2.E(x.y) - 2.E(x).E(y)
= var(x) + var(y) + 2.cov(x,y)
Then we have:
var(x+y) = var(x) + var(y) + 2.cor(x,y).std(x).std(y) [3]
Now we can easily relate cor(x,y), std(x+y), std(x) and std(y) by
using the definition:
std(x) = sqrt(var(x))
or the equivalent
var(x) = std(x)^2
Now from [3] we have:
std(x+y)^2 = std(x)^2 + std(y)^2 + 2.cor(x,y).std(x).std(y)
or
std(x+y) = sqrt[std(x)^2 + std(y)^2 + 2.cor(x,y).std(x).std(y)]
This is the formula that you want!!
In the particular case that cor(x,y) = 0 (x,y are independent) we have that:
std(x+y) = sqrt[std(x)^2 + std(y)^2]
If cor(x,y) = 1 (x,y are perfectly correlated), then:
std(x+y) = sqrt[std(x)^2 + std(y)^2 + 2.std(x).std(y)]
= sqrt[(std(x) + std(y))^2]
= std(x) + std(y)
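The identity for std(x+y) can be checked numerically on any sample; here is a short Python sketch (helper names are my own, population convention) that verifies it on a small data set whose correlation happens to be 0.6:

```python
def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return mean([(x - m) ** 2 for x in xs])

def std(xs):
    return var(xs) ** 0.5

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def cor(xs, ys):
    return cov(xs, ys) / (std(xs) * std(ys))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 1.0, 4.0, 3.0]           # cor(xs, ys) = 0.6 for this data
zs = [x + y for x, y in zip(xs, ys)]

# std(x+y) = sqrt[std(x)^2 + std(y)^2 + 2.cor(x,y).std(x).std(y)]
lhs = std(zs)
rhs = (std(xs) ** 2 + std(ys) ** 2
       + 2 * cor(xs, ys) * std(xs) * std(ys)) ** 0.5
assert abs(lhs - rhs) < 1e-9
```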
------------------------------------------------------------
For additional reference see the following pages:
"Expectation, (co-)variance, and correlation":
http://www.met.rdg.ac.uk/cag/courses/Stats/course/node34.html
"Variance":
http://mathworld.wolfram.com/Variance.html
"Covariance":
http://mathworld.wolfram.com/Covariance.html
"Statistical Correlation":
http://mathworld.wolfram.com/StatisticalCorrelation.html
"STANDARD DEVIATION AND CORRELATION EXAMPLE":
http://www.bus.duq.edu/faculty/lundberg/281/Standev.doc
-------------------------------------------------------------
I hope this helps; if you find something unclear or a missing point,
please just use the clarification request feature to let me know and I
will gladly respond to your request as soon as possible.
Best regards.
livioflores-ga

Clarification of Answer by livioflores-ga on 30 Jan 2004 20:23 PST
Hi!!
Thank you for the good rating and the generous tip!!!
Regarding the first part of your rating comment, I suggest you look at
the following page for the paragraph that starts with:
"For multiple variables, the variance is given using the definition of
covariance...", which then shows a general formula for var(X1+X2+...+Xn):
http://mathworld.wolfram.com/Variance.html
The second part of your comment is unclear to me; if you can clarify
it, I will be glad to offer assistance.
Regards,
livioflores-ga
Request for Answer Clarification by djlewis2-ga on 31 Jan 2004 05:15 PST
Thanks for the formula for adding multiple, correlated variances/std
devs. However, that does not work very well in a spreadsheet. I think
one only has to show that var(x+y+z) = var((x+y)+z). And that does
follow from the formula, since it basically says
var(x+y+z) = var(x)+var(y)+var(z)+2*(cov(x,y)+cov(x,z)+cov(y,z))
or similarly with squares and a square root for stddev (assuming the
mysterious m in the formula is really N).
And that's symmetric in x, y, z, so the variance is associative. So, I
need only design a spreadsheet that calculates the variance (or stddev)
of a column of variances (or stddevs) cumulatively in a single column.
The formula with the triangular double sum would be much trickier in a
spreadsheet.
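The three-variable formula quoted above holds for any sample data; here is a short Python sketch (helper names are my own, population convention) that checks it on arbitrary numbers:

```python
def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return mean([(v - m) ** 2 for v in xs])

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return mean([(a - mx) * (b - my) for a, b in zip(xs, ys)])

xs = [1.0, 2.0, 4.0, 3.0]
ys = [2.0, 2.0, 5.0, 1.0]
zs = [0.0, 3.0, 1.0, 2.0]
sums = [a + b + c for a, b, c in zip(xs, ys, zs)]

# var(x+y+z) = var(x)+var(y)+var(z) + 2*(cov(x,y)+cov(x,z)+cov(y,z))
lhs = var(sums)
rhs = (var(xs) + var(ys) + var(zs)
       + 2 * (cov(xs, ys) + cov(xs, zs) + cov(ys, zs)))
assert abs(lhs - rhs) < 1e-9
```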
As for the comment about trigonometry, it seems that cor(x,y) plays
the role of -cos(x,y) in the law of cosines,
c^2 = a^2 + b^2 - 2ab * cos(angle(a,b))
which gives the length of the opposite side of any triangle given two
sides a, b and the angle(a,b) between them. In other words, the
"angle" between two random variables is their correlation, or rather
the cosine of the angle: a right angle is perfect orthogonality, an
angle of 180 is perfect correlation/linearity, and the
stddevs/variances "stack"; an angle of 0 is perfect
anti-correlation/linear cancellation.
See, for example:
http://sep.stanford.edu/sep/prof/waves/rnd/paper_html/node22.html
I'd love to see a good reference which explains in detail why
trigonometry and normally distributed random variables are so closely
related.
Clarification of Answer by livioflores-ga on 02 Feb 2004 08:20 PST
Hi again!!
I am not sure about the trigonometric relationship; it seems to me
that the 2cov(x,y) term plays the role of the 2ab term in the expansion
of (a + b)^2:
(a + b)^2 = a^2 + b^2 + 2ab
Compare that with var(x+y):
var(x+y) = var(x) + var(y) + 2.cov(x,y)
If the above formula is written in terms of the std, we have a more
suggestive result:
std(x+y)^2 = std(x)^2 + std(y)^2 + 2.cov(x,y)
Proceeding the same way, for x+y+z we have:
(a + b + c)^2 = a^2 + b^2 +c^2 + 2ab + 2ac + 2bc
then, after substituting as I suggest, we have:
var(x+y+z) = var(x) + var(y) + var(z) + 2cov(x,y) + 2cov(x,z) + 2cov(y,z)
which is the correct formula!!!
I hope this helps you.
Request for Answer Clarification by djlewis2-ga on 02 Feb 2004 09:13 PST
Thanks again. I think it's not so much a trig issue as that a
normally distributed random variable apparently can be represented as
a vector. The length of the vector seems to be the sd, but I'm not
sure where the mean comes in. The angle between two such vectors is
the correlation coefficient. I don't have a good intuitive grasp of
what's going on, but it's not important right now. Thanks for your
insights.
As for var & sd of a sum of random variables, here is the VBA code I
came up with to do it in MS Access. So, you see I've really put this
to use, and it's giving a very nice, intuitive result (for the level
of confidence in estimates of programming work for a software
development project, using a correlation of 0.8 among estimates).
Thanks. --David.
Function CorSD(X As Single, Group As String, Cor As Single) As Single
    ' Accumulates a running variance VTot per Group, assuming a fixed
    ' correlation Cor between each new std dev X and the running total:
    ' VTot_new = VTot_old + X^2 + 2*Cor*X*sqrt(VTot_old)
    Static VTot As Single
    Static LastGroup As String
    If Group <> LastGroup Then
        ' New group: reset the accumulator
        VTot = 0
        LastGroup = Group
    End If
    VTot = VTot + (X * X) + (2 * Cor * X * VTot ^ 0.5)
    CorSD = VTot ^ 0.5   ' return the cumulative std dev
End Function
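As a cross-check of that accumulation logic, here is the same recurrence as a stand-alone Python sketch (the function name is my own): with Cor = 1 the result collapses to the plain sum of the std devs, and with Cor = 0 to their root-sum-square, matching the two special cases derived earlier in the thread.

```python
def cor_sd(sds, cor):
    """Fold a list of std devs into one cumulative std dev, assuming a
    fixed correlation `cor` between each new term and the running sum."""
    vtot = 0.0                                 # running variance
    for x in sds:
        vtot += x * x + 2 * cor * x * vtot ** 0.5
    return vtot ** 0.5

print(cor_sd([1.0, 2.0, 3.0], 1.0))   # -> 6.0 (simple sum)
print(cor_sd([3.0, 4.0], 0.0))        # -> 5.0 (root-sum-square)
```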
Clarification of Answer by livioflores-ga on 02 Feb 2004 19:27 PST
It sounds interesting; I promise to look around for something related
to your ideas.
Best regards and good luck with your project!!!
livioflores-ga