Thank you for the question.
As always, if you have any questions or need additional information,
please use the "Request Clarification" button on the browser before
rating this answer.
The variance of a random variable X is the
E(X^2)-E(X)^2 .
(see e.g. http://mathworld.wolfram.com/Variance.html).
Therefore, a running variance can be computed simply by using running
means of the original data and the squares of the original data.
Below is a Java program that shows the implementation in a method
called "variances" that returns the running variances of an array of
data.
The test program generates an array of data from a uniform
distribution, whose variance should be 1/12, (
http://mathworld.wolfram.com/UniformDistribution.html ) .
Here is a sample script:
%javac Variance.java
% java Variance 100
Expected variance of: 0.08333333333333333 got variance of:
0.0840287590180841
% java Variance 1000
Expected variance of: 0.08333333333333333 got variance of:
0.08806610259355246
% java Variance 10000
Expected variance of: 0.08333333333333333 got variance of:
0.08249038698065755
% java Variance 1000000
Expected variance of: 0.08333333333333333 got variance of:
0.0833870225438409
Here is the code (note: sometimes code does not render properly. If
this is a problem, I can put it on a website).
public class Variance{
public static double[] variances(double[] data){
double sum=0;
double sumsquare=0;
int len=data.length;
double mean=0;
double[] variances=new double[len];
for (int i=0;i<len;++i){
sum+=data[i];
sumsquare+=data[i]*data[i];
mean=sum/(i+1);
variances[i]=sumsquare/(i+1)-mean*mean;
}
return variances;
}
public static void main(String[] args){
int len=100;
if (args.length>0)len=Integer.parseInt(args[0]);
double[]data=new double[len];
for (int i=0;i<len;++i)
data[i]=Math.random();
double[]variances=variances(data);
System.out.println("Expected variance of: "+1/12.+" got variance of:
"+variances[len-1]);
}
}
SEARCH STRATEGY:
----------------
variance
uniform distribution variance |
Clarification of Answer by
rbnn-ga
on
14 Nov 2002 14:06 PST
One quick point that I forgot to mention.
Java, like C and C++, has 0-based indexing in arrays; while your
pseudocode, like Fortran, Matlab, and a lot of pseudocode, uses
1-based indexing on arrays; in any case, this is the reason that 1 is
added to the denominator in the expressions for the mean and mean of
the squares in the code given.
|
Request for Answer Clarification by
lusus-ga
on
14 Nov 2002 15:50 PST
so, let me just see if I've got this right (expressed
as I'm used to seeing it)
running variance = 1/N * E(x^2) - ( E(x)/n )^2
(where E means summation)
or: v = <x^2> - <x>^2
so I could use:
x = new data value
sum = sum + x
sumofsquares = sumofsquares + x^2
count = count +1
running_mean = sum/count
running_variance = 1/count * sumofsquares - running_mean^2
repeat...
?
|
Clarification of Answer by
rbnn-ga
on
14 Nov 2002 16:31 PST
Your analysis is correct. However, it is not a good idea to use the
symbol E for summation, since in probability theory E means
expectation, or mean (what you use double angle brackets for).
|