Google Answers Logo
View Question
 
Q: running variance expression ( Answered 5 out of 5 stars,   4 Comments )
Question  
Subject: running variance expression
Category: Computers > Algorithms
Asked by: lusus-ga
List Price: $3.50
Posted: 14 Nov 2002 12:55 PST
Expires: 14 Dec 2002 12:55 PST
Question ID: 107855
I think the expression for a running mean/average in
iterative/pseudocode form would be something like:

x = new data value
sum = sum + x
count = count +1
running_mean = sum/count
repeat...

what would be the iterative/pseudocode expression for a running
variance?
by 'running' I mean that I have a list that I periodically append to,
and I cannot store all the previous values.  for running mean, I don't
store x[1], x[2], etc. just thier sum and the count.  a link to
straight answers like this would be appreciated.  I get hundreds of
hits on every math topic I try, but no pages get right to the point of
stuff like the question above.

Clarification of Question by lusus-ga on 14 Nov 2002 13:38 PST
I'm under the impression that the answer to this is perfectly well
known and a knowledgable person would know off the top of their head
or would know right where to look.  if you know you can provide the
answer, but it will require real work, give me some idea of the effort
required so I can price it better.
Answer  
Subject: Re: running variance expression
Answered By: rbnn-ga on 14 Nov 2002 13:54 PST
Rated:5 out of 5 stars
 
Thank you for the question.

As always, if you have any questions or need additional information,
please use the "Request Clarification" button on the browser before
rating this answer.

The variance of a random variable X is the 

E(X^2)-E(X)^2 .

(see e.g. http://mathworld.wolfram.com/Variance.html).

Therefore, a running variance can be computed simply by using running
means of the original data and the squares of the original data.

Below is a Java program that shows the implementation in a method
called "variances" that returns the running variances of an array of
data.

The test program generates an array of data from a uniform
distribution, whose variance should be 1/12, (
http://mathworld.wolfram.com/UniformDistribution.html ) .

Here is a sample script:

%javac Variance.java

% java Variance 100
Expected variance of: 0.08333333333333333 got variance of:
0.0840287590180841

% java Variance 1000
Expected variance of: 0.08333333333333333 got variance of:
0.08806610259355246

% java Variance 10000
Expected variance of: 0.08333333333333333 got variance of:
0.08249038698065755

% java Variance 1000000
Expected variance of: 0.08333333333333333 got variance of:
0.0833870225438409


Here is the code (note: sometimes code does not render properly. If
this is a problem, I can put it on a website).

public class Variance{
    public static double[] variances(double[] data){
	double sum=0;
	double sumsquare=0;
	int len=data.length;
	double mean=0;
	double[] variances=new double[len];
	for (int i=0;i<len;++i){
	    sum+=data[i];
	    sumsquare+=data[i]*data[i];
	    mean=sum/(i+1);
	    variances[i]=sumsquare/(i+1)-mean*mean;
	}
	return variances;
    }

    public static void main(String[] args){
	int len=100;
	if (args.length>0)len=Integer.parseInt(args[0]);
	double[]data=new double[len];
	for (int i=0;i<len;++i)
	    data[i]=Math.random();
	double[]variances=variances(data);
	System.out.println("Expected variance of: "+1/12.+" got variance of:
"+variances[len-1]);
    }
}

SEARCH STRATEGY:
----------------
variance
uniform distribution variance

Clarification of Answer by rbnn-ga on 14 Nov 2002 14:06 PST
One quick point that I forgot to mention.

Java, like C and C++, has 0-based indexing in arrays; while your
pseudocode, like Fortran, Matlab, and a lot of pseudocode, uses
1-based indexing on arrays; in any case, this is the reason that 1 is
added to the denominator in the expressions for the mean and mean of
the squares in the code given.

Request for Answer Clarification by lusus-ga on 14 Nov 2002 15:50 PST
so, let me just see if I've got this right (expressed
as I'm used to seeing it)

running variance = 1/N * E(x^2) - ( E(x)/n )^2

(where E means summation)

or:  v = <x^2> - <x>^2


so I could use: 

x = new data value 
sum = sum + x 
sumofsquares = sumofsquares + x^2
count = count +1 
running_mean = sum/count 
running_variance = 1/count * sumofsquares - running_mean^2
repeat... 

?

Clarification of Answer by rbnn-ga on 14 Nov 2002 16:31 PST
Your analysis is correct. However, it is not a good idea to use the
symbol E for summation, since in probability theory E means
expectation, or mean (what you use double angle brackets for).
lusus-ga rated this answer:5 out of 5 stars and gave an additional tip of: $2.50

Comments  
Subject: Re: running variance expression
From: rbnn-ga on 14 Nov 2002 13:29 PST
 
I think the price might be a little too low.
Subject: Re: running variance expression
From: lusus-ga on 14 Nov 2002 13:34 PST
 
do you mean that it's a difficult question?
Subject: Re: running variance expression
From: racecar-ga on 14 Nov 2002 14:16 PST
 
Would it work to just add a couple lines to your running mean code?

sum_var = sum_var + (x - running_mean)^2
running_var = sum_var/count
Subject: Re: "add a couple lines"
From: poincare-ga on 14 Nov 2002 19:28 PST
 
The problem with

  sum_var = sum_var + (x - running_mean)^2

is that running_mean changes at each iteration, so all of the previous
summands to sum_var would actually have to be re-calculated.  The code
with these two lines added would still give a reasonable approximation
to the true variance at each stage, as long as the data come in an
unsorted order, but the approach given by rbnn gives the correct
variance at each stage.  The sum of the squares is independent of the
order in which the data are given, and records enough information to
compute the variance because of linearity:

sum(  (x_i - mean)^2 )
= sum(  x_i^2 ) - sum( -2*x_i*mean) + sum(mean^2)
= sum(x_i^2) - 2*mean*sum(x_i) + mean*mean*count
= sum(x_i^2) - mean*mean*count

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.
Search Google Answers for
Google Answers  


Google Home - Answers FAQ - Terms of Service - Privacy Policy