Google Answers: running variance expression

View Question

Q: running variance expression ( Answered 5 out of 5 stars

Question

Subject: running variance expression
Category: Computers > Algorithms
Asked by: lusus-ga
List Price: $3.50

Posted: 14 Nov 2002 12:55 PST
Expires: 14 Dec 2002 12:55 PST
Question ID: 107855

I think the expression for a running mean/average in
iterative/pseudocode form would be something like:

x = new data value
sum = sum + x
count = count +1
running_mean = sum/count
repeat...

what would be the iterative/pseudocode expression for a running
variance?
by 'running' I mean that I have a list that I periodically append to,
and I cannot store all the previous values.  for running mean, I don't
store x[1], x[2], etc. just thier sum and the count.  a link to
straight answers like this would be appreciated.  I get hundreds of
hits on every math topic I try, but no pages get right to the point of
stuff like the question above.

Clarification of Question by lusus-ga on 14 Nov 2002 13:38 PST

I'm under the impression that the answer to this is perfectly well
known and a knowledgable person would know off the top of their head
or would know right where to look.  if you know you can provide the
answer, but it will require real work, give me some idea of the effort
required so I can price it better.

Answer

Subject: Re: running variance expression
Answered By: rbnn-ga on 14 Nov 2002 13:54 PST
Rated: 5 out of 5 stars

Thank you for the question. As always, if you have any questions or need additional information, please use the "Request Clarification" button on the browser before rating this answer. The variance of a random variable X is the E(X^2)-E(X)^2 . (see e.g. http://mathworld.wolfram.com/Variance.html). Therefore, a running variance can be computed simply by using running means of the original data and the squares of the original data. Below is a Java program that shows the implementation in a method called "variances" that returns the running variances of an array of data. The test program generates an array of data from a uniform distribution, whose variance should be 1/12, ( http://mathworld.wolfram.com/UniformDistribution.html ) . Here is a sample script: %javac Variance.java % java Variance 100 Expected variance of: 0.08333333333333333 got variance of: 0.0840287590180841 % java Variance 1000 Expected variance of: 0.08333333333333333 got variance of: 0.08806610259355246 % java Variance 10000 Expected variance of: 0.08333333333333333 got variance of: 0.08249038698065755 % java Variance 1000000 Expected variance of: 0.08333333333333333 got variance of: 0.0833870225438409 Here is the code (note: sometimes code does not render properly. If this is a problem, I can put it on a website). public class Variance{ public static double[] variances(double[] data){ double sum=0; double sumsquare=0; int len=data.length; double mean=0; double[] variances=new double[len]; for (int i=0;i<len;++i){ sum+=data[i]; sumsquare+=data[i]data[i]; mean=sum/(i+1); variances[i]=sumsquare/(i+1)-meanmean; } return variances; } public static void main(String[] args){ int len=100; if (args.length>0)len=Integer.parseInt(args[0]); double[]data=new double[len]; for (int i=0;i<len;++i) data[i]=Math.random(); double[]variances=variances(data); System.out.println("Expected variance of: "+1/12.+" got variance of: "+variances[len-1]); } } SEARCH STRATEGY: ---------------- variance uniform distribution variance
Clarification of Answer by rbnn-ga on 14 Nov 2002 14:06 PST One quick point that I forgot to mention. Java, like C and C++, has 0-based indexing in arrays; while your pseudocode, like Fortran, Matlab, and a lot of pseudocode, uses 1-based indexing on arrays; in any case, this is the reason that 1 is added to the denominator in the expressions for the mean and mean of the squares in the code given.
Request for Answer Clarification by lusus-ga on 14 Nov 2002 15:50 PST so, let me just see if I've got this right (expressed as I'm used to seeing it) running variance = 1/N * E(x^2) - ( E(x)/n )^2 (where E means summation) or: v = <x^2> - <x>^2 so I could use: x = new data value sum = sum + x sumofsquares = sumofsquares + x^2 count = count +1 running_mean = sum/count running_variance = 1/count * sumofsquares - running_mean^2 repeat... ?
Clarification of Answer by rbnn-ga on 14 Nov 2002 16:31 PST Your analysis is correct. However, it is not a good idea to use the symbol E for summation, since in probability theory E means expectation, or mean (what you use double angle brackets for).

lusus-ga rated this answer: 5 out of 5 stars

and gave an additional tip of: $2.50

Comments

Subject: Re: running variance expression
From: rbnn-ga on 14 Nov 2002 13:29 PST

I think the price might be a little too low.

Subject: Re: running variance expression
From: lusus-ga on 14 Nov 2002 13:34 PST

do you mean that it's a difficult question?

Subject: Re: running variance expression
From: racecar-ga on 14 Nov 2002 14:16 PST

Would it work to just add a couple lines to your running mean code?

sum_var = sum_var + (x - running_mean)^2
running_var = sum_var/count

Subject: Re: "add a couple lines"
From: poincare-ga on 14 Nov 2002 19:28 PST

The problem with

  sum_var = sum_var + (x - running_mean)^2

is that running_mean changes at each iteration, so all of the previous
summands to sum_var would actually have to be re-calculated.  The code
with these two lines added would still give a reasonable approximation
to the true variance at each stage, as long as the data come in an
unsorted order, but the approach given by rbnn gives the correct
variance at each stage.  The sum of the squares is independent of the
order in which the data are given, and records enough information to
compute the variance because of linearity:

sum(  (x_i - mean)^2 )
= sum(  x_i^2 ) - sum( -2*x_i*mean) + sum(mean^2)
= sum(x_i^2) - 2*mean*sum(x_i) + mean*mean*count
= sum(x_i^2) - mean*mean*count

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.

Search Google Answers for

Google Home - Answers FAQ - Terms of Service - Privacy Policy