View Question
 Question
 Subject: running variance expression Category: Computers > Algorithms Asked by: lusus-ga List Price: \$3.50 Posted: 14 Nov 2002 12:55 PST Expires: 14 Dec 2002 12:55 PST Question ID: 107855
 ```I think the expression for a running mean/average in iterative/pseudocode form would be something like: x = new data value sum = sum + x count = count +1 running_mean = sum/count repeat... what would be the iterative/pseudocode expression for a running variance? by 'running' I mean that I have a list that I periodically append to, and I cannot store all the previous values. for running mean, I don't store x[1], x[2], etc. just thier sum and the count. a link to straight answers like this would be appreciated. I get hundreds of hits on every math topic I try, but no pages get right to the point of stuff like the question above.``` Clarification of Question by lusus-ga on 14 Nov 2002 13:38 PST ```I'm under the impression that the answer to this is perfectly well known and a knowledgable person would know off the top of their head or would know right where to look. if you know you can provide the answer, but it will require real work, give me some idea of the effort required so I can price it better.```
 Subject: Re: running variance expression Answered By: rbnn-ga on 14 Nov 2002 13:54 PST Rated:
 ```Thank you for the question. As always, if you have any questions or need additional information, please use the "Request Clarification" button on the browser before rating this answer. The variance of a random variable X is the E(X^2)-E(X)^2 . (see e.g. http://mathworld.wolfram.com/Variance.html). Therefore, a running variance can be computed simply by using running means of the original data and the squares of the original data. Below is a Java program that shows the implementation in a method called "variances" that returns the running variances of an array of data. The test program generates an array of data from a uniform distribution, whose variance should be 1/12, ( http://mathworld.wolfram.com/UniformDistribution.html ) . Here is a sample script: %javac Variance.java % java Variance 100 Expected variance of: 0.08333333333333333 got variance of: 0.0840287590180841 % java Variance 1000 Expected variance of: 0.08333333333333333 got variance of: 0.08806610259355246 % java Variance 10000 Expected variance of: 0.08333333333333333 got variance of: 0.08249038698065755 % java Variance 1000000 Expected variance of: 0.08333333333333333 got variance of: 0.0833870225438409 Here is the code (note: sometimes code does not render properly. If this is a problem, I can put it on a website). public class Variance{ public static double[] variances(double[] data){ double sum=0; double sumsquare=0; int len=data.length; double mean=0; double[] variances=new double[len]; for (int i=0;i0)len=Integer.parseInt(args[0]); double[]data=new double[len]; for (int i=0;i - ^2 so I could use: x = new data value sum = sum + x sumofsquares = sumofsquares + x^2 count = count +1 running_mean = sum/count running_variance = 1/count * sumofsquares - running_mean^2 repeat... ?``` Clarification of Answer by rbnn-ga on 14 Nov 2002 16:31 PST ```Your analysis is correct. However, it is not a good idea to use the symbol E for summation, since in probability theory E means expectation, or mean (what you use double angle brackets for).```

 `I think the price might be a little too low.`
 `do you mean that it's a difficult question?`
 ```Would it work to just add a couple lines to your running mean code? sum_var = sum_var + (x - running_mean)^2 running_var = sum_var/count```
 ```The problem with sum_var = sum_var + (x - running_mean)^2 is that running_mean changes at each iteration, so all of the previous summands to sum_var would actually have to be re-calculated. The code with these two lines added would still give a reasonable approximation to the true variance at each stage, as long as the data come in an unsorted order, but the approach given by rbnn gives the correct variance at each stage. The sum of the squares is independent of the order in which the data are given, and records enough information to compute the variance because of linearity: sum( (x_i - mean)^2 ) = sum( x_i^2 ) - sum( -2*x_i*mean) + sum(mean^2) = sum(x_i^2) - 2*mean*sum(x_i) + mean*mean*count = sum(x_i^2) - mean*mean*count```