 Question
 Subject: Basic statistics Category: Science Asked by: lauren411-ga List Price: \$20.00 Posted: 18 Oct 2005 21:24 PDT Expires: 17 Nov 2005 20:24 PST Question ID: 582026
 ```In statistics, why do we use the square root of the average of the squared deviations (i.e., the standard deviation) rather than the average of the absolute values of the deviations? I tried looking this up on Google, and I found a lot of relevant sites, but amazingly, every one of them said, either literally or in essence, "We COULD use the average of the absolute deviations, but a better approach is to use the standard deviation." That's great, but I want to know WHY this is better.```
 Subject: Re: Basic statistics Answered By: livioflores-ga on 20 Oct 2005 07:15 PDT Rated:
 ```Hi lauren411!!

The short answer to your question is that the Standard Deviation is more intuitive to handle because it is in the same units as the raw data; that is, the STD is in the same units as the mean, which is why the STD is preferred over the variance as a measure of variability. See the following definition:

"The Standard Deviation is the square root of the variance. Whether you use the Standard Deviation or the Variance is a matter of preference. Mathematicians tend to use the Variance as the more 'natural' unit. Engineers tend to prefer the Standard Deviation, because it is in the same units as whatever is being studied."
"MiC Quality: Introduction to Statistics - Standard Deviation":
http://www.margaret.net/statistics/mic-free/is10.htm

There is a nice explanation I found that shows why we "feel" the STD is a more intuitive measure of variability (and in some way it is a mathematical "demonstration" of this fact). Two reasons for the use of the STD are:
(1) the standard deviation is another kind of average distance to the mean, and
(2) the standard deviation corresponds to actual ruler distance between the sample and the mean.

Reason (1) is quite obvious: it comes from the formula used to calculate it (by definition, the square root of the Variance).

To explain reason (2), note that the Standard Deviation can be directly related to the Euclidean geometric distance between the sample and its mean. Indeed, the square of the distance between two points, one with coordinates xi, i = 1,2,...,n, the other with coordinates yi, i = 1,2,...,n, is:
(x1-y1)^2 + (x2-y2)^2 + . . . + (xn-yn)^2.
If you replace each yi by the sample mean, the result is the variance's numerator. The standard deviation is just the square root of the variance. So the standard deviation is the ordinary distance between the sample and its mean, divided by the square root of n-1.
Summed up from the Measures of Variation paragraph of "Notes to Accompany Chapter 3: Numerical Data Description" by Prof. Stanley L. Sclove - University of Illinois at Chicago - College of Business Administration:
http://www.uic.edu/classes/mba/mba503/981/ntsch03.htm#5

For additional reference see the following paragraph:

"The idea of the variance is straightforward: it is the average of the squares of the deviations of the observations from their mean. The details we have just presented, however, raise some questions. Why do we square the deviations? Why not just average the distances of the observations from their mean? There are two reasons, neither of them obvious. First, the sum of the squared deviations of any set of observations from their mean is the smallest that the sum of squared deviations from any number can possibly be. This is not true of the unsquared distances. So squared deviations point to the mean as center in a way that distances do not. Second, the standard deviation turns out to be the natural measure of spread for a particularly important class of symmetric unimodal distributions, the normal distributions. We will meet the normal distributions in the next section. We commented earlier that the usefulness of many statistical procedures is tied to distributions of particular shapes. This is distinctly true of the standard deviation.

Why do we emphasize the standard deviation rather than the variance? One reason is that s, not s^2, is the natural measure of spread for normal distributions. There is also a more general reason to prefer s to s^2. Because the variance involves squaring the deviations, it does not have the same unit of measurement as the original observations. The variance of the metabolic rates, for example, is measured in squared calories. Taking the square root remedies this. The standard deviation measures spread about the mean in the original scale."
From "Reading from book on Standard Deviation" at Terry Berna's Math Page at Souhegan High School:
http://www.sprise.com/shs/terryberna/Reading%20-%20Standard%20deviation.pdf

Search strategy:
"use standard deviation because"
"standard deviation because"
"standard deviation rather than the variance"
"standard deviation instead"
"standard deviation instead of the variance"

I hope that this helps you. Feel free to request a clarification if you need it.

Regards,
livioflores-ga```

Clarification of Answer by livioflores-ga on 20 Oct 2005 13:27 PDT

```Hi!!

Yes, I missed the point, I am sorry for that. The answer in this case is probably less intuitive than the "instead of variance" one. And I think that you can find it in the suggested reference text:

"3.5.3.1. The Mean Absolute Deviation: A way to measure the variability or spread of a set of numbers is by computing their average distance to the mean, called the Mean Absolute Deviation. The distances from the mean are the absolute values |xi - m|, i = 1,2,...,n. The Mean Absolute Deviation (M.A.D.) is their ordinary arithmetic average. Usually we use the Standard Deviation instead. Two reasons for this are: (1) the standard deviation is another kind of average distance to the mean and (2) the standard deviation corresponds to actual ruler distance between the sample and the mean."

From the Measures of Variation paragraph of "Notes to Accompany Chapter 3: Numerical Data Description" by Prof. Stanley L. Sclove - University of Illinois at Chicago - College of Business Administration:
http://www.uic.edu/classes/mba/mba503/981/ntsch03.htm#5

At this point we can follow with the "Euclidean Ruler" explanation given in my previous answer: reason (1) is quite obvious, since it comes from the formula used to calculate it (by definition, the square root of the Variance).
To explain reason (2), note that the Standard Deviation can be directly related to the Euclidean geometric distance between the sample and its mean. Indeed, the square of the distance between two points, one with coordinates xi, i = 1,2,...,n, the other with coordinates yi, i = 1,2,...,n, is:
(x1-y1)^2 + (x2-y2)^2 + . . . + (xn-yn)^2.
If you replace each yi by the sample mean, the result is the variance's numerator. The standard deviation is just the square root of the variance. So the standard deviation is the ordinary distance between the sample and its mean, divided by the square root of n-1.

Summed up from the Measures of Variation paragraph of "Notes to Accompany Chapter 3: Numerical Data Description" by Prof. Stanley L. Sclove - University of Illinois at Chicago - College of Business Administration:
http://www.uic.edu/classes/mba/mba503/981/ntsch03.htm#5

At this point another reference given becomes relevant:

"Why do we square the deviations? Why not just average the distances of the observations from their mean? There are two reasons, neither of them obvious. First, the sum of the squared deviations of any set of observations from their mean is the smallest that the sum of squared deviations from any number can possibly be. This is not true of the unsquared distances. So squared deviations point to the mean as center in a way that distances do not. Second, the standard deviation turns out to be the natural measure of spread for a particularly important class of symmetric unimodal distributions, the normal distributions."

From "Reading from book on Standard Deviation" at Terry Berna's Math Page at Souhegan High School:
http://www.sprise.com/shs/terryberna/Reading%20-%20Standard%20deviation.pdf

The above explains why the variance is preferred over the mean absolute deviation; since you now know why the STD is used rather than the variance, the conclusion follows.
But there is more; I found a text which discusses a related topic and in its final paragraph states:

"In short, variance is a more powerful concept than MAD, because predictions about population parameters can be made from sample data. And, like those steak knives, there is even more: there is a theorem, the variance theorem, which shows that variances of independent (uncorrelated) variables are additive. This powerful idea underpins regression and analysis of variance. MAD is not additive, and hence it is a much less useful concept in the structure of statistics."
From "I'm Not Mad About MAD" (copyright Education Queensland)
http://exploringdata.cqu.edu.au/docs/why_var2.doc

Finally, at MathForum I found several explanations that could be useful to you in the thread "Standard Deviation vs. Mean Absolute Deviation". At the bottom of its first page you will see links to the answers given by claimed experts on the topic:
http://mathforum.org/kb/message.jspa?messageID=3986517&tstart=0

Just in case, here are the links to the replies:
Re: Standard Deviation vs. Mean Absolute Deviation - Teague, Dan
http://mathforum.org/kb/message.jspa?messageID=3987571&tstart=0
Re: Standard Deviation vs. Mean Absolute Deviation - dennis roberts
http://mathforum.org/kb/message.jspa?messageID=3987781&tstart=0
Re: Standard Deviation vs. Mean Absolute Deviation - Olsen Chris
http://mathforum.org/kb/message.jspa?messageID=3991285&tstart=0

I hope that this helps you now. If you still find the answer wrong and off the topic, you have the right to request a refund by emailing the editors. Again, excuse my misunderstanding of the original question.

Regards,
livioflores-ga```
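As a quick numerical illustration of the "variance theorem" quoted above (an editorial sketch, not part of the original thread): simulating two independent variables shows that their variances add while their mean absolute deviations do not.

```python
# Illustrates the variance theorem: for independent variables, variances
# are additive, while mean absolute deviations (MAD) are not.
import random
import statistics

random.seed(0)
n = 100_000
xs = [random.gauss(0, 3) for _ in range(n)]   # variance ~ 9
ys = [random.gauss(0, 4) for _ in range(n)]   # variance ~ 16
sums = [x + y for x, y in zip(xs, ys)]        # independent, variance ~ 25

def mad(data):
    """Mean absolute deviation from the mean."""
    m = statistics.fmean(data)
    return statistics.fmean(abs(d - m) for d in data)

var_x, var_y = statistics.pvariance(xs), statistics.pvariance(ys)
var_sum = statistics.pvariance(sums)
print(var_x + var_y, var_sum)        # nearly equal: variances add
print(mad(xs) + mad(ys), mad(sums))  # clearly different: MAD does not add
```

This is exactly the property that makes the variance (and hence the standard deviation) so convenient for portfolios, regression, and analysis of variance.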
 lauren411-ga rated this answer: ```Very poor answer. It doesn't address the central issue in the question: why use standard deviation instead of average absolute value deviation? Instead, the answer talks about why we use standard deviation instead of VARIANCE. That's an easy question, and not the one I asked.```

 Subject: Re: Basic statistics From: iang-ga on 19 Oct 2005 15:07 PDT
 ```One problem with working with deviations is that they can cancel each other out - the average of +240 and -240 is 0, which isn't helpful information if you're about to stick your fingers into a mains socket! Working with the square root of the averaged squares gets rid of the negatives and lets you focus on the "size" of the numbers. There may well be other reasons, of course! Ian G.```
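Ian G.'s cancellation point is easy to verify (an editorial sketch, not part of the original thread): the raw deviations average to zero, while the root of the averaged squares recovers their size.

```python
# Raw deviations cancel; the root-mean-square keeps their "size".
import math

deviations = [240, -240]
average = sum(deviations) / len(deviations)  # cancels to 0
rms = math.sqrt(sum(d * d for d in deviations) / len(deviations))  # 240
print(average, rms)
```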
 Subject: Re: Basic statistics From: pforcelli-ga on 19 Oct 2005 15:36 PDT
 ```The average absolute value of the deviations would work; but one reason we square the deviations is so that each one can be visualized geometrically as the area of a square. Strange, I know, but true.```
 Subject: Re: Basic statistics From: flyinghippo-ga on 20 Oct 2005 09:04 PDT
 ```Lauren411,

The problem of deviations cancelling each other (as they always will if you compute deviations from the mean) can be solved by considering their absolute values (something I believe you mentioned in your question). So, in the first example (by Ian G.), -240 and +240 will not cancel each other but will give you a mean absolute deviation of 240.

The problem with absolute deviations is that you cannot do much with them mathematically. For example, if you have a portfolio of two stocks and you know that they move completely independently, the variance of your portfolio (as a measure of the risk you're taking) will simply be the sum of the variances of those two stocks. No such simple formula exists for absolute deviations - so, you are stuck squaring them.

Consider also Chebyshev's inequality: in any population, if you know its standard deviation, you know that at least 1-(1/k)^2 of the whole population is within +/- k standard deviations from the mean. In other words, if you know the scores on an exam average 60 points and the standard deviation is 10 points, you know that at least 3/4 of all people got between 60-2*10=40 and 60+2*10=80 points (regardless of how the distribution of the scores looks).

In special cases, like the popular Normal distribution (a.k.a. the Gaussian curve), you know exactly how much of the curve's area is within so many standard deviations from the mean. You can also calculate this area for other distributions, which comes in handy in calculating all kinds of probabilities: from the risk of a company defaulting on its loans to the odds of a child having a disease if you find a certain mutation. None of these convenient calculations exists for absolute deviations. Squaring the numbers is not too high a price to pay for being able to do a lot of useful calculation with the result.

By the way, you don't have to subtract each point from the mean before you square the difference. There is a neat shortcut for that.
Square each value as it is (not subtracting the average), get the mean of those squares, then subtract the square of your population's mean. For example, suppose your numbers are 1, 3, 4, 5, 7:

* First, calculate the mean: (1+3+4+5+7)/5=4
* Then you are supposed to find the difference of each individual number from the mean. Don't waste your time! Just square each number as it is: 1, 9, 16, 25, 49, and sum those squares: 1+9+16+25+49=100
* Square the mean of your original readings, 4^2=16, multiply by the number of readings, 16*5=80, and subtract that from the sum of squares you computed before: 100-80=20
* If you want the population variance, divide this number by the number of readings: 20/5=4. If you want the sample variance, divide it by (n-1): 20/(5-1)=5

This is your variance - you just accomplished it with one subtraction instead of five! If you have a larger number of measurements, this trick will save you a lot of time.

Good luck,
FlyingHippo```
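FlyingHippo's shortcut can be checked against the textbook definitions (an editorial sketch, not part of the original thread): the sum of squared deviations equals sum(x^2) - n*mean^2.

```python
# FlyingHippo's shortcut on the worked example 1, 3, 4, 5, 7.
import statistics

data = [1, 3, 4, 5, 7]
n = len(data)
mean = sum(data) / n                             # (1+3+4+5+7)/5 = 4
ss = sum(x * x for x in data) - n * mean * mean  # 100 - 80 = 20

pop_var = ss / n         # population variance: 20/5 = 4
samp_var = ss / (n - 1)  # sample variance: 20/4 = 5

# Same results as the one-deviation-at-a-time definitions:
print(pop_var, statistics.pvariance(data))
print(samp_var, statistics.variance(data))
```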
 Subject: Re: Basic statistics From: bozo99-ga on 20 Oct 2005 16:34 PDT
 ```Variance has useful algebra related to it:

V = SUM( (x{i} - x{av})^2 )
  = SUM( ...expand the squared bit... )
  = SUM( x{i}^2 ) - n.x{av}^2

unless I've fluffed my algebra remembered from ages ago. I speculate that before calculators it was easier to get a standard deviation by summing both a column of x{i} and a column of x{i}^2 (plus a short finishing step) than by calculating the mean and then doing a series of subtractions and additions. (My guess is this is the main reason.) Also, algebraically you can think of the contribution of one data point without knowing in advance whether it is above or below the mean and whether you have to multiply that contribution by -1.```
 Subject: Re: Basic statistics From: llcooldl-ga on 23 Oct 2005 06:51 PDT
 ```I think the "Answer" to this question was perfect, it explains exactly why standard deviation is used!```
 Subject: Re: Basic statistics From: mrmoto-ga on 25 Oct 2005 01:46 PDT
 ```Some of the answers and comments so far have been a bit confusing. As a graduate student in math, I hope to be able to give some clarification. There is a strong relationship between the mean and the standard deviation, and analogously between the median and the sum of absolute deviations (from the median).

Short Answer:

Standard deviation:
- is very easy to calculate, and to work with in formulas
- has all sorts of "nice" properties (e.g. see http://en.wikipedia.org/wiki/Standard_deviation)
- is the "best" measure of distance from the average, _if_ the data has a normal distribution

Average of absolute values of deviations:
- is a bit cumbersome and harder to manipulate algebraically
- is more robust with respect to outliers
- is more appropriate to calculate with respect to deviations from the median

Longer Answer:

Some people have noted that it's easier to work with squares than absolute values. This is true, but there's more to it than that. Consider the notion of "average". If you want the average of a set of n numbers, the standard approach is to use the _arithmetic mean_; it usually provides a good idea of "average", so long as the numbers have a normal distribution. If there are many outliers, on the other hand, the arithmetic mean will give a distorted representation of the numbers. In this case, the _median_ is usually more appropriate.

For example, let's say you have the following data: 0,1,1,1,2,2,2,3,24. The arithmetic mean is 4, so in fact all of the numbers except 24 are "below average" -- this is a bit unsettling. The median is 2, which is (probably) more meaningful here.

Now, to come to the point about deviation. Using the numbers from the above example, the deviations from the mean are -4,-3,-3,-3,-2,-2,-2,-1,20. The standard deviation is approximately 7 -- you can see that the presence of the 24 has skewed not only the mean but also the standard deviation. (You probably wouldn't think of describing the numbers as "four, plus or minus 7".)
So should we use the average of the absolute values of the deviations from the mean instead? Not necessarily -- if you do so, you're implicitly agreeing that the mean is appropriate, which it isn't here. But, just to see what happens: the average of the absolute deviations from the mean is about 4.5. The average of the absolute deviations from the _median_, on the other hand, is about 3. (The deviations from the median are -2,-1,-1,-1,0,0,0,1,22.)

The reason why you might want to calculate the sum of absolute deviations from the median rather than the mean is as follows:
* The mean is the number x that minimises the sum of the squares of the deviations from x (i.e., it gives the best standard deviation)
* The median is the number x that minimises the sum of the absolute values of the deviations from x (and therefore gives the best average of these absolute values)

Here's another related example. Suppose you're trying to find a "best fit line" through some points. Usually you would calculate the "least squares" fit, which minimises the sum of the squares of the deviations from the line. This method is good most of the time, but will be affected by outliers. A better approach when there are outliers is to find the line that minimises the sum of the absolute distances from the line. This is entirely analogous to the above example.

However, there's a bit of a catch -- and this is where squares of deviations truly come in handy. The former minimisation problem is very easy to solve -- you can often do it by hand for relatively small data, and most scientific calculators also have this capability. The latter problem is harder, because of the awkwardness of the absolute value function in calculus. It is still possible to solve, but requires iterative methods, or linear programming techniques. A good resource for this is R. Vanderbei's book on linear programming at:
http://www.princeton.edu/~rvdb/LPbook/index.html
(see Part 1, Chapter 12, which discusses mean, median, best-fit lines, etc.)
In conclusion: the average of absolute values of deviations is cumbersome, but more robust when there are outliers. I hope this helps!```
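mrmoto's two bullet points can be verified directly on his own data (an editorial sketch, not part of the original thread): scanning candidate centres shows the mean minimises the sum of squared deviations and the median minimises the sum of absolute deviations.

```python
# The mean minimises squared deviations; the median minimises absolute ones.
import statistics

data = [0, 1, 1, 1, 2, 2, 2, 3, 24]
mean = statistics.fmean(data)     # 4.0
median = statistics.median(data)  # 2

def sum_sq(c):
    return sum((x - c) ** 2 for x in data)

def sum_abs(c):
    return sum(abs(x - c) for x in data)

# Scan a grid of candidate centres 0.0, 0.1, ..., 25.0.
candidates = [c / 10 for c in range(0, 251)]
best_sq = min(candidates, key=sum_sq)    # lands on the mean
best_abs = min(candidates, key=sum_abs)  # lands on the median
print(best_sq, best_abs)  # 4.0 2.0
```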
 Subject: Re: Basic statistics From: benreaves-ga on 27 Oct 2005 00:07 PDT
 `The SD penalizes outliers more than the MAD does.`