Subject: statistics, measure of significance
Category: Science > Math
Asked by: lusus-ga
List Price: $2.50
Posted: 30 Oct 2002 17:15 PST
Expires: 29 Nov 2002 17:15 PST
Question ID: 93738
I have two lists of numbers that represent a diverse set of statistics taken at two different times (let's say network performance). The list is large, and I want to highlight the differences that are most likely to be significant or interesting. I do not have a large historical sample to base this on; it can only be a function of the two data points.

A straight difference is not good because large-value stats have larger differences than small-value stats (a change from 1000 to 1100 appears more significant than 1 to 50). A percent difference is not good because small-value stats have erratic changes which are big in relative (percentage) terms, e.g. a change from 2 to 4 looks more significant than 170,000 to 180,000.

I probably don't know how to state this properly, but large-magnitude values have a tendency to hover around a typical value (like the size of raindrops), while small values can go from 0 to other small values, such as 5, fairly easily. I was thinking of something like using the larger of the two values as the assumed magnitude, by which the significance of a difference is scaled down. Am I making any sense? If so, there must be a very standard statistical way of saying this properly.

I prefer a single value that could be scaled to a fixed range, e.g. from 0 to 100, so that I can adjust the threshold of "interesting". I just want to help highlight the values in these long lists which are most worthy of inspection.
There is no answer at this time.
Subject: Re: statistics, measure of significance
From: mathtalk-ga on 30 Oct 2002 19:25 PST
From a true statistical point of view, no, it does not make sense. Let me make sure I understand the setup. At two different points in time, you take a large set of "measurements" on a complex system (network). For example, there might be a count of users logged in, the number of files open on a file server, the number of memory pages swapped out on a database server, etc. Almost all of the corresponding numbers at the two points in time differ. You ask for a way to know which differences are most likely to be worth noticing.

It is not much of a statistical problem, insofar as statistics deals with repeated measurements, because each distinct measurement is only taken twice. What you have is a modelling problem. You need a model or "hypothesis" relating all these varied measurements to help formulate a notion of whether a difference in measurements is significant or not. Distinguishing variations that are relatively large or small from ones that are absolutely large or small is probably a good first step, but it is far from the whole story.

Let's turn the question around and ask this. Suppose someone with a Crystal Ball could tell you unequivocally that this pair of measurements exhibits the most significant difference. What would you do with that information? How would you proceed? What "investigation" concerning the difference between those two numbers would be possible? Or worthwhile? Those are the sorts of issues that a "model" addresses. A model of human physiology, for example, tells us that a 10 percent variation in blood temperature is more significant than a 10 percent variation in blood sugar, and that a one unit change in blood pH is more significant than a one unit change in blood volume.

So we would need to know more about the "context" of your measurements to decide whether a variation has significance or not, or more to the point, whether a pattern or "constellation" of changes in measurement indicates an underlying event of importance (e.g. a "viral" attack either in the human patient or on a network).

regards, mathtalk-ga
Subject: Re: statistics, measure of significance
From: starrebekah-ga on 30 Oct 2002 22:14 PST
Use a statistical computer program (such as SPSS) to convert the raw scores into z scores. You can then compare means and standard deviations, and do other statistical analysis. You can get a trial version of SPSS at www.spss.com

Good Luck! -Rebekah

PS - Here are instructions on exactly how to do this using SPSS: http://www.uoguelph.ca/~psystats/raw_to_z-score_conversions.htm
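For readers without SPSS, the z-score conversion suggested above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `z_scores` and the sample data are hypothetical, and applying the conversion to the list of differences between the two snapshots is one possible reading of the suggestion, not something the comment spells out.

```python
from statistics import mean, stdev


def z_scores(values):
    """Return (x - mean) / sample standard deviation for each x."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation; needs at least two values
    if s == 0:
        # All values identical: no spread, so every z score is zero.
        return [0.0 for _ in values]
    return [(x - m) / s for x in values]


# Hypothetical snapshots taken at two different times (assumed data).
before = [1000, 2, 170000, 1, 40]
after = [1100, 4, 180000, 50, 42]

# Assumption: z-score the raw differences between the two snapshots.
diffs = [b - a for a, b in zip(before, after)]
for d, z in zip(diffs, z_scores(diffs)):
    print(f"difference {d:>8}  z = {z:+.2f}")
```

Note that z scores of raw differences are still driven by the largest-magnitude statistics, which is the problem the original question describes, so this is probably a starting point rather than a complete answer.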
Subject: Re: statistics, measure of significance
From: starrebekah-ga on 30 Oct 2002 22:16 PST
PPS - This program will also let you make graphs, which will help you see those 'outliers' (or values you think might be significant to look at for further investigation). Makes it a lot easier. -Rebekah
Subject: Re: statistics, measure of significance
From: lusus-ga on 31 Oct 2002 09:20 PST
Oh, and I am processing this with a program, but it's probably not going to be worth the effort in my situation if the expression is more than a single line. I'd like to be able to set an arbitrary threshold and say "show me the differences that rank > 80 out of a possible score of 100."
Subject: Re: statistics, measure of significance
From: rsquared-ga on 31 Oct 2002 18:36 PST
I'm not sure I fully understand the question and I may be repeating what has already been said, but... Are you looking for what is known as a standard deviation? This is basically a measure of how much a set of data varies around the mean. It's fairly simple to compute; I'm sure there are computer programs that do it. You might even be able to use Excel - I don't know. If you want more info on standard deviation, do a Google search. You are bound to find more than you'd ever care to know! Good luck.
Subject: Re: statistics, measure of significance
From: douglas256-ga on 01 Nov 2002 01:01 PST
As stated, since you only want to compare two numbers, this is not a question of statistics but of a discrete derivative.

Your first attempt, a_{i+1} - a_i, was not sufficient.

Your second attempt, 200*|a_{i+1} - a_i|/(|a_{i+1}| + |a_i| + 1), worked fairly well, but gave too much weight to small values of a_i and not enough weight to large values of a_i.

If you want the difference to be between [0,100] and be dependent on the relative size of a_i, a global maximum is needed. Then, you could use:

20000 * (2*max - |a_{i+1}| - |a_i|)/max * (|a_{i+1} - a_i|/(|a_{i+1}| + |a_i| + 1))

You could of course vary the size weighting by raising (2*max - |a_{i+1}| - |a_i|)/max to either a fractional power (e.g. 1/2) or a positive power (e.g. 2). A higher power applies more weight to size and a lower power applies less weight to size. It should be noted that the constant 20000 would have to be varied if the power is changed.
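The two formulas in the comment above can be written out as a short Python sketch. The function and parameter names (`relative_score`, `weighted_score`, `max_value`, `power`, `constant`) are made up for illustration, and `max_value` is assumed to be the largest absolute value found anywhere in either list. The constant 20000 is copied verbatim from the comment and exposed as a parameter, since the comment itself says it has to change with the power.

```python
def relative_score(a, b):
    """Second formula from the comment: 200*|b - a| / (|a| + |b| + 1)."""
    return 200 * abs(b - a) / (abs(a) + abs(b) + 1)


def weighted_score(a, b, max_value, power=1.0, constant=20000):
    """Third formula: constant * weight**power * |b - a| / (|a| + |b| + 1).

    max_value is assumed to be the global maximum of |x| over both lists,
    so the weight below is never negative.
    """
    weight = (2 * max_value - abs(a) - abs(b)) / max_value
    ratio = abs(b - a) / (abs(a) + abs(b) + 1)
    return constant * weight ** power * ratio


# Hypothetical pairs (old value, new value) taken from the question's examples.
pairs = [(1000, 1100), (1, 50), (2, 4), (170000, 180000)]
global_max = max(max(abs(a), abs(b)) for a, b in pairs)
for a, b in pairs:
    print(a, b, round(relative_score(a, b), 2),
          round(weighted_score(a, b, global_max), 2))
```

With `power=1`, the weight lies in [0, 2] and the ratio in [0, 1), so the result is bounded by twice `constant`; the constant would need to be rescaled to land on whatever fixed range the threshold is meant to use.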