Google Answers
 
Q: data set analysis (No Answer, 3 Comments)
Question  
Subject: data set analysis
Category: Science
Asked by: tferer-ga
List Price: $5.00
Posted: 05 May 2004 09:48 PDT
Expires: 04 Jun 2004 09:48 PDT
Question ID: 341517
I have a huge set of numbers (trip times from point A to point B).
Most of the numbers fall into a "normal" range, but occasionally, when
there is a delay, trip times go up for a short period. How do I find
the number that represents a "normal", un-delayed trip?
Answer  
There is no answer at this time.

Comments  
Subject: Re: data set analysis
From: pctyszka-ga on 05 May 2004 13:03 PDT
 
just a comment:
Depending on what you are using the "normal" number for, I would try
calculating the average

(A+B+C+...+Z)/(# of numbers)

or find the median by sorting all the numbers from shortest
time to longest and taking the middle one, at position
((# of numbers) + 1)/2 when the count is odd.
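
Both calculations can be sketched in a few lines of Python. The trip times here are made-up values in minutes (they are not from the question), with one delayed trip of 75 to show that the average moves much more than the median does. The function names are only illustrative.

```python
# Hypothetical trip times in minutes; 75 stands in for a delayed trip.
trip_times = [30, 31, 29, 32, 30, 75, 31]

def average(values):
    """(A+B+C+...+Z)/(# of numbers)"""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted list (mean of the two middle values if the count is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(average(trip_times))  # pulled upward by the single 75
print(median(trip_times))   # 31
```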
Subject: Re: data set analysis
From: prssurcookr-ga on 06 May 2004 18:24 PDT
 
Calculating a mean and standard deviation for your population would be
most informative.
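
As a rough sketch, Python's standard `statistics` module can compute both figures. The trip times are made-up values in minutes, and `pstdev` assumes the list is the whole population (use `statistics.stdev` if it is only a sample). The second pair of numbers leaves out the delayed trip, just to show how much one delay shifts both the mean and the standard deviation.

```python
import statistics

# Hypothetical trip times in minutes; 75 stands in for a delayed trip.
trip_times = [30, 31, 29, 32, 30, 75, 31, 30, 29, 31]

mu = statistics.mean(trip_times)
sigma = statistics.pstdev(trip_times)  # population standard deviation

# Same figures with the delayed trip left out, for comparison.
undelayed = [t for t in trip_times if t != 75]
mu_clean = statistics.mean(undelayed)
sigma_clean = statistics.pstdev(undelayed)

print(f"all trips:      mean={mu:.1f}, sd={sigma:.1f}")
print(f"without the 75: mean={mu_clean:.1f}, sd={sigma_clean:.1f}")
```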
Subject: Re: data set analysis
From: tobytyler-ga on 08 May 2004 07:10 PDT
 
++++++++++++++++++++++++++++++++++++++++++++++++++++++++
The way I read the problem, you have to:
Step (1) Determine which values are outliers (the delayed trip times).
Step (2) Reject those from your data set.
Step (3) Find the average of the rest.

My high school mathematics book suggests that outliers may be either
(A) more than twice the interquartile range from the median; or
(B) more than 2.5 times the standard deviation from the mean (for continuous data).

If you have discrete data, such as travel times recorded in whole units,
it might be easiest to use (A), twice the interquartile range.
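
For comparison, rule (B) might look something like this in Python. This is only a sketch: the function name `split_by_rule_b`, the choice of the population standard deviation (`statistics.pstdev`), and the trip times (made-up values in minutes) are all assumptions, not something from the textbook.

```python
import statistics

def split_by_rule_b(values, k=2.5):
    """Split values into (kept, outliers) using |x - mean| > k * standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    kept = [x for x in values if abs(x - mu) <= k * sigma]
    outliers = [x for x in values if abs(x - mu) > k * sigma]
    return kept, outliers

# Hypothetical trip times in minutes; 75 stands in for a delayed trip.
trip_times = [30, 31, 29, 32, 30, 75, 31, 30, 29, 31]
kept, outliers = split_by_rule_b(trip_times)
print(outliers)  # [75]
```

One thing to watch: the delayed trips themselves pull up the mean and the standard deviation, so with a small data set this rule can fail to flag a delay that rule (A) would catch.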

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
An example problem:

If my travel data is
12 13 13 14 15 15 16 18 18 19 24 25 27 43 44 hours

Step (1)
The median = 18 (the middle score)
Q1 = 14 (the middle score of the bottom-half)
Q3 = 25 (the middle score of the top half)

The interquartile range is 
Q3 - Q1 = 25 - 14 = 11

Step (2)
We reject any data greater than
the median + 2 * interquartile range
= 18 + 2*11
= 40

That rejects 43 and 44. We are left with
12 13 13 14 15 15 16 18 18 19 24 25 27 hours

Step (3)
The median of this is 16. (If you would rather take the average, as in
Step (3) at the top, it is 229/13, or about 17.6.)
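
The worked example above can be reproduced with a short Python sketch. The function names are made up for illustration. The quartiles follow the convention used in the example (the middle score of each half, leaving the overall median out when the count is odd); other quartile conventions can give slightly different cutoffs. The filter here is two-sided, so it would also drop values more than 2 * IQR below the median, which is an assumption beyond what the example needed.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(ordered):
    """Q1 and Q3 as the medians of the bottom and top halves of a sorted list."""
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Skip the middle score itself when the count is odd.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(upper)

def typical_trip(times, k=2):
    """Drop values more than k * IQR from the median, then take the median of the rest."""
    ordered = sorted(times)
    med = median(ordered)
    q1, q3 = quartiles(ordered)
    iqr = q3 - q1
    kept = [t for t in ordered if abs(t - med) <= k * iqr]
    return median(kept), kept

trips = [12, 13, 13, 14, 15, 15, 16, 18, 18, 19, 24, 25, 27, 43, 44]
normal, kept = typical_trip(trips)
print(quartiles(sorted(trips)))  # (14, 25)
print(normal)                    # 16
```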

++++++++++++++++++++
Extras
1) If you have an even number of scores 
the median is the average of the middle two scores:
e.g. The median of (1 5 6 9) is (5+6)/2 = 5.5

2) A spreadsheet program can sort the data 
from smallest to largest automatically.
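
If a spreadsheet is not handy, both extras can also be done with Python's standard library. This is just a small sketch using the four scores from extra 1; `sorted` and `statistics.median` are standard, and the variable names are arbitrary.

```python
import statistics

scores = [9, 1, 6, 5]
ordered = sorted(scores)            # [1, 5, 6, 9]
middle = statistics.median(scores)  # 5.5, the average of 5 and 6
print(ordered, middle)
```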
