Hello lifeafterdeath!
Here are the answers to your questions:
Question 1
----------
* Draw up a frequency table with tally chart for it.
Time taken   Tally   Freq.   Freq. in %
110          |       1       10%
113          |       1       10%
115          ||      2       20%
118          |       1       10%
119          |       1       10%
120          ||||    4       40%
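If you'd like to check the table on a computer, here is a minimal Python sketch that rebuilds it from the ten observations. The variable names (times, counts, rows) are only for illustration.

```python
from collections import Counter

# The ten observations from Question 1 (times in seconds).
times = [110, 113, 115, 115, 118, 119, 120, 120, 120, 120]

counts = Counter(times)
n = len(times)

# One row per distinct time: (time, tally marks, frequency, percentage).
rows = [(t, "|" * counts[t], counts[t], 100 * counts[t] / n)
        for t in sorted(counts)]

print(f"{'Time taken':<12}{'Tally':<8}{'Freq.':<8}Freq. in %")
for t, tally, freq, pct in rows:
    print(f"{t:<12}{tally:<8}{freq:<8}{pct:.0f}%")
```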
* Explain what 'mean, median, and mode' are. Find these averages for
this part of the data.
Mean: "The mean is the sum of all the scores divided by the number of
scores [...] The mean is a good measure of central tendency for
roughly symmetric distributions but can be misleading in skewed
distributions since it can be greatly influenced by extreme scores.
Therefore, other statistics such as the median may be more informative
for distributions such as reaction time or family income that are
frequently very skewed."
Mean
http://www.ruf.rice.edu/~lane/hyperstat/A15885.html
Median: "The median is the middle of a distribution: half the scores
are above the median and half are below the median. The median is less
sensitive to extreme scores than the mean and this makes it a better
measure than the mean for highly skewed distributions. The median
income is usually more informative than the mean income, for example."
Median
http://www.ruf.rice.edu/~lane/hyperstat/A27533.html
Mode: "The mode is the most frequently occurring score in a
distribution and is used as a measure of central tendency. The
advantage of the mode as a measure of central tendency is that its
meaning is obvious. Further, it is the only measure of central
tendency that can be used with nominal data."
Mode
http://www.ruf.rice.edu/~lane/hyperstat/A10032.html
For the 10 observations provided above, the mean, median and mode are
the following:
Mean = (110+113+115+115+118+119+120+120+120+120) / 10
     = 117 seconds
Median:
"When there is an odd number of numbers, the median is simply the
middle number. For example, the median of 2, 4, and 7 is 4.
When there is an even number of numbers, the median is the mean of the
two middle numbers. Thus, the median of the numbers 2, 4, 7, 12 is
(4+7)/2 = 5.5"
In this case, there is an even number of observations (ten). If we
sort the numbers from lowest to highest, we get the following list:
110 113 115 115 118 119 120 120 120 120
It can be easily seen in this list that the two middle numbers are 118
and 119. Thus, the MEDIAN is equal to (118+119)/2 = 118.5 seconds.
Mode:
As I said before, the mode is the most frequently observed number.
Looking at the frequency table, you can see that 120 is the number
most frequently observed (it's observed 4 times). Thus, the mode is
120 seconds.
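The three averages above can also be checked with Python's standard statistics module. This is just a quick sketch; the variable names are my own.

```python
import statistics

# The ten observations from Question 1 (times in seconds).
times = [110, 113, 115, 115, 118, 119, 120, 120, 120, 120]

mean_time = statistics.mean(times)      # sum of the values divided by 10
median_time = statistics.median(times)  # mean of the two middle values
mode_time = statistics.mode(times)      # most frequently observed value

print("Mean:", mean_time)
print("Median:", median_time)
print("Mode:", mode_time)
```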
* Calculate the standard deviation of the data.
If you need information on how the standard deviation is computed,
please check the following pages, as writing formulas is a bit messy
here.
Standard Deviation and Variance
http://davidmlane.com/hyperstat/A16252.html
http://davidmlane.com/hyperstat/A40397.html
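So, first we compute the sample variance for the ten observations,
using the formula provided in the sites shown above. The squared
deviations from the mean (117) add up to 114, and we divide this sum
by the number of observations minus one (10 - 1 = 9):
Variance = 114 / 9 = 12.666...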
Now, the standard deviation is simply the square root of the variance,
so we find that:
Standard deviation = 3.559... seconds
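Since writing the formulas out here is messy, a short Python sketch may help. It computes the sample variance by hand (dividing by n - 1) and compares it with the statistics module. The names used are only illustrative.

```python
import math
import statistics

# The ten observations from Question 1 (times in seconds).
times = [110, 113, 115, 115, 118, 119, 120, 120, 120, 120]

n = len(times)
mean_time = sum(times) / n

# Sum of squared deviations from the mean.
squared_devs = sum((t - mean_time) ** 2 for t in times)

# Sample variance divides by n - 1, not n.
sample_variance = squared_devs / (n - 1)
sample_sd = math.sqrt(sample_variance)

print("Variance:", round(sample_variance, 3))
print("Standard deviation:", round(sample_sd, 3))

# statistics.variance and statistics.stdev use the same n - 1 convention.
library_variance = statistics.variance(times)
library_sd = statistics.stdev(times)
```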
Question 2
----------
* Draw up the frequency table.
Here is the frequency table for all the observations:
----------------------
Time | Freq.
----------+-----------
100 | 1
110 | 4
112 | 4
113 | 2
114 | 6
115 | 10
116 | 7
117 | 4
118 | 3
119 | 2
120 | 6
139 | 1
----------------------
* Find mean, mode and median for the whole set of data.
Now that you know how to compute these three numbers, I'll simply
write the solution and not the procedure.
Mean = 115.58 seconds
Median = 115 seconds
Mode = 115 seconds
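If you want to verify these numbers, one way (a sketch, with names of my own choosing) is to expand the frequency table back into the 50 individual observations and use the same statistics functions as before.

```python
import statistics

# Frequency table from Question 2: time in seconds -> number of students.
freq = {100: 1, 110: 4, 112: 4, 113: 2, 114: 6, 115: 10,
        116: 7, 117: 4, 118: 3, 119: 2, 120: 6, 139: 1}

# Repeat each time as many times as it was observed.
all_times = [t for t, f in freq.items() for _ in range(f)]

mean_time = statistics.mean(all_times)
median_time = statistics.median(all_times)
mode_time = statistics.mode(all_times)

print("Observations:", len(all_times))
print("Mean:", mean_time)
print("Median:", median_time)
print("Mode:", mode_time)
```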
* Compare these averages with the averages of part one of this
question and draw your conclusion.
I'll summarize here the results so far:
         Sample w/10 observations    Full data
Mean     117                         115.58
Median   118.5                       115
Mode     120                         115
As you can see, the results obtained using the sample with ten
observations are fairly similar to those obtained using all
observations. This suggests that a small sample is often enough to
obtain reasonable approximations to the population mean, median and
mode, especially when the sample is chosen at random. It's important
to know this, because it's obviously easier to do computations with 10
observations than with 50 observations. This fact is used extensively
by people doing "serious" statistics. Take, for example, the companies
that measure how many people watch a TV show. They don't ask every
household in the country what shows they are watching. Rather, they
randomly take a relatively small number of households, ask them this
information, and draw conclusions for the whole country based on a
small portion of it.
* From the above calculations, what is the most probable time taken to
complete a query?
This is a difficult question to answer without more information. Let's
see what the options are.
Let's assume first that time is not "continuous", as seems to be
the case from the data you show. We would be assuming here that you
can only take an integer number of seconds to complete the query. In
this case, I would say the most probable time taken to complete the
query is the mode, that is, 115 seconds. This is the time that most
students have taken, so one could think that it's the most probable
time taken.
Things change if you consider time to be continuous (that is, it's
possible to take 113 seconds, or 112.38, or 120.298483 up to an
infinite number of decimals). In this case, the probability of taking
any single time to complete the query is zero. When assuming
"continuous" time, you can ask what's the probability of taking, say,
between 112.3 and 115.82 seconds, and in this case it will be a
positive number. However, if you ask "what's the probability of taking
exactly 120 seconds?" the answer is zero. Thus, there is no "most
probable time taken". All times happen with probability zero. However,
in this case, one could ask: "If I had to predict the number of
seconds the students will take, what number should I guess in order to
minimize the sum of 'mistakes' of my prediction?"
The "mistakes" of the prediction are usually computed as the sum of
the squares of the differences between the observed values and the
predicted value. In this case, the number that minimizes this sum of
mistakes is precisely the mean, that is, 115.58 seconds.
Deviations from the mean and median
http://www.ruf.rice.edu/~lane/hyperstat/A41417.html
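To illustrate the point about minimizing the sum of squared "mistakes", here is a rough Python sketch. It tries every guess from 100 to 139 seconds in steps of 0.01 (a grid I picked arbitrarily for the illustration) and keeps the guess with the smallest sum of squared differences.

```python
# Frequency table from Question 2: time in seconds -> number of students.
freq = {100: 1, 110: 4, 112: 4, 113: 2, 114: 6, 115: 10,
        116: 7, 117: 4, 118: 3, 119: 2, 120: 6, 139: 1}
all_times = [t for t, f in freq.items() for _ in range(f)]


def sum_squared_errors(guess):
    """Sum of squared differences between each observation and the guess."""
    return sum((t - guess) ** 2 for t in all_times)


# Candidate guesses: 100.00, 100.01, ..., 139.00 (illustrative grid).
candidates = [100 + i / 100 for i in range(3901)]
best_guess = min(candidates, key=sum_squared_errors)

mean_time = sum(all_times) / len(all_times)

print("Best guess on the grid:", round(best_guess, 2))
print("Mean of the data:", round(mean_time, 2))
```

The best guess on the grid coincides with the mean, as expected. A grid search is only a demonstration, not a proof.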
* Define total range, upper and lower quartile, also inter-quartile
range of the data.
All the definitions used for this question were taken from the
following page
5-Number Summary
http://www.si.umich.edu/libhelp/toolkit/analyze5numSummBoxplot.html
Interquartile range
http://hades.ph.tn.tudelft.nl/Internal/PHServices/Documentation/MathWorld/math/math/i/i170.htm
"The lower and upper range are the lowest and highest values in the
data set"
Thus,
Lower Range = 100 seconds
Upper Range = 139 seconds
Total Range = 139 - 100 = 39 seconds
"The lower quartile is the median of the lower half of the data set
and the upper quartile is the median of the upper half of the data
set"
Thus,
Lower Quartile = 114 seconds
Upper Quartile = 117 seconds
As you can see in the page provided above, the interquartile range is
simply the upper quartile minus the lower quartile, so:
Inter-quartile Range = 117 - 114 = 3 seconds
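Here is a small Python sketch of the "median of each half" definition quoted above, applied to the full data. Keep in mind that other quartile conventions exist and can give slightly different values; the names below are only illustrative.

```python
import statistics

# Frequency table from Question 2: time in seconds -> number of students.
freq = {100: 1, 110: 4, 112: 4, 113: 2, 114: 6, 115: 10,
        116: 7, 117: 4, 118: 3, 119: 2, 120: 6, 139: 1}
sorted_times = sorted(t for t, f in freq.items() for _ in range(f))

# Total range: highest value minus lowest value.
total_range = sorted_times[-1] - sorted_times[0]

# Split the sorted data into a lower and an upper half (25 values each).
half = len(sorted_times) // 2
lower_half = sorted_times[:half]
upper_half = sorted_times[-half:]

lower_quartile = statistics.median(lower_half)
upper_quartile = statistics.median(upper_half)
interquartile_range = upper_quartile - lower_quartile

print("Total range:", total_range)
print("Lower quartile:", lower_quartile)
print("Upper quartile:", upper_quartile)
print("Inter-quartile range:", interquartile_range)
```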
Question 3
----------
The definition of the class width is the following:
Clarification of Answer by elmarto-ga on 09 Jun 2003 12:09 PDT
I'm sorry, I didn't paste the answer correctly. It's missing the last
part. Here it is:
Question 3
----------
The definition of the class width is the following:
"[The class width is the] difference between two consecutive lower
class limits or lower class boundaries", while the definition of
"class limits" or "class boundaries" is "numbers used to separate
classes"
Elementary Statistics
http://www.ec.erau.edu/cce/faculty/baty2/211A_erau/211A_day1/211_day1_lecture2.ppt
Unfortunately, the page is in PowerPoint format. In case you don't
have PowerPoint, I'll explain: when drawing a grouped frequency table,
you first have to define what these "groups" or "classes" are. In this
question, the data is grouped as "100-104", "105-109", "110-114", etc.
For the grouped frequency table, you'll then count how many
observations fall in the "100-104" class, how many fall in the
"105-109" class, etc.
So, returning to the class width, the lower class boundaries are 100,
105, 110, etc, as these are the numbers that separate the classes.
Now, the class width is simply the difference between two consecutive
lower class boundaries, so, the class width is 105-100 = 5.
Note: even though you have only given two classes (100-104 and
105-109), I assumed that the following classes would be 110-114,
115-119, 120-124, 125-129, 130-134, 135-139. I've made this assumption
because it's common practice, when drawing grouped frequency tables,
to use classes with equal class width.
* Draw the grouped frequency distribution table.
In order to do this, we have to see how many observations fall in each
class. For example, in class 100-104, we find that there is only 1
observation (100), while in class 105-109 there are none. By doing
this with all classes, we obtain:
-------------------------------------
Class      Freq.    Freq. in %
-------------------------------------
100-104    1        2 %
105-109    0        0 %
110-114    16       32 %
115-119    26       52 %
120-124    6        12 %
125-129    0        0 %
130-134    0        0 %
135-139    1        2 %
-------------------------------------
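The counting for the grouped table can be sketched in Python as follows. I'm assuming here, as in the note above, classes of width 5 starting at 100 and ending at 139; the names are my own.

```python
# Frequency table from Question 2: time in seconds -> number of students.
freq = {100: 1, 110: 4, 112: 4, 113: 2, 114: 6, 115: 10,
        116: 7, 117: 4, 118: 3, 119: 2, 120: 6, 139: 1}

class_width = 5  # difference between consecutive lower class limits
total = sum(freq.values())

# Map each class (lower limit, upper limit) to its number of observations.
class_counts = {}
for lower in range(100, 140, class_width):
    upper = lower + class_width - 1
    class_counts[(lower, upper)] = sum(
        f for t, f in freq.items() if lower <= t <= upper
    )

for (lower, upper), count in class_counts.items():
    print(f"{lower}-{upper}   {count:>2}   {100 * count / total:.0f} %")
```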
I've also found that I forgot to specify the frequency in percentage
in Question 2. Here is the table again:
---------------------------------------
Time | Freq. Freq. in %
----------+----------------------------
100 | 1 2 %
110 | 4 8 %
112 | 4 8 %
113 | 2 4 %
114 | 6 12%
115 | 10 20%
116 | 7 14%
117 | 4 8 %
118 | 3 6 %
119 | 2 4 %
120 | 6 12%
139 | 1 2 %
---------------------------------------
There. I hope these answers were clear enough. If you still have any
questions, just request a clarification, and I will be more than happy
to clarify anything you need. Otherwise, I await your comments and
final rating.
Best wishes!
elmarto