Hi jabeda!
Although the student number is missing from your question, most of it
can be answered anyway. The only question that will remain unanswered
here is A, but that one is very easy once you know the appropriate
student number. Fortunately, the rest of the questions can be answered
without knowing S.
First of all, let me explain why knowledge of S is not necessary to
answer questions regarding spread or dispersion (or variance). An S
"at the beginning" of a two-digit number simply adds S*100 to that
number. For example, if S=4, then S27 is 427, that is,
S*100 + 27 = 4*100 + 27 = 400 + 27 = 427
So different S's just shift all the data points by the same amount,
but don't change their dispersion. The set (120,124,119) has exactly
the same dispersion as (820,824,819).
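If you'd like to check this claim yourself, here is a minimal Python sketch. It assumes the population variance (dividing by n, as in the formula used in part E); the variable names are only illustrative.

```python
from statistics import pvariance

# The two example sets from the text: same points, shifted by 700.
low = [120, 124, 119]
high = [820, 824, 819]

# Population variance (divides by n, not n-1).
var_low = pvariance(low)
var_high = pvariance(high)

print(var_low, var_high)
```

Both printed values should be the same, since adding a constant to every data point leaves each deviation from the mean unchanged.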
So here are the answers to questions B onwards. Since S is irrelevant,
I'll assume S=0. When you know the appropriate S, just put it in front
of the data values (not the variances) in the answers.
B) Without performing any calculation, it appears that Family 2 has
the least variability. Its data points only vary between 27 and 33 (or
427 and 433, or whatever), while the data points in the other families
vary between 22 and 39 for family 1, and between 16 and 49 for family
3. As the data points in a set get "closer" to each other, the
dispersion of the set becomes smaller. Numbers between 27 and 33 are
clearly closer to each other than numbers between 22 and 39, or
between 16 and 49. This is a good indication that dispersion is the
smallest in family 2. For analogous reasons, without performing any
calculations, family 3 appears to have the greatest dispersion.
C) These are the variances and standard deviations for each family:
            Variance   Std. Dev.
Family 1      37.2       6.10
Family 2       5.2       2.28
Family 3     139.6      11.82
Notice that these results are independent of S: the variance for
family 1 is 37.2 whether S is 0, 1, 8 or whatever. You can check this
result for yourself. The formulas I used to calculate these values are
given on the following pages, along with an example of how to
calculate them.
Variance and Std. Dev.
http://davidmlane.com/hyperstat/A16252.html
http://davidmlane.com/hyperstat/A40397.html
Also check
Standard deviation
http://www.quickmba.com/stats/standard-deviation/
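As a quick way to reproduce the Family 1 figures, here is a small Python sketch using the data points listed in part D. It assumes the population formulas (dividing by n), which match the 37.2 reported above; the sample formulas (dividing by n-1) would give larger values.

```python
from statistics import pvariance, pstdev

# Family 1 data points, taking S=0.
family1 = [27, 39, 22, 36, 31]

variance = pvariance(family1)  # mean of the squared deviations
std_dev = pstdev(family1)      # square root of the variance

print(round(variance, 2), round(std_dev, 2))
```

Replacing each value x with S*100 + x for any S should print the same two numbers.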
As you can see, these results confirm what one would guess before
making any calculation. Family 2 has the smallest dispersion (its
variance is smaller than the other two) and family 3 has the largest.
Regarding the units of both measures: since the original data are in
dollars ($), the unit of the variance is "square dollars" and the unit
of the SD is plain dollars. This is why the SD is used much more often
than the variance when presenting summary statistics.
The standard deviation is also useful in the following way: if we
assume that the data points come from a normal (Gaussian)
distribution, then roughly 95% of the observations are expected to
fall within plus or minus two standard deviations of the mean.
This is consistent with the data presented here. Take for example
family 2, which has mean=30 and SD=2.28. This would imply that most
observations fall between approximately 25.4 (~30-2*2.28) and 34.6
(~30+2*2.28). In this case, ALL the data points for family 2 fall in
this range.
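The same two-standard-deviation check can be sketched in Python. Since only Family 1's data points are written out in this answer (see part D), the sketch below uses those rather than Family 2's. It again assumes the population SD, and the variable names are just for illustration.

```python
from statistics import mean, pstdev

# Family 1 data points, taking S=0.
family1 = [27, 39, 22, 36, 31]

m = mean(family1)
sd = pstdev(family1)

# Interval of two standard deviations on either side of the mean.
lower, upper = m - 2 * sd, m + 2 * sd

# Keep the observations that fall inside the interval.
inside = [x for x in family1 if lower <= x <= upper]

print(round(lower, 1), round(upper, 1), len(inside), "of", len(family1))
```

With only five observations this is just a sanity check, not a test of normality.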
Both the variance and the standard deviation are useful in assessing
the dispersion of a set of data points. If the variance of one set is
greater than the variance of another, then the SD of the first set is
always greater than the SD of the second as well, because the SD is
just the square root of the variance. So in both cases, the larger the
variance or SD, the larger the dispersion of the data points. The main
advantage of the SD is that it has the same units as the original data.
More information on the subject:
"The variance and the standard deviation are both measures of the
spread of the distribution about the mean. The variance is the nicer
of the two measures of spread from a mathematical point of view, but
as you can see from the algebraic formula, the physical unit of the
variance is the square of the physical unit of the data. For example,
if our variable represents the weight of a person in pounds, the
variance measures spread about the mean in squared pounds. On the
other hand, standard deviation measures spread in the same physical
unit as the original data, but because of the square root, is not as
nice mathematically. Both measures of spread are useful"
Mean, Variance, and Standard Deviation
http://www.fmi.uni-sofia.bg/vesta/Virtual_Labs/freq/freq2.html
D) Let's calculate the sum of deviations from the mean for family 1;
the calculations for the other families are analogous. These results
are, again, independent of S. The data points for this family are
(27,39,22,36,31)
The mean is then (27+39+22+36+31)/5 = 31. Therefore, the sum of
deviations from the mean is:
(27-31)+(39-31)+(22-31)+(36-31)+(31-31)
= -4 + 8 + -9 + 5 + 0
= 0
If you calculate it in the same way, you'll see that this result is
also 0 for the other families. To see that this result is independent
of S, let's take S to be, for example, 6 and recalculate. The
data points would be:
(627,639,622,636,631)
The mean would then be 631. And again,
(627-631)+(639-631)+(622-631)+(636-631)+(631-631)
= -4 + 8 + -9 + 5 + 0
= 0
Why is it 0 for all the sets? This is a property of the mean, and the
result holds for any data set. The intuition is that the mean is a
measure of central tendency, so one would expect some of the data
points to lie above the mean and some below it. The mean is
constructed so that the values below it are exactly "counterweighted"
by the values above it. That is why looking at the mean gives us an
idea of the number around which the data points are located.
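Algebraically, the sum of (x - mean) over n data points equals (sum of the x's) - n*mean, and since the mean is (sum of the x's)/n, this is always 0. Here is a short Python sketch that checks this for every possible digit S, using the Family 1 data; the helper function name is my own.

```python
from statistics import mean

# Last two digits of the Family 1 data points.
last_two_digits = [27, 39, 22, 36, 31]

def sum_of_deviations(data):
    """Return the sum of (x - mean) over all points in data."""
    m = mean(data)
    return sum(x - m for x in data)

# Prefix each value with the digit S (i.e. add S*100) and recompute.
sums = {s: sum_of_deviations([s * 100 + x for x in last_two_digits])
        for s in range(10)}

print(sums)
```

Every entry should come out as 0, whatever S is.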
E) The only 5-observation data set with mean 20 and zero variance is
simply
(20, 20, 20, 20, 20)
Clearly the mean here is 20. A set with zero variance must have all
its observations equal. That is, zero variance means no dispersion at
all, and the only sets with no dispersion at all are sets made up of
identical numbers. Mathematically, if you look at the formula for the
variance, you'll see that if even one of the data points differed from
the rest, the variance could not be 0. Take, for example, the set:
(20,20,20,21,19)
The mean of this set is 20. However, its variance is:
(1/5) * [ (20-20)^2 + (20-20)^2 + (20-20)^2 + (21-20)^2 + (19-20)^2 ]
= (1/5) * [ 0 + 0 + 0 + 1 + 1 ]
= 2/5
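Both sets can be compared with a few lines of Python. This is just a sketch that uses the population variance (dividing by n), the same formula as in the calculation above.

```python
from statistics import mean, pvariance

constant = [20, 20, 20, 20, 20]   # all observations identical
perturbed = [20, 20, 20, 21, 19]  # same mean, two points moved

# Both sets have mean 20, but only the constant set has zero variance.
mean_constant, var_constant = mean(constant), pvariance(constant)
mean_perturbed, var_perturbed = mean(perturbed), pvariance(perturbed)

print(mean_constant, var_constant)
print(mean_perturbed, var_perturbed)
```

The second variance comes out as 2/5 = 0.4, in line with the hand calculation.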
Google search strategy
variance "standard deviation"
://www.google.com/search?sourceid=navclient&ie=UTF-8&oe=UTF-8&q=variance+%22standard+deviation%22
I hope this helps! If you find anything unclear about my answer,
please don't hesitate to request a clarification, so I can follow up.
Otherwise, I await your rating and final comments.
Best wishes!
elmarto