Mathematically speaking we cannot make a guarantee of "within 10% of
actual" by using a sample size less than the whole population. Let me
give an extreme example to illustrate.
Suppose we have 299 people who can perform a task instantly (0 minutes)
and 1 who takes 300 minutes. The population mean is then
(299*0 + 300)/300 = 1 minute.
Now a sample of 30 people will either include the one "outlier" or
not. Since 30 of the 300 people are sampled, the outlier is included
with probability 30/300 = 10%, giving a sample mean of 300/30 = 10
minutes; otherwise (with probability 90%) the sample mean is 0
minutes. It will never happen (given these admittedly contrived
circumstances) that the sample mean is within 10% of the actual mean
of 1 minute.
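A quick simulation makes the point concrete (this is just a sketch of the contrived example above, not part of the original argument):

```python
import random

# Hypothetical population from the example: 299 people take 0 minutes,
# one "outlier" takes 300 minutes, so the true mean is 1 minute.
population = [0] * 299 + [300]
true_mean = sum(population) / len(population)  # = 1.0

random.seed(0)
sample_means = set()
for _ in range(10_000):
    sample = random.sample(population, 30)  # sample without replacement
    sample_means.add(sum(sample) / 30)

# Every sample mean is either 0 (outlier excluded) or 10 (outlier
# included); neither is anywhere near the true mean of 1.
print(sorted(sample_means))
```

No matter how many samples are drawn, the only possible sample means are 0 and 10.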
So, we should try instead to work out a range of values (based on the
sample taken) which has a 90% likelihood of containing the actual
population mean. If the actual population were "normal" or roughly
so, then an estimation of the population's variance (taken from the
sample variance with adjustment) can be used to construct just such an
interval (symmetric about the sample mean).
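As a sketch of that construction, here is a 90% interval built from made-up sample data, using the usual t-multiplier for the adjusted (n-1) sample standard deviation (the data and the hardcoded t-table value are my own illustrative assumptions):

```python
import math
import statistics

# Hypothetical sample of 30 task times (minutes); assumed roughly normal.
sample = [4.1, 5.0, 3.8, 4.6, 5.2, 4.9, 4.4, 3.9, 5.1, 4.7,
          4.3, 4.8, 5.3, 4.0, 4.5, 4.6, 4.2, 5.0, 4.4, 4.9,
          3.7, 4.8, 4.1, 5.2, 4.6, 4.3, 4.7, 4.5, 4.9, 4.2]
n = len(sample)
mean = statistics.mean(sample)
s = statistics.stdev(sample)      # sample std dev, with n-1 adjustment

t_crit = 1.699                    # t-table value: 90% two-sided, df = 29
half_width = t_crit * s / math.sqrt(n)
print(f"90% CI: ({mean - half_width:.3f}, {mean + half_width:.3f})")
```

The interval is symmetric about the sample mean, exactly as described above.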
A more careful statement of this approach's guarantee is: the interval
it produces will contain the population mean for 90% of the samples
that could be taken. It can't guarantee anything about the estimate
from any one particular sample, and the 90% is referred to as a
"confidence level" to remind us that it isn't quite the same as
asserting a 90% probability based on the observed sample.
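That "90% of samples" reading can itself be checked by simulation: draw many samples from a known normal population, build the interval each time, and count how often it covers the true mean (the parameters here are my own illustrative choices):

```python
import math
import random
import statistics

random.seed(1)
true_mu, sigma, n = 10.0, 2.0, 30
t_crit = 1.699                    # t-table value: 90% two-sided, df = 29

trials = 2000
hits = 0
for _ in range(trials):
    sample = [random.gauss(true_mu, sigma) for _ in range(n)]
    m = statistics.mean(sample)
    h = t_crit * statistics.stdev(sample) / math.sqrt(n)
    if m - h <= true_mu <= m + h:
        hits += 1

coverage = hits / trials
print(f"coverage ~= {coverage:.3f}")  # should land close to 0.90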
Other methods, often called "robust" or "distribution-free", can be
used when we have reason to suspect the population distribution isn't
close to normal. My "binomial distribution" example shows some of the
characteristics at play. Unlike the normal distribution, which is
symmetric about its mean, that discrete distribution was heavily
"skewed", with all but one individual below average.
regards, mathtalk-ga