Hello bran!!
Here we go:
1- Statistic and parameter
A statistic is a quantity that is computed from the sample. That
means its value can be determined. A parameter is a quantity that
reflects the entire population. We generally do not know its value,
but the desire to know it prompts us to select a sample and compute an
approximate value (a statistic) for it.
The reason we sample is so that we might get an estimate for the
population we sampled from. If you measure the entire population and
calculate a value like a mean or average, we don't refer to this as a
statistic, we call it a parameter of the population.
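To make the distinction concrete, here is a small sketch (hypothetical numbers, Python standard library only): the full population is known here, so its mean is a parameter we can compute exactly, while the sample mean is a statistic that merely estimates it.

```python
import random

# Hypothetical example: a "population" of 10,000 known values, so the
# population mean (a parameter) can be computed exactly; the sample
# mean (a statistic) only estimates it.
random.seed(42)
population = [random.gauss(50, 10) for _ in range(10_000)]
parameter_mu = sum(population) / len(population)   # parameter: fixed, usually unknown

sample = random.sample(population, 100)            # draw a sample of 100
statistic_mean = sum(sample) / len(sample)         # statistic: computed from the sample

print(f"population mean (parameter): {parameter_mu:.2f}")
print(f"sample mean (statistic):     {statistic_mean:.2f}")
```

In real research we would only ever see the second number; the first is what we are trying to learn.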
------------------------------------------------------------------
2- Sample frame and population
Population is the entire group under study as specified by the
research objectives, e.g., supermarkets that are part of chains
(Safeway, Publix, Woodman’s) located in Wisconsin. It should be
defined in specific terms, i.e., instead of saying “everybody who
might buy this product,” say “young adults aged 18-24 who live with
their parents.”
Sample and Sample Unit:
• A sample is a subset of the population that should represent the
entire group. A sample is said to be “representative” if, in fact, it
represents the entire population.
• A sample unit is the basic level of investigation, e.g., a college
student, a housewife, a purchasing agent, a supermarket, a sea-food
restaurant, a bank, etc.
The sample frame is a master list of all the sample units in the
population.
If the population were defined as all chain supermarkets in Wisconsin,
then a list containing all such stores would be the sample frame. The
sample frame for a mall intercept survey would be all shoppers who
were walking through the mall on the days data were collected.
Sometimes a physical list is not possible.
To sum up:
Population: The set of all the relevant units of analysis defined by
the researcher.
Element or Sample unit: One unit from a population.
Sample Frame: A list of all the elements in the population from which
a sample may be drawn.
Sample: The set of elements selected or drawn from a sample frame to
represent the population.
PROPERTIES OF A GOOD SAMPLE FRAME
a. It contains a list of all of the elements in the population and
does not omit any (coverage is complete).
b. It does not contain any extraneous elements (only contains elements
relevant to the defined population).
c. It does not contain duplicates.
d. It is current and up-to-date.
I found a little "practical" variation of this definition:
"Target Population --> Sample Frame:
The sample frame is the portion of the target population that is
accessible to researchers (e.g., persons who read newspapers, persons
with phones). Often, the sample frame is some sort of list (e.g., a
membership list). But individuals who are accessible may differ from
those who are not. For example, persons with phones are different from
persons without phones, and physicians who are members of professional
organizations are different from those who are not. Readers should
carefully judge how the sample frame might systematically differ from
the target population."
From "Primer on Interpreting Surveys", ACP-ASIM Journal, Jan-Feb 2002
(take a look at this article; it is very interesting):
http://www.acponline.org/journals/ecp/janfeb02/primer_interpret_surveys.pdf
-------------------------------------------------------------------------
3- Restricted and unrestricted sampling
"The basic idea of sampling is that by selecting some of the elements
in a population, we may draw conclusions about the entire
population...
Element Selection: Whether the elements are selected individually and
directly from the population - viewed as a single pool - or whether
additional controls are placed on element selection may also classify
samples. When each sample element is drawn individually from the
population at large, it is an unrestricted sample. Restricted sampling
covers all other forms of sampling."
From: "Sampling Issues- Part I" - Palm Beach Atlantic University -
by David M. Compton, Ph.D
http://faculty.pba.edu/comptond/callisto/courses/Quantitative_Methods/Sampling-1.html
------------------------------------------------------------------------
4 - Standard deviation and standard error
The standard deviation is one of several indices of variability that
statisticians use to characterize the dispersion among the measures in
a given population.
To calculate the standard deviation of a population it is first
necessary to calculate that population's variance (V). Numerically,
the standard deviation is the square root of the variance. Unlike the
variance, which is a somewhat abstract measure of variability, the
standard deviation can be readily conceptualized as a distance along
the scale of measurement.
SD = V^(1/2) = (SUM((Xi - M)^2)/n)^(1/2), where M is the population mean
The Standard Error, or Standard Error of the Mean, is an estimate of
the standard deviation of the sampling distribution of means, based on
the data from one or more random samples.
Numerically, it is equal to the square root of the quantity obtained
when the squared standard deviation is divided by the size of the
sample:
SE = (SD^2 / n)^(1/2) = SD / n^(1/2)
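The two formulas above can be sketched in Python (hypothetical data, standard library only):

```python
import math

def std_dev(xs):
    """Population standard deviation: square root of the mean squared deviation."""
    mean = sum(xs) / len(xs)
    variance = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.sqrt(variance)

def std_error(xs):
    """Standard error of the mean: SD divided by the square root of n."""
    return std_dev(xs) / math.sqrt(len(xs))

data = [12, 15, 11, 14, 13, 15, 12, 16]   # hypothetical sample
print(f"SD = {std_dev(data):.4f}")        # spread of the individual measures
print(f"SE = {std_error(data):.4f}")      # uncertainty of the sample mean
```

Note how the SE shrinks as the sample grows, while the SD describes the data itself and does not.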
You will find interesting the following document:
"Maintaining Standards: Differences between the Standard Deviation and
Standard Error, and When to Use Each" by David L Streiner, PhD.
http://www.cpa-apc.org/Publications/Archives/PDF/1996/Oct/strein2.pdf
--------------------------------------------------------------
5- Simple random and complex random sampling
Simple Random Sampling:
Based on a numbered list of the population. Each person or document
has an equal chance of being selected.
The unrestricted, simple random sample is the simplest form of
probability sampling. Since all probability samples must provide a
known nonzero chance of selection for each population element, the
simple random sample is considered a special case in which each
population element has a known and equal chance of selection.
Simple random sampling is often impractical. Reasons include (1) it
requires a population list (sampling frame) that is often not
available; (2) it fails to use all the information about a population,
thus resulting in a design that may be wasteful; and (3) it may be
expensive to implement in both time and money. These problems have led
to the development of alternative designs that are superior to the
simple random design in statistical and/or economic efficiency.
These alternatives are called:
Complex Random Sampling:
It has three forms:
-Systematic sampling
In this approach, every kth element in the population is sampled,
beginning with a random start of an element in the range of 1 to k.
The kth element is determined by dividing the sample size into the
population size to obtain the skip pattern applied to the sampling
frame:
.Identify the total number of elements in the population.
.Identify the sampling ratio (k = total population size divided by
the size of the desired sample).
.Identify the random start.
.Draw the sample by choosing every kth entry.
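The steps above can be sketched as follows (a hypothetical frame of 1,000 numbered units, drawing a sample of 50 so that k = 20):

```python
import random

def systematic_sample(frame, sample_size):
    """Systematic sampling: a random start in [0, k), then every kth
    element, where k = population size // sample size (the skip pattern)."""
    k = len(frame) // sample_size
    start = random.randrange(k)                 # random start in the first interval
    return [frame[start + i * k] for i in range(sample_size)]

random.seed(1)
frame = list(range(1, 1001))                    # a numbered sampling frame of 1,000 units
sample = systematic_sample(frame, 50)           # k = 20, so every 20th entry
print(sample[:5])
```

Every pair of consecutive picks is exactly k apart, which is what makes the method easy to hand to field workers.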
The major advantage of systematic sampling is its simplicity and
flexibility. It is easier to instruct field workers to choose the
dwelling unit listed on every kth line of a listing sheet than it is
to use a random numbers table. With systematic sampling, there is no
need to number the entries in a large personnel file before drawing a
sample.
-Stratified sampling
Most populations can be segregated into several mutually exclusive
subpopulations, or strata. The process by which the sample is
constrained to include elements from each of the segments is called
stratified random sampling. University students can be divided by
their class level, school or major, gender, and so forth. After a
population is divided into the appropriate strata, a simple random
sample can be taken within each stratum. The sampling results can then
be weighted and combined into appropriate population estimates.
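A minimal sketch of that procedure, assuming a hypothetical two-stratum student population with made-up scores: a simple random sample is taken within each stratum, and the stratum means are weighted by each stratum's share of the population.

```python
import random

# Hypothetical strata with made-up values; the stratum names and sizes
# are assumptions for illustration only.
random.seed(0)
strata = {
    "freshman": [random.gauss(60, 5) for _ in range(400)],
    "senior":   [random.gauss(70, 5) for _ in range(100)],
}
N = sum(len(s) for s in strata.values())

estimate = 0.0
for name, stratum in strata.items():
    sub = random.sample(stratum, 20)            # SRS within each stratum
    sub_mean = sum(sub) / len(sub)
    estimate += (len(stratum) / N) * sub_mean   # weight by the stratum's population share

print(f"weighted estimate of the population mean: {estimate:.2f}")
```

The weighting step is what "combined into appropriate population estimates" refers to.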
There are three reasons why a researcher chooses a stratified random
sample: (1) to increase a sample's statistical efficiency, (2) to
provide adequate data for analyzing the various subpopulations, and
(3) to enable different research methods and procedures to be used in
different strata.
Stratification is usually more efficient statistically than simple
random sampling and at worst it is equal to it. With the ideal
stratification, each stratum is homogeneous internally and
heterogeneous with other strata. This might occur in a sample that
includes members of several distinct ethnic groups. In this instance,
stratification makes a pronounced improvement in statistical
efficiency.
-Cluster Sampling
In a simple random sample, each population element is selected
individually. The population can also be divided into groups of
elements with some groups randomly selected for study. This is cluster
sampling. Cluster sampling differs from stratified sampling in several
ways. The idea is to select typical groups that represent the
remaining ones.
When done properly, cluster sampling also provides an unbiased
estimate of population parameters. Two conditions foster the use of
cluster sampling: (1) the need for more economic efficiency than can
be provided by simple random sampling and (2) the frequent
unavailability of a practical sampling frame for individual elements.
Statistical efficiency for cluster samples is usually lower than for
simple random samples chiefly because clusters are usually
homogeneous. Families in the same block (a typical cluster) are often
similar in social class, income level, ethnic origin, and so forth.
While statistical efficiency in most cluster sampling may be low,
economic efficiency is often great enough to overcome this weakness.
The criterion, then, is the net relative efficiency resulting from the
trade-off between economic and statistical factors. It may take 690
interviews with a cluster design to give the same precision as 424
simple random interviews. But if it costs only $5 per interview in the
cluster situation and $10 in the simple random case, the cluster
sample is more attractive ($3,450 versus $4,240).
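The cost arithmetic in that example checks out:

```python
# The trade-off from the text: a cluster design needs more interviews
# for the same precision, but each interview is cheaper.
cluster_n, cluster_cost_per = 690, 5
srs_n, srs_cost_per = 424, 10

cluster_total = cluster_n * cluster_cost_per   # $3,450
srs_total = srs_n * srs_cost_per               # $4,240

print(cluster_total, srs_total)
assert cluster_total < srs_total  # the cluster sample wins on net relative efficiency
```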
Summed up from "Sampling Issues - Part II and III" by David M.
Compton, Ph.D.
------------------------------------------------------------------
6- Convenience and purposive sampling.
Nonprobability sampling does not involve random selection. This means
that nonprobability samples cannot depend upon the rationale of
probability theory.
With a probabilistic sample, we know the odds or probability that we
have represented the population well; we are able to estimate
confidence intervals for the statistic. With nonprobability samples,
we may or may not represent the population well, and it will often be
hard for us to know how well we've done so.
Probability sampling may be superior in theory, but there are
breakdowns in its application; for example: the ideal probability
sampling may be only partially achieved because of the human element.
-Convenience Sampling:
Nonprobability samples that are unrestricted are called convenience
samples. They are the least reliable design but normally the cheapest
and easiest to conduct. Researchers or field workers have the freedom
to choose whomever they find, thus the name convenience. Examples
include informal pools of friends and neighbors, people responding to
a newspaper's invitation for readers to state their positions on some
public issue or a TV reporter's "man-on-the-street" intercept
interviews, or using employees to evaluate the taste of a new snack
food.
-Purposive Sampling:
In purposive sampling, we sample with a purpose in mind. So a
nonprobability sample that conforms to certain criteria is called
purposive sampling. There are two major types - judgment sampling and
quota sampling.
Judgment sampling occurs when a researcher selects sample members to
conform to some criterion. In a study of labor problems, you may want
to talk only with those who have experienced on-the-job
discrimination.
Quota sampling is used to improve representativeness. The logic behind
quota sampling is that certain relevant characteristics describe the
dimensions of the population. For example, if your workforce is 20%
African American, 15% Hispanic, 10% Asian, and 55% Caucasian, quota
sampling would call for those same percentages in the sample in order
to prevent racial distortion in the results.
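A small sketch of turning those workforce shares into quota targets (the last share is taken as 55% here so that the proportions sum to 100%):

```python
def quota_targets(proportions, sample_size):
    """Quota sampling targets: allocate the sample so that it matches the
    population proportions on the relevant characteristic."""
    return {group: round(p * sample_size) for group, p in proportions.items()}

# Shares based on the workforce example (Caucasian adjusted to 0.55
# so the shares total 1.0)
workforce = {"African American": 0.20, "Hispanic": 0.15,
             "Asian": 0.10, "Caucasian": 0.55}
print(quota_targets(workforce, 200))
# -> {'African American': 40, 'Hispanic': 30, 'Asian': 20, 'Caucasian': 110}
```

Interviewers then recruit until each group's quota is filled; no random selection is involved.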
Summed up from "Sampling Issues - Part IV" by David M. Compton, Ph.D.
At this point I recommend you read the following text:
"Nonprobability Sampling" by William M.K. Trochim
http://trochim.human.cornell.edu/kb/sampnon.htm
-------------------------------------------------------------------
7. Sample precision and sample accuracy
Accuracy is the degree to which bias is absent from the sample. An
accurate (unbiased) sample is one in which the underestimators and the
overestimators are balanced among the members of the sample. There is
no systematic variance; that is, there is no variation in measures due
to some known or unknown influence that causes the scores to lean in
one direction more than another.
A second criterion of a good sample design is precision of estimate.
No sample will fully represent its population in all respects. The
numerical descriptors that describe samples may be expected to differ
from those that describe populations because of random fluctuations
inherent in the sampling process. This is called sampling error and
reflects the influences of chance in drawing the sample members.
Precision is measured by the standard error of estimate, a type of
standard deviation measurement; the smaller the standard error of
estimate, the higher is the precision of the sample.
Summed up from "Sampling Issues- Part I" by David M. Compton, Ph.D.
-----------------------------------------------------------------
8. Systematic and error variance
"In nearly every collection of data, there is variability. We are
interested in identifying the sources of that variability. When we are
focusing our attention on a single source, that source is called
systematic variance -- the source of variability that is under
investigation. All other sources of variability are lumped into one
indefinite mass called error variance. Error variance has little to do
with "error" although variability due to errors can be part of the
error variance; it really refers to whatever sources of variability on
which we are not focusing our attention. Because systematic variance
is due to those variables under investigation and error variance
encompasses every other source of variability, the two together equal
the total observed variance:
total variance = systematic variance + error variance"
Taken from the following text:
"Systematic and error variance" from Southern Illinois University
website:
http://www.siu.edu/departments/cola/psycho/faculty/young/ResMethodsStuff/varianceStuff.html
This text also includes a numeric example of the calculation of
systematic and error variance.
The following document from the Department of Psychology of Brock
University will bring you further clarification on this subject
(look for the highlighted text):
"3P30 Critical thinking - Chapter 8 - Basic Issues in Experimental
Research"
http://216.239.39.100/search?q=cache:VqoFdFU1SHcC:www.psyc.brocku.ca/course/outlines2001/ch%25208%25203P30%2520web.doc+systematic+error+variance&hl=en&ie=UTF-8=UTF-8
------------------------------------------------------------------
9. Variable and attribute parameters.
Variable: A characteristic of interest about each individual element
of a population or sample. Examples: Age, height, eye color, etc.
A variable is any measured characteristic or attribute that can differ
for different subjects. For example, if the weight of 30 boxes were
measured, then weight would be a variable.
Variables can be quantitative or qualitative. (Qualitative variables
are sometimes called "categorical variables.") Quantitative variables
are measured on an ordinal, interval, or ratio scale; qualitative
variables are measured on a nominal scale. If five-year old subjects
were asked to name their favorite color, then the variable would be
qualitative. If the time it took them to respond were measured, then
the variable would be quantitative.
An attribute parameter is a parameter value for the associated
variable.
Note the distinction between a variable and an attribute parameter. An
attribute parameter's value is the same for all units of the
population, whereas the variable can have different values for each
unit.
---------------------------------------------------------------------
10. Point estimate and interval estimate
"When a parameter is being estimated, the estimate can be either a
single number or it can be a range of scores. When the estimate is a
single number, the estimate is called a "point estimate"; when the
estimate is a range of scores, the estimate is called an interval
estimate. Confidence intervals are used for interval estimates.
As an example of a point estimate, assume you wanted to estimate the
mean time it takes 12-year-olds to run 100 yards. The mean running
time of a random sample of 12-year-olds would be an estimate of the
mean running time for all 12-year-olds. Thus, the sample mean, M,
would be a point estimate of the population mean, m."
From "HyperstatOnline" :
http://davidmlane.com/hyperstat/point_estimation.html
"Interval estimates: Confidence Intervals
The sample mean and sample variance are examples of
“point estimates”: they give a single number which is an
estimate of a population parameter.
It is generally more useful to have an interval estimate,
which gives a range of values which we have some
confidence contains the true (population) value.
This “confidence” is usually expressed as a percentage
probability."
From "Interval estimates: Confidence Intervals", Department of
Statistics and Modelling Science - University of Strathclyde
http://www.stams.strath.ac.uk/~steve/53202/confint1.pdf
"Interval Estimates - Confidence Intervals
Here the objective is to take samples from a population and construct
an interval [l,u] for the unknown value of the parameter being
estimated. Typically, there is a confidence coefficient associated
with the interval, e.g. one may construct a 99% confidence interval or
a 96% confidence interval, or in general, a 100*(1-a)% interval where
a is a small fraction (typically less than or equal to 0.1)
One should be very clear on how to interpret such an interval.
Given a 100*(1-a)% confidence interval [l,u] for the value of some
parameter q, it is NOT true that "there is a probability of (1-a)
that the value of q lies within this interval." In fact, there IS no
such probability since the value either lies within the interval (in
which case the probability is equal to 1) or it does not lie within
the interval (in which case the probability is equal to 0)!
A better way to interpret the interval is that there is a probability
of (1-a) that the interval covers the (unknown) value of q. More
precisely, suppose we were to follow the correct sampling procedure
used to construct the interval and we were to repeat this procedure a
very large number of times, constructing a confidence interval each
time. Note that these intervals will in general be different each time
since they are based upon different random samples that will in
general be different from each other. If we were to follow this
procedure, then we would expect about 100*(1-a)% of the intervals to
contain the true value of q."
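This repeated-sampling interpretation can be simulated (a hypothetical population with known mean and standard deviation; sigma is treated as known for simplicity):

```python
import math
import random

# Construct many 95% confidence intervals for a known population mean
# and count how often the interval covers it. About 95% should.
random.seed(7)
mu, sigma, n, trials = 100, 15, 50, 2000
z = 1.96                                    # two-sided 95% critical value

covered = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    m = sum(sample) / n
    se = sigma / math.sqrt(n)               # sigma assumed known, for simplicity
    if m - z * se <= mu <= m + z * se:
        covered += 1

print(f"coverage: {covered / trials:.3f}")  # expect a value close to 0.95
```

Each individual interval either covers mu or it does not; the 95% describes the long-run behavior of the procedure, exactly as the quoted text explains.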
From "Interval estimates: Confidence Intervals" Industrial Engineering
University of Pittsburgh.
http://www.pitt.edu/~jrclass/stat/notes/OH33.html
"Example of Point Estimate"
http://www.amstat.org/publications/jse/v5n1/schwarz.supp/pointest.html
---------------------------------------------------------
11. Proportionate and disproportionate sample.
"The size of the strata samples is calculated with two pieces of
information: (1) how large the total sample should be and (2) how the
total sample should be allocated among strata. In deciding how to
allocate a total sample among various strata, there are proportionate
and disproportionate options.
-Proportionate versus Disproportionate Sampling:
In proportionate stratified sampling, each stratum is properly
represented so the sample drawn from it is proportionate to the
stratum's share of the total population. This approach is more popular
than any other stratified sampling procedure. Some reasons for this
include:
.It has higher statistical efficiency than a simple random
sample.
.It is much easier to carry out than other stratifying methods.
.It provides a self-weighting sample; the population mean or
proportion can be estimated simply by calculating the mean or
proportion of all sample cases, eliminating the weighting of
responses.
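A minimal sketch of proportionate allocation, with hypothetical strata:

```python
def proportionate_allocation(strata_sizes, total_sample):
    """Proportionate stratified sampling: each stratum's sample size is
    proportional to its share of the total population."""
    N = sum(strata_sizes.values())
    return {name: round(total_sample * size / N)
            for name, size in strata_sizes.items()}

# Hypothetical population of 10,000 split into three strata
strata = {"urban": 6000, "suburban": 3000, "rural": 1000}
print(proportionate_allocation(strata, 500))
# -> {'urban': 300, 'suburban': 150, 'rural': 50}
```

Because each stratum is sampled at the same rate, the overall sample is self-weighting: the plain sample mean already estimates the population mean.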
On the other hand, proportionate stratified samples often gain little
in statistical efficiency if the strata measures and their variances
are similar for the major variables under study.
Any stratification that departs from the proportionate relationship is
disproportionate. There are several disproportionate allocation
schemes. One type is a judgmentally determined disproportion based on
the idea that each stratum is large enough to secure adequate
confidence levels and interval range estimates for individual strata.
A researcher makes decisions regarding disproportionate sampling,
however, by considering how a sample will be allocated among strata.
One author states, 'In a given stratum, take a larger sample if the
stratum is larger than other strata; the stratum is more variable
internally; and sampling is cheaper in the stratum.'
If one uses these suggestions as a guide, it is possible to develop an
optimal stratification scheme. When there is no difference in
intra-stratum variances and when the costs of sampling among strata
are equal, the optimal design is a proportionate sample.
While disproportionate sampling is theoretically superior, there is
some question as to whether it has wide applicability in a practical
sense. If the differences in sampling costs or variances among strata
are large, then disproportionate sampling is desirable. It has been
suggested that differences of several-fold are required to make
disproportionate sampling worthwhile."
From "Sampling Issues- Part II" by David M. Compton, Ph.D.
-------------------------------------------------------------
Under what kind of conditions would you recommend each of the
following?:
a. A probability sample? A nonprobability sample?
A probability sampling method is any method of sampling that utilizes
some random selection. In order to have a random selection method,
you must set up some process that assures that the different units in
your population have equal probabilities of being chosen. While
probability sampling may be superior in theory to nonprobability
sampling, there are breakdowns in its application. Even carefully
stated random sampling procedures may be subject to careless
application by the people involved. The total population may not be
available for study in certain cases. At the scene of a major event,
it may be infeasible to even attempt to construct a probability
sample. A study of past correspondence between two companies must use
an arbitrary sample because the full correspondence is normally not
available. In another sense, those who are included in a sample may
select themselves. In mail surveys, those who respond may not
represent a true cross section of those who receive the questionnaire.
The receivers of the questionnaire decide for themselves whether they
will participate. There is some of this self-selection in almost all
surveys because every respondent chooses whether or not to be
interviewed.
So probability sampling is preferred under circumstances where
"external factors" have very little interference, such as scientific
research: laboratory experiments, medical research, etc.
We may use nonprobability sampling procedures because they
satisfactorily meet the sampling objectives. While a random sample
will give us a true cross section of the population, this may not be
the objective of the research. If there is no desire or need to
generalize to a population parameter, then there is much less concern
about whether the sample fully reflects the population.
Often researchers have more limited objectives. They may be looking
only for the range of conditions or for examples of dramatic
variations. This is especially true in exploratory research where one
may wish to contact only certain persons or cases that are clearly
atypical.
Additional reasons for choosing nonprobability over probability
sampling are cost and time. Probability sampling clearly calls for
more planning and repeated callbacks to ensure that each selected
sample member is contacted. These activities are expensive. Carefully
controlled nonprobability sampling often seems to give acceptable
results, so the investigator may not even consider probability
sampling.
It is also possible that nonprobability sampling may be the only
feasible alternative. As we said before, the total population may not
be available for study in certain cases.
Ideas taken from:
"Sampling Issues- Part III & IV" by David M. Compton, Ph.D.
Trochim, William M. "The Research Methods Knowledge Base, 2nd
Edition." Internet WWW page, at URL:
http://trochim.human.cornell.edu/kb/index.htm
-------------------------------------------------------------------
b. A simple random sample? A cluster sample? A stratified sample?
Simple random sampling is often impractical. Reasons include (1) it
requires a population list (sampling frame) that is often not
available; (2) it fails to use all the information about a population,
thus resulting in a design that may be wasteful; and (3) it may be
expensive to implement in both time and money. So to use simple random
sampling we need a small and uniform population.
Creating such uniform subgroups is exactly what we do when we separate
the population into strata. Stratified sampling is also useful when
the researcher wants to study the characteristics of certain
population subgroups.
Stratified sampling techniques are generally used when the population
is heterogeneous, or dissimilar, where certain homogeneous, or
similar, sub-populations can be isolated (strata). Simple random
sampling is most appropriate when the entire population from which the
sample is taken is homogeneous. Some reasons for using stratified
sampling over simple random sampling are :
a) the cost per observation in the survey may be reduced;
b) estimates of the population parameters may be wanted for each
sub-population;
c) increased accuracy at a given cost.
Example
Suppose a farmer wishes to work out the average milk yield of each cow
type in his herd which consists of Ayrshire, Friesian, Galloway and
Jersey cows. He could divide up his herd into the four sub-groups and
take samples from these.
Cluster sampling is typically used when the researcher cannot get a
complete list of the members of a population they wish to study but
can get a complete list of groups or 'clusters' of the population. It
is also used when a random sample would produce a list of subjects so
widely scattered that surveying them would prove to be far too
expensive, for example, people who live in different postal districts
in the U.K.
This sampling technique may well be more practical and/or economical
than simple random sampling or stratified sampling.
Example:
Suppose that the Department of Agriculture wishes to investigate the
use of pesticides by farmers in England. A cluster sample could be
taken by identifying the different counties in England as clusters. A
sample of these counties (clusters) would then be chosen at random, so
all farmers in those counties selected would be included in the
sample. It can be seen here then that it is easier to visit several
farmers in the same county than it is to travel to each farm in a
random sample to observe the use of pesticides.
Source: "Statistics Glossary - Sampling".
http://www.cas.lancs.ac.uk/glossary_v1.1/samp.html#randsamp
See also "Probability Sampling" by William M.K. Trochim for more
examples:
http://trochim.human.cornell.edu/kb/sampprob.htm
-------------------------------------------------------------------
c. Using the finite population adjustment factor?
The formula for the standard error of the mean is accurate only when
samples are drawn from a very large or infinite population, or when
they’re drawn with replacement from the population. Since this seldom
happens in real life, we have to apply a correction factor. This
allows us to get a more accurate standard error of the mean.
If you are sampling from a relatively small finite population, the
usual variance estimate can be too large. If the sample size n is over
5% of the population size N, you will benefit from using the finite
population correction factor.
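A sketch of applying the correction (hypothetical numbers; the usual correction factor is sqrt((N - n)/(N - 1))):

```python
import math

def standard_error(sd, n, N=None):
    """Standard error of the mean, applying the finite population
    correction when the population size N is given and n/N > 5%."""
    se = sd / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))   # finite population correction
    return se

sd, n, N = 12.0, 100, 1000        # sampling 10% of a population of 1,000
print(standard_error(sd, n))      # uncorrected: 1.2
print(standard_error(sd, n, N))   # corrected: slightly smaller
```

As n approaches N the correction factor approaches zero, reflecting the fact that a census of a finite population has no sampling error at all.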
You can see the formula here:
http://www.lakeland.cc.oh.us/academic/sh/math/ddavis/telecourse/ppfiles/BMAN07ef/sld066.htm
Also read this (it has a nice example):
"Sampling Variability and Sampling Distributions"
http://home.xnet.com/~fidler/triton/math/review/mat170/sdist/sdist1.htm
--------------------------------------------------------------------
d. A disproportionate stratified probability sample?
"Disproportionate stratified sample: Appropriate method when certain
segments of a population are seen as more important than others, as
varying more, or as more expensive to sample. For example, a health
insurance company surveys its corporate customers, oversampling firms
with larger memberships to reflect their true influence."
From: "Griggs-Anderson Research: Glossary - S":
http://www.gar.com/primer/glosss.htm
"disproportionate stratified sampling:
A probability sampling method in which the sample size of a particular
stratum is not proportional to the population strata. For example, a
stratum could be large supermarkets, which may only account for 20% of
all grocery stores - although they account for 80% of grocery sales.
In this case, a disproportionate sample would be used to represent the
large supermarkets to reflect their sales (i.e. 80%) rather than the
number of stores."
From "xRefer - Dictionary of Business: Oxford University Press":
http://www.xrefer.com/entry/163088
--------------------------------------------------------------------------
wow!! what a question!!
I hope this helps you.
The search strategy was to look up each subject in Google to complete
some concepts.
The greater part of the answer can be found in the following pages
that I used. I keep them in my favorites folder (for help with my
University studies):
- "Sampling Issues - Part I,II,III,IV" by David M. Compton, Ph.D.
(Note: At the bottom of this page -Part IV- there are links to part
I,II & III)
http://faculty.pba.edu/comptond/callisto/courses/Quantitative_Methods/Sampling-4.html
- "Research Methods Knowledge Base" by William M. Trochim, Cornell
University
http://trochim.human.cornell.edu/kb/
Note: in this case, follow the Sampling link and then navigate the
page through the subsequent links:
http://trochim.human.cornell.edu/kb/sampling.htm
If you need clarification, please post a request for it.
Best Regards
livioflores-ga |