Q: Statistical Evaluation of Questionnaire Results (No Answer, 3 Comments)
 Subject: Statistical Evaluation of Questionnaire Results Category: Science > Social Sciences Asked by: gallaxian-ga List Price: $50.00 Posted: 10 May 2006 10:56 PDT Expires: 09 Jun 2006 10:56 PDT Question ID: 727333
 ```I am a member of an elementary school board of directors. Each year this board surveys parent satisfaction with various aspects of our school by circulating a questionnaire. This past year we had a response rate of approximately 54% of families/households - each family completes one survey regardless of the number of students enrolled. Since participation in the survey is voluntary, I am concerned that our findings are invalid due to what is known in statistics as "selection bias". I have four related questions - ideally to be answered by a professional pollster or statistician:

1) What percentage of participation is necessary to achieve a statistically valid result when participation is voluntary? In other words, what is the threshold of participation (in percentage terms) over which we can be confident that our results haven't been too skewed by selection bias?

2) Is it possible to get a statistically valid result using randomized sampling in a population as small as 270 families/households? AND if so...

3) How do we determine the proportion of families/households (out of approx. 270) that we must survey to do so? AND FINALLY...

4) I know that political pollsters often weight their samples to ensure that certain constituencies / socio-political demographic groups are appropriately represented. Would it be possible and/or desirable to weight our school's sample to ensure that families from each grade level (K-8th grade) were represented in the sample? I'm concerned that the sample size from each class would be too small to provide statistically meaningful results at the grade level.```
 There is no answer at this time.

 ```Hi, I am an Applied Mathematics and Statistics undergraduate, so my response may not be ideal. There was no answer and no other comments, so I decided to leave some comments as a theoretical statistician.

1. In statistical theory, it is often observed that a sample size larger than 29 estimates the population distribution quite well, IF the population distribution is believed to be normally distributed. Since 54% of families respond to the survey, I think the sample size is adequate. However, selection bias is always a problem when the questionnaire is voluntary. If you are concerned about selection bias, it would be wise to administer the questionnaire to a random, experimental group as well, to see whether the difference in distributions is significant. Also, to avoid response bias, I suggest the questionnaire not be too formal, as if to remind the family that it is a questionnaire...

2. So the 270 households are the population, correct? The population size is fine.

3. 30 is the minimum according to the normal approximation, thus about 10% is the borderline. But as you realize, a larger sample gives a better approximation. Just be cautious about taking too large a sample: according to regression theory, a larger sample does not always yield the best result. So I would guess a random sample of 20-30% is decent.

4. Taking a survey for every grade is actually a great strategy for this. But considering the size of the population, you might want to pool pairs of grades - K-1 and K-2 together, and likewise K-3 and K-4, K-5 and K-6, K-7 and K-8 - and pick respondents randomly within each pool. I personally do not like to categorize children like that, since I have never managed school classes... Anyway, if the survey is intended to investigate the children's development in school, you should definitely consider breaking down by grade. But if it is for understanding overall opinion, you might just use a random sample broken down at K-1 through K-4 and/or K-5 through K-8. I hope my comments give some hope...```
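[Editor's note: the grade-pooling suggestion in point 4 above is a form of stratified sampling. A minimal sketch of proportional allocation across strata, assuming hypothetical, equal grade sizes - the real per-grade counts are not given in the question:]

```python
import math

def proportional_allocation(stratum_sizes, total_sample):
    """Split a total sample across strata in proportion to stratum size,
    using largest-remainder rounding so the parts sum to total_sample."""
    N = sum(stratum_sizes.values())
    raw = {g: total_sample * size / N for g, size in stratum_sizes.items()}
    alloc = {g: math.floor(x) for g, x in raw.items()}
    leftover = total_sample - sum(alloc.values())
    # Hand the remaining units to the strata with the largest fractional parts.
    for g in sorted(raw, key=lambda g: raw[g] - alloc[g], reverse=True)[:leftover]:
        alloc[g] += 1
    return alloc

# Hypothetical: 270 households spread evenly over nine grades (an assumption).
grades = {g: 30 for g in ["K", "1", "2", "3", "4", "5", "6", "7", "8"]}
print(proportional_allocation(grades, 80))
```

Pooling pairs of grades, as the comment suggests, just means merging strata before allocating, so each pooled stratum receives a large enough share to report on.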
 ```Hi, I work for a polling company and have a background in statistics. The vast majority of the work we do is not political polling but social research (including surveys of parents!). Anyway, some comments that might help.

1) Selection bias (also known as non-response bias), as you rightly point out, has the potential to be a major issue here. The danger is that the people who do not respond are systematically different from those who do respond. The higher the response rate, the less likely it is that there will be selection bias. In large-scale national studies, a response rate of less than 60% is considered poor and a response rate of more than 70% is considered good. These tend to be completed using face-to-face, in-home interviewing. Postal surveys or internet surveys tend to have lower response rates.

However, while the percentage of participation (normally referred to as the response rate) is related to selection bias, this is not the whole story. The more important question is: are those who respond likely to be different from those who don't? Let me give a couple of examples of bad surveys. Another company used a postal survey methodology to ask teachers about levels of workload. They found a surprisingly low average number of hours worked. What was happening was that the overworked teachers were the ones who did not take the time to fill in their (overly long) questionnaire. This obviously invalidated the results.

Second example. The national crime survey here switched from a face-to-face survey to a telephone survey. To most people's surprise, the estimates of victimisation (the proportion of people experiencing different types of crime) went up considerably. This could not be ascribed to differences in the way the sampling was undertaken. The most likely reason was this: it is easier to refuse to take part in research over the phone than it is to refuse when an interviewer comes to your door. People who had been victims of crime were much more likely to want to take part in the survey than those who hadn't experienced any crime. So, fewer non-victims refused to take part in the face-to-face survey than in the telephone survey, leading to higher victimisation rates in the telephone survey - and a prompt return to a face-to-face methodology!

In other words, there is no magic threshold that reduces non-response bias to an acceptable level. A survey with a response rate of 50% could, theoretically, have NO non-response bias if there is no RELEVANT systematic difference between those who respond and those who don't. You have to ask yourself these questions: are there systematic reasons why some parents would respond and some wouldn't, and are these likely to be related to your findings? For example, are working parents less likely to respond than non-working parents because they have less free time? And if so, are their levels of satisfaction likely to be different? Or are their views on, say, after-school clubs different (probably)? Are dissatisfied parents less likely to respond because they don't trust the school? Or, alternatively, are dissatisfied parents more likely to respond, wanting to air their complaints more than contented parents, who don't really see the need to provide feedback? Are single parents less likely to respond? How would their views be different? How are the questionnaires delivered? If the questionnaires were delivered through a school-bag drop, are fewer questionnaires received back from parents of younger children (who tend to be less good at passing on paperwork to parents)?

There is no easy answer to non-response bias. You could check whether the characteristics of those who responded differ from those who did not respond - though not if the survey is completed anonymously, as it should be - but you would still face the question: what effect will this have on our estimates?
Our common approach is this: within a budget, maximise response as much as possible. Then always critically evaluate your results, thinking about where bias might occur and being aware of the possible limitations of your results.

2) Yes! This question relates to precision rather than accuracy. This is an important distinction, and you are right to ask it after the first question! The most common measure of sampling precision is the confidence interval. Confidence intervals are determined by three factors: the size of the population, the size of the sample, and the percentage of the estimate. Getting the whole population (270 out of 270) would mean your estimates are exact. (NB - though remember that how you word questions will have an impact on your results.) A sample of 200 from a population of 270 would mean that a result of 50% would be accurate to +/- 3.5%, 19 times out of 20. For a sample of 150, the confidence interval for a result of 50% would be +/- 5.3%. A sample of 100 would give accuracy to +/- 7.8%, 19 times out of 20. (Confidence intervals are widest for estimates of 50%, narrowing the nearer you get to 0% or 100%.) Confidence intervals are routinely used on random samples. Please note, however, that they do not take account of selection bias, so there is a common danger of ascribing too much precision to survey results.

3) That is up to you! Decisions on sample size are always a trade-off between cost and precision. Opinion polls of the national population are commonly based on a sample of 1,000 (and confidence intervals of +/- 3%). Accuracy is normally considered less important for measures of attitudes (e.g. satisfaction) than of prevalence (e.g. crime rates), as attitudes, by their very nature, are not exact. Beware of the danger of false precision! Two other things to consider. First, the confidence intervals given above relate to your whole population of 270. If you want to conduct analysis of sub-groups, the confidence intervals will be broader. (E.g.
if population = 200 and sample = 100, an estimate of 50% is +/- 6.9%; BUT if population = 100 and sample = 50, an estimate of 50% is +/- 9.8%.) Second, it is possible you have two choices: large sample/lower response rate OR smaller sample/higher response rate. In this instance, I would always go with the second. Selection bias has far more potential to invalidate your results than lower precision due to small sample sizes. So it may be more efficient to spend money on sending reminder letters, phoning up non-responding parents, etc., for a sub-sample than to get a lower response rate from a 100% sample of the population (a census approach).

4) Weighting might (or might not) make your sample more accurate, but it won't make it more precise. Let's assume there is an equal number of parents with children in each grade K-8, but that your achieved sample is low on the parents of kids at the upper end (6, 7, 8, say). Weighting the results by grade is likely to make your sample more representative, but will it make your results more accurate? If satisfaction is higher or lower for parents with kids in grades 6, 7, 8 than for parents with kids in K, 1, 2, 3, then weighting will make your estimates of satisfaction for ALL parents more accurate. If there is no difference in the results by grade, weighting by grade will have no effect. BUT, again, the danger of selection bias rears its head. If your non-responders are different from your responders, then weighting will not correct for this (i.e. if responding parents of kids in Year 8 are satisfied, but non-responding parents of kids in Year 8 are dissatisfied, weighting would make no difference). Weighting will not have any positive effect whatsoever on the precision of your results. Indeed, without going into statistical details, it is likely to lead to a reduction in your precision (i.e. increase your confidence intervals). If you do decide to use a sampling (and/or weighting) strategy, this should be based on what is driving levels of satisfaction.
As you already have some data, I would look at this data to determine whether the sampling strategy should be based on age of child, area, class, etc. etc. Good luck! Hope this helps.```
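[Editor's note: the confidence intervals quoted in the comment above (3.5%, 5.3%, and 7.8% for samples of 200, 150, and 100 from a population of 270) follow from the standard normal-approximation margin of error with a finite population correction. A sketch, assuming that is the formula the commenter used:]

```python
import math

def margin_of_error(N, n, p=0.5, z=1.96):
    """95% margin of error for an estimated proportion p, based on a simple
    random sample of n from a finite population of N, using the standard
    finite population correction sqrt((N - n) / (N - 1))."""
    return z * math.sqrt(p * (1 - p) / n) * math.sqrt((N - n) / (N - 1))

for n in (200, 150, 100):
    # Prints 3.5, 5.3 and 7.8 in turn, matching the figures quoted above.
    print(n, round(100 * margin_of_error(270, n), 1))
```

The sub-group figures later in the comment match too: margin_of_error(200, 100) rounds to 6.9% and margin_of_error(100, 50) to 9.8%.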
 ```Other commenters have started with their background, so here's mine: I'm a university professor in the social sciences, and I teach graduate courses in research methods.

If you'd like to calculate the confidence intervals of your survey (the +/- band of your results), a handy site is: http://www.surveysystem.com/sscalc.htm

You'll find that it's not enough to describe what % of your population you surveyed. I know, it intuitively seems like it should work that way, but it doesn't. Take these two examples, both involving a 50% sample of a population: surveying 50 people in a 100-person population yields a confidence interval of +/- 9.85. However, surveying 500 people in a 1,000-person population yields a much better confidence interval of +/- 3.1. (Both of these with a confidence level of 95%.)

The general punchline: if you have a very small population, you're going to have to survey a very large percentage of them to get reliable results. And if you want a good estimate of how reliable your results are, you need to calculate the interval based on the actual # in your population and the actual # of returned results - not a % of respondents.

Finally, as has already been pointed out, that confidence interval assumes a random sample, which you won't get with the self-selection that goes on with the mail-out survey you described. You should probably expect and account for the fact that your data will likely be skewed/biased toward extreme views (on either side of an issue), since those with strong feelings will be more likely to go through the bother of replying.

Hope this helps! And by the way, kudos to you for going through the effort to get accurate and useful feedback from the community you're trying to serve.

- John```
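[Editor's note: to answer question 3 directly with the same machinery the calculator above uses, the margin-of-error formula can be inverted to give a required sample size. A sketch, assuming the standard normal-approximation margin with finite population correction - the calculator's exact method may differ slightly:]

```python
import math

def required_sample(N, margin, p=0.5, z=1.96):
    """Smallest simple random sample n from a population of N that keeps the
    95% margin of error for a proportion estimate at or below `margin`."""
    n0 = z * z * p * (1 - p)  # z^2 * p * (1 - p)
    return math.ceil(n0 * N / (margin * margin * (N - 1) + n0))

print(required_sample(270, 0.05))  # 159 households for +/- 5% at 95% confidence
```

For comparison, required_sample(100, 0.0985) gives 50, matching the 50-of-100 example above.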