 Question
 Subject: Statistics Category: Science > Math Asked by: alphapumpkin-ga List Price: \$25.00 Posted: 23 Aug 2006 21:22 PDT Expires: 22 Sep 2006 21:22 PDT Question ID: 758979
```
Okay, I need a few things!

First, I would love to find a few great online, interactive tutorials for learning statistics. The catch? They need to be free! I really need a few tutorials that I can use to gain familiarity with statistics in a matter of a few weeks.

Second, I need to know the statistical method for proving that there either is or is not a correlation between two variables measured on different scales.

Here's what I am trying to do: I'm trying to find out whether there is a correlation between the amount of refunds given to customers and how happy they are after the interaction, as measured by a survey. I would like to be able to do this for a period of 30 days.

For data, I have the total number of refunds for the month, the average refund amount, daily refund amounts, etc. I can manipulate the refund data just about any way over a 30-day period. I am not lacking in data!!! I can query my database to produce just about any meaningful number in terms of the refunds.

As far as the surveys go, there are ten yes-or-no questions. Each "no" is a negative response, while each "yes" is a positive response. So if a person answers yes to all ten questions, it is counted as 100%. If they answer yes to only five questions, the result is 50%.

My average refund amount for the month tends to range from $10 to $20, with most being around $15.00. The survey scores are expressed as percentages and are typically between 80% and 95%. Since the scales are so different, I'm not sure how to graph or manipulate these numbers to determine correlation. Ultimately, I would like to find out whether there is a relationship that can be either statistically proven or disproven.

What statistical method should be used, and are there any suggestions as to how I should manipulate the refund data in order to get the best result?
```
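The scoring rule in the question (ten yes/no questions, each "yes" adding 10 percentage points) is just the share of "yes" answers times 100. Below is a minimal Python sketch of that arithmetic; the function name `survey_score` and the representation of one survey as a list of booleans (True for "yes") are hypothetical, chosen only for illustration.

```python
def survey_score(answers: list[bool]) -> float:
    """Percentage of "yes" answers (True) on one yes/no survey.

    Hypothetical helper: one survey is a list of booleans, one per question.
    """
    if not answers:
        raise ValueError("survey has no answers")
    # True counts as 1 and False as 0, so sum() counts the "yes" answers.
    return 100.0 * sum(answers) / len(answers)


print(survey_score([True] * 10))               # 100.0 (all ten "yes")
print(survey_score([True] * 5 + [False] * 5))  # 50.0  (five "yes")
```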
 Subject: Re: Statistics Answered By: elmarto-ga on 24 Aug 2006 07:21 PDT
```
Hello!

As the commenter herkdrvr-ga mentioned, the CORREL function could be used to determine whether there is a correlation between those two variables, but I think a REGRESSION analysis would give you more information and is a bit easier to interpret.

Before this analysis, if you want to get an idea of the relationship between your variables with a graph, you should use a Scatter Plot (available in Excel). If the dots appear to form an "increasing" or "decreasing" pattern, then the variables are probably correlated. On the other hand, if you just see a "cloud" of dots, it's possible that there is no correlation between your variables.

The more formal way to do the analysis is through a regression. To run this analysis in Excel, you must install the Analysis ToolPak add-in. This can be done from the menu Tools - Add-Ins.

The basic regression you could run here would be customer satisfaction explained by the refund amount given to a client. This analysis will yield an equation with which you might be able to "predict" the Satisfaction level based on the Refund Amount. Let's see how to do it.

In Excel, choose Tools - Data Analysis - Regression. In the field "Input Y Range", select the cells that contain the variable you are trying to explain (the one you assume depends on the others). In this case, your "Y" would be the Satisfaction Level. In the field "Input X Range", enter the cells that contain the variable that explains Y. Since you're looking to explain the Satisfaction level with the Refund Amount, you should enter the "Refund Amount" cells in this field. If your selection also includes the names of the variables, check "Labels". Finally, run the regression.

The output will contain several figures that you can ignore for now. The ones you should pay attention to are:

R Square - this tells you the proportion of the variation in "Satisfaction" that is explained by "Refund Amount". A value of 0.7, for example, would indicate that approximately 70% of the variation in the Satisfaction Level is explained by variation in the Refund Amount.

Coefficients (in the third table) - With these you will be able to form an equation that relates Refunds to Satisfaction. The equation will have the form:

Satisfaction = (Intercept Coefficient) + (Amounts Coefficient)*(Refund Amount)

So, for example, if the Intercept coefficient is 30 and the Refund Amount coefficient is 2, the equation would be:

Satisfaction % = 30 + 2*(Refund Amount)

So if you refund $15 to a person, you should expect a satisfaction level of 30 + 2*15 = 30 + 30 = 60%.

The Amounts Coefficient is also important because it tells you how your two variables are related. In my example (coefficient = 2), each extra $1 refunded to a customer is associated with a 2-percentage-point increase in expected satisfaction.

The final figure you should look at is the "P-Value" (also in the third table) for the Amounts variable. For the relationship you found to be statistically significant at the 5% level, this value should be lower than 0.05: if there were truly no relationship, a coefficient this far from zero would be expected to turn up less than 5% of the time by chance. Otherwise, you can't rule out that the Amounts Coefficient is actually zero; that is, that there is no relationship at all between Refund Amounts and Satisfaction.

The beauty of regression analysis is that it allows for more "explanatory" variables. This is something that can't be done with the scatter plot or a simple correlation analysis. For example, if you think the Satisfaction level can be explained by both the Refund Amount and some other variable (for which you have data, of course), you can include both of them in the "Input X Range" field, and you will get an equation that relates both of them to the Satisfaction level. In this case, you should check the P-Value for each of them to decide which of them are statistically significant. As an example, if you have the data, you could use "Total Refunds for the day" as an explanatory variable in addition to the specific Refund Amount for each client. Perhaps, if there were too many refunds in a single day, the person in charge of the refunds could be in a bad mood, which could have a negative effect on Satisfaction.

Finally, don't worry about your variables being on different scales. The regression analysis already adjusts for those differences. For instance, as I mentioned earlier, the "Amounts" coefficient is in units of "Satisfaction Percentage per Dollar Refunded". If you change the scales, the value of the coefficient will automatically change to reflect it.

A nice tutorial on how to use Excel's regression analysis can be found at
http://www.cba.nau.edu/allen-d/Excel%20Regression%20Tutorial/excel_regression_tutorial.htm

Google search terms:
regression tutorial
://www.google.com.ar/search?sourceid=navclient&ie=UTF-8&rls=GGLJ,GGLJ:2006-23,GGLJ:en&q=regression+tutorial

I hope this helps! If you have any doubt regarding my answer, please don't hesitate to request clarification before rating it. Otherwise, I await your rating and final comments.

Best wishes!
elmarto
```
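The regression figures the answer points to (the Intercept and Amounts coefficients, and R Square) can be computed with the ordinary least-squares formulas for a single explanatory variable. The following is a minimal Python sketch under that assumption, not a reproduction of Excel's Analysis ToolPak; the function name `simple_regression` and every sample number are hypothetical. It returns the slope's t statistic rather than a p-value, because converting t into a p-value needs the t distribution with n - 2 degrees of freedom, which Excel's output (or `scipy.stats.linregress`) supplies.

```python
import math


def simple_regression(x: list[float], y: list[float]) -> dict[str, float]:
    """Least-squares fit of y = intercept + slope * x (one explanatory variable).

    Returns the intercept, the slope (the "Amounts Coefficient"),
    R Square, and the t statistic for the slope.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length lists with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares: the part of y's variation the line does not explain.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_square = 1.0 - sse / syy

    # Standard error of the slope, using n - 2 degrees of freedom.
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    t_slope = slope / se_slope if se_slope > 0 else math.copysign(math.inf, slope)
    return {"intercept": intercept, "slope": slope,
            "r_square": r_square, "t_slope": t_slope}


# Hypothetical data built from the answer's example equation
# Satisfaction % = 30 + 2 * (Refund Amount), so the fit is exact.
refunds = [10.0, 12.0, 15.0, 18.0, 20.0]
satisfaction = [30 + 2 * r for r in refunds]
fit = simple_regression(refunds, satisfaction)
print(fit["intercept"], fit["slope"], fit["r_square"])  # 30.0 2.0 1.0
print(fit["intercept"] + fit["slope"] * 15)            # 60.0, the $15 example

# Rescaling Y from percent to a fraction divides the slope by 100 but
# leaves R Square unchanged (hypothetical noisy data).
noisy = [84.0, 88.0, 85.0, 91.0, 90.0]
pct = simple_regression(refunds, noisy)
frac = simple_regression(refunds, [s / 100 for s in noisy])
print(math.isclose(frac["slope"], pct["slope"] / 100))  # True
print(math.isclose(frac["r_square"], pct["r_square"]))  # True
```

The last two lines illustrate the answer's point about scales: the coefficient's units change with the data, while R Square (and the t statistic) do not.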
```
I don't know a good online statistics site, but here's an easy way that doesn't require working through all the formulas. You already have all the data, so use the CORREL function in Microsoft Excel to get the correlation coefficient. For instance, you could put Refund Amount in Column A and Survey Result, expressed as a decimal, in Column B. Check Excel Help if you don't know how to use functions in Excel.

Remember, -1 means a perfect negative correlation, 0 means no (linear) correlation, and 1 means a perfect positive correlation; values in between indicate weaker relationships.

Also, correlation does NOT imply causation, so don't fall into that trap while you are examining your results. For example, suppose you want to give every child a computer because it has been shown that there is a positive correlation between computer ownership and test scores. Computers and test scores are correlated, but perhaps higher-income families stress education more and also have the money to buy a computer. You can see that although the two are correlated, one variable may not CAUSE the other.

Finally, many statisticians caution against treating survey scales as numbers because, for instance, 2 is between 1 and 3, but in a survey, a 2 from one person may not mean the same thing as a 2 from another person. What's the difference between "satisfied" and "very satisfied"? Who knows.

Good luck!
Herkdrvr
```
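Excel's CORREL returns the Pearson correlation coefficient. As a hedged illustration of what it computes, here is a minimal Python sketch; the function name `correl` and the sample numbers are hypothetical. With a single explanatory variable, squaring this coefficient gives the same number as R Square in a simple regression of one column on the other.

```python
import math


def correl(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Excel's CORREL also fails (#DIV/0!) when a column does not vary.
        raise ValueError("correlation is undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical refund amounts paired with perfectly related scores.
refunds = [10.0, 12.0, 15.0, 18.0, 20.0]
print(correl(refunds, [30 + 2 * r for r in refunds]))  # 1.0  (perfect positive)
print(correl(refunds, [100 - r for r in refunds]))     # -1.0 (perfect negative)

# Survey results as percentages or as decimals give the same coefficient.
scores_pct = [84.0, 88.0, 85.0, 91.0, 90.0]
scores_dec = [s / 100 for s in scores_pct]
print(math.isclose(correl(refunds, scores_pct), correl(refunds, scores_dec)))  # True
```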
```
The links are wonderful, the overview was great, and now I'm off to a good start! It's been years since I've seen a statistics book and this really helped!!! I now have an idea of the specific concepts in stats that I need to focus on for this type of calculation. Thanks!
```
```
I'm surprised no one brought this up, but FYI, you'll want to give some serious consideration to what you use as a scale for "happiness" or "satisfaction" (your dependent variable in this case). It's technically an ordinal variable, and I believe folks are suggesting you treat it as an interval variable. (You can read more about types of measurement here: http://en.wikipedia.org/wiki/Categorical_variable) You'll note that your refund amount (assuming you're just tracking the $ returned) is an interval variable.

Now, that's not an unknown practice. Folks do it with Likert scales fairly often (e.g., 1=Very Unsatisfied, 2=Somewhat Unsatisfied, 3=Neutral, 4=Somewhat Satisfied, 5=Very Satisfied), but it's actually questionable and can lead to some problems. Without going into the math of it, let me suggest that you come up with a 7-point or (better yet) 9-point scale of satisfaction instead of a 5-point scale. That way you'll run into fewer potential problems if you treat that satisfaction measurement as an interval variable.

Think of it this way: it's easy to do, and without changing the math you do, it should improve the reliability and validity of your results.
```