Google Answers: Statistics

Q: Statistics (Answered, 5 out of 5 stars)

Question

Subject: Statistics
Category: Science > Math
Asked by: alphapumpkin-ga
List Price: $25.00

Posted: 23 Aug 2006 21:22 PDT
Expires: 22 Sep 2006 21:22 PDT
Question ID: 758979

Okay, I need a few things!

First, I would love to find a few great online, interactive tutorials
for learning statistics. The catch?  They need to be free!   I really
need a few tutorials that I can use to gain familiarity with
statistics in a matter of a few weeks.

Second, I need to know the statistical method for proving that there
either is or is not a correlation between two numbers on different
scales.  Here's what I am trying to do:

I'm trying to find out if there is a correlation between the amount of
refunds that are given to customers versus how happy they are after the
interaction as measured by a survey.  I would like to be able to do
this for a period of 30 days.

For data, I have the total number of refunds for the month, average
refund amount, daily refund amounts etc.  I can manipulate the refund
data just about any way over a 30 day period.  I am not lacking in
data!!!  I can manipulate my database to produce just about any
meaningful number in terms of the refunds.

As far as the surveys, there are ten yes or no questions that are
asked.  Each no is a negative response while each yes is a positive
response.  So, if a person answers yes to all ten questions, it is
counted as 100%.  If they only answer yes to five questions, the
result would be 50%.

My data for average refund amount for the month tends to range from
$10 to $20 with most being in the area of $15.00.   The survey scores
are expressed in terms of a percentage and are typically between 80%
and 95%.

Since the scales are so different, I'm not sure how to graph or
manipulate these numbers to determine correlation.

Ultimately, I would like to find out if there is a relationship that
can either be statistically proven or disproven.  What statistical
method should be used and are there any suggestions as to how I should
manipulate the refund data in order to get the best result?

Answer

Subject: Re: Statistics
Answered By: elmarto-ga on 24 Aug 2006 07:21 PDT
Rated: 5 out of 5 stars

Hello!
As the commenter herkdrvr-ga mentioned, the CORREL function could be
used in order to determine whether there is a correlation between
those two variables, but I think a REGRESSION analysis would give you
more information, and is a bit easier to interpret.

Before this analysis, if you want to get an idea of the relationship
between your variables with a graph, you should use a Scatter Plot
(available in Excel). If the dots appear to form an "increasing" or
"decreasing" pattern, then the variables are probably correlated. On
the other hand, if you just see a "cloud" of dots, it's possible that
there is no correlation between your variables.

The more formal way to do the analysis is through a regression. In
order to run this analysis in Excel, you must install the Analysis
ToolPak add-in. This can be done from the menu Tools - Add-Ins.

The basic regression you could run here would be the customer
satisfaction explained by the refund amount to a client. This analysis
will yield an equation with which you might be able to "predict" the
Satisfaction level based on the Refund Amount.

Let's see how to do it. In Excel, choose Tools - Data Analysis -
Regression. In the field "Input Y Range" you need to select the cells
which contain the variable you are looking to explain (the one you
assume depends on the others). In this case, your "Y" would be the
Satisfaction Level. In the field "Input X Range", you have to enter
the cells which contain the variable that explains Y.
Clearly, since you're looking to explain the Satisfaction level with
the Refunds Amount, you should enter the "Refunds Amount" cells in
this field.

If your selection also included the names of the variables, then
check "Labels". Finally, run the regression.

The output will contain several figures in which you are not
interested. The ones you should pay attention to are:

R Square - this will tell you the portion of "Satisfaction" that is
explained by "Refunds Amount". A value of 0.7, for example, would
indicate that approximately 70% of the variations in the Satisfaction
Level are explained by variations in the Refund Amount.
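
For anyone who wants to check the R Square figure outside Excel, here
is a minimal Python sketch of the same calculation. The refund and
satisfaction numbers are made up for illustration (they are not the
asker's data), and the function name r_squared is just a label chosen
for this example.

```python
def r_squared(x, y):
    """Share of the variation in y explained by a straight-line fit on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Least-squares slope and intercept for y = intercept + slope * x
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Compare what the line leaves unexplained with the total variation
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_residual / ss_total

# Hypothetical data: average refund in dollars, satisfaction in percent
refunds = [10, 12, 14, 16, 18, 20]
satisfaction = [82, 85, 84, 88, 90, 93]
print(round(r_squared(refunds, satisfaction), 3))
```

With these made-up numbers the result comes out at roughly 0.93, which
would be read as "about 93% of the variation in Satisfaction is
explained by the Refund Amount".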

Coefficients (in the third table) - With these you will be able to
form an equation that relates Refunds to Satisfaction. The equation
will have the form:

Satisfaction = (Intercept Coefficient) + (Amounts Coefficient)*(Refunds Amount)

So for example, if you got that the Intercept coefficient is 30 and
the refunds amount coefficient is 2, the relevant equation should be:

Satisfaction % = 30 + 2*(Refunds Amount)

So if you refund $15 to a person, you should expect the satisfaction
level to be 30 + 2*15 = 30 + 30 = 60%. The Amounts Coefficient is also
important because it tells you how your two variables are related. In
my example (coefficient = 2), it implies that each extra $1 you refund
to a customer increases their satisfaction by 2 percentage points.
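
The equation and the $15 prediction above can be sketched in a few
lines of Python. The data below is hypothetical and was deliberately
chosen to lie exactly on the example line (intercept 30, coefficient
2); fit_line and predict are names made up for this sketch, and real
data would of course not fit this neatly.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def predict(intercept, slope, refund_amount):
    """Plug a refund amount into the fitted equation."""
    return intercept + slope * refund_amount

# Hypothetical data, chosen to lie exactly on Satisfaction = 30 + 2 * Refund
refunds = [10, 12, 15, 18, 20]
satisfaction = [50, 54, 60, 66, 70]

intercept, slope = fit_line(refunds, satisfaction)
print(intercept, slope)               # 30.0 2.0
print(predict(intercept, slope, 15))  # 60.0
```

The two numbers returned by fit_line correspond to the "Intercept" and
"Refunds Amount" rows of Excel's coefficients table.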

The final figure you should look at is the "P-Value" (also in the
third table) for the Amounts variable. In order for the relationship
you found to be statistically significant at the usual 5% level, this
value should be lower than 0.05. In that case, a coefficient this far
from zero would be unlikely to appear by chance alone if there were no
relationship between your two variables. Otherwise, it's possible that
the Amounts Coefficient is actually zero; that is, that there is no
relationship at all between Refund Amounts and Satisfaction.
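
Excel gets its P-Value from a t-test on the coefficient. The sketch
below does not reproduce that calculation; it swaps in a permutation
test, which answers the same question (could a slope this large arise
by chance?) using only Python's standard library. The data, the
function names and the number of shuffles are all assumptions made for
illustration.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def permutation_p_value(x, y, n_permutations=5000, seed=0):
    """Estimate how often shuffled data gives a slope at least as large
    (in absolute value) as the one actually observed."""
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real link between x and y
        if abs(slope(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero
    return (hits + 1) / (n_permutations + 1)

# Hypothetical data: refund in dollars, satisfaction in percent
refunds = [10, 11, 12, 13, 14, 15, 16, 17, 18, 20]
satisfaction = [80, 82, 81, 84, 85, 86, 88, 90, 91, 95]
p_value = permutation_p_value(refunds, satisfaction)
print("significant at 5%" if p_value < 0.05 else "not significant at 5%")
```

The reading is the same as for Excel's figure: a value below 0.05
suggests the relationship is unlikely to be a fluke of this particular
sample.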

The beauty of the regression analysis is that it allows for more
"explanatory" variables. This is something that can't be done with the
scatter plot or a simple correlation analysis. For example, if you
think Satisfaction level can be explained by both the Refunds Amount and
some other variable (for which you have data, of course), you can
include both of them in the "Input X Range" field, and you will get an
equation that relates both of them to the Satisfaction level. In this
case, you should check the P-Value for each of them in order to decide
which one is statistically significant.

As an example, if you have the data, you could use "Total Refunds for
the day" besides the specific Refund Amount for each client as an
explanatory variable. Perhaps, if there were too many refunds in a
single day, the person in charge of the refunds could be in a bad
mood, which could have a negative effect on Satisfaction.
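
To make the two-variable case concrete, here is a rough Python sketch
of a regression with both "Refund Amount" and "Total Refunds for the
day" as explanatory variables. It solves the least-squares normal
equations by hand so that it needs no extra libraries. The data is
invented, generated exactly from Satisfaction = 75 + 1.0*Refund -
0.5*DailyRefunds so the result is easy to check, and all function
names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - known) / m[r][r]
    return coef

def fit_multiple(columns, y):
    """Return [intercept, coef_1, coef_2, ...] for y explained by the columns."""
    rows = [[1.0] + list(values) for values in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y_i for r, y_i in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical data
refund_amount = [10, 12, 15, 18, 20, 14]
daily_refunds = [4, 8, 6, 10, 5, 7]
satisfaction = [75 + 1.0 * r - 0.5 * d for r, d in zip(refund_amount, daily_refunds)]

coefficients = fit_multiple([refund_amount, daily_refunds], satisfaction)
print([round(c, 3) for c in coefficients])
```

A negative coefficient on the daily total, as in this invented data,
is what the "bad mood" story would look like in the output.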

Finally, don't worry about your variables being in different scales.
The regression analysis already adjusts for those differences. For
instance, as I mentioned earlier, the "Amounts" coefficient is in
terms of "Satisfaction Percentage per Dollar Refunded". If you change
the scales, the value of the coefficient will automatically change to
reflect it.
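
A quick way to convince yourself of this is to express the same
refunds in cents instead of dollars and compare. The sketch below uses
made-up numbers and a hypothetical helper, slope_and_correlation; the
coefficient shrinks by a factor of 100 while the correlation stays the
same.

```python
import math

def slope_and_correlation(x, y):
    """Return (least-squares slope, Pearson correlation) of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

# Hypothetical data: the same refunds in dollars and in cents
refunds_dollars = [10, 12, 14, 16, 18, 20]
refunds_cents = [100 * d for d in refunds_dollars]
satisfaction = [82, 85, 84, 88, 90, 93]

slope_d, r_d = slope_and_correlation(refunds_dollars, satisfaction)
slope_c, r_c = slope_and_correlation(refunds_cents, satisfaction)

print(slope_d, slope_c * 100)  # same effect, expressed per dollar
print(r_d, r_c)                # correlation does not depend on the units
```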

A nice tutorial on how to use Excel's regression analysis can be found at
http://www.cba.nau.edu/allen-d/Excel%20Regression%20Tutorial/excel_regression_tutorial.htm

Google search terms
regression tutorial
://www.google.com.ar/search?sourceid=navclient&ie=UTF-8&rls=GGLJ,GGLJ:2006-23,GGLJ:en&q=regression+tutorial

I hope this helps! If you have any doubt regarding my answer, please
don't hesitate to request clarification before rating it. Otherwise, I
await your rating and final comments.

Best wishes!
elmarto

alphapumpkin-ga rated this answer: 5 out of 5 stars

Comments

Subject: Re: Statistics
From: herkdrvr-ga on 24 Aug 2006 04:46 PDT

I don't know a good online statistics site, but here's an easy way,
without using all the formulas.

You already have all the data, so use the CORREL function within
Microsoft Excel to get the correlation coefficient.

For instance, you could have Refund Amount in Column A, and Survey
Result expressed as a decimal percentage in Column B.  You can use
Excel help if you don't know how to use functions within Excel.
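
If you would rather see what CORREL is doing, it returns the Pearson
correlation coefficient, which can be sketched in Python as below. The
two lists stand in for Column A and Column B and contain made-up
values; the function name correl is only borrowed from Excel for
readability.

```python
import math

def correl(xs, ys):
    """Pearson correlation coefficient, the quantity Excel's CORREL returns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: Column A = refund amount, Column B = survey result as a decimal
column_a = [10, 12, 14, 16, 18, 20]
column_b = [0.82, 0.85, 0.84, 0.88, 0.90, 0.93]
print(round(correl(column_a, column_b), 2))
```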

Remember, -1 would mean they are negatively correlated, 0 is no
correlation and 1 is positively correlated.  Also, correlation has
NOTHING to do with causation, so don't fall into that trap while you
are examining your results.  Example, you want to give every child a
computer because it has been shown that there is a positive
correlation between computer ownership and test scores.  Computers and
test scores are correlated.  Perhaps higher income families stress
education more and also have the money to buy a computer.  You can see
that although they are correlated, one variable may not CAUSE the
next.

Finally, most statisticians recommend against using survey scales
because, for instance, 2 is between 1 and 3, but in a survey, 2 for
one person may be different than another person.  What's the
difference between satisfied and very satisfied?  Who knows.

Good luck!

Herkdrvr

Subject: Re: Statistics
From: alphapumpkin-ga on 24 Aug 2006 08:16 PDT

The links are wonderful, the overview was great and now I'm off to a
good start!  It's been years since I've seen a statistics book and
this really helped!!!  I have an idea now of the specific concepts in
stats that I need to focus on for this type of calculation.  Thanks!

Subject: Re: Statistics
From: dcjohn-ga on 11 Sep 2006 22:32 PDT

I'm surprised no one brought this up, but FYI you'll want to give some
serious consideration to what you use as a scale for "happiness" or
"satisfaction" (your dependent variable in this case).

It's technically an ordinal variable, and I believe folks are
suggesting you treat it as an interval variable.  (You can read more
about types of measurement here:
http://en.wikipedia.org/wiki/Categorical_variable)  You'll note, your
refund amount--assuming you're just tracking the $ returned--is an
interval variable.

Now, that's not an unknown practice.  Folks do it with Likert scales
fairly often (e.g.: 1=Very Unsatisfied, 2=Somewhat Unsatisfied,
3=Neutral, 4=Somewhat Satisfied, 5=Very Satisfied), but it's actually
questionable and can lead to some problems.  Without going into the
math of it, let me suggest that you come up with a 7 or (better yet) 9
point scale of satisfaction instead of a 5 point scale.  That way
you'll run into fewer potential problems if you treat that satisfaction
measurement as an interval variable.

Think of it this way: it's easy to do, and without changing the math
you do, it'll increase the reliability and validity of your results.
