Hello there,
I like statistics - and paired t-tests honestly aren't hard to do. Let
me explain how to do them, using your question as an example...
OK then, so we've got ten different software packages, and the price
that two different companies charge for each of these. We want to find
out whether, on the basis of this sample, one company tends to charge
more for computer software than the other. A paired t-test sounds
sensible, then, because both companies are quoting for the same set of
software.
(a) OK then, we need to work out the difference between the prices
that the two companies charge for each item of software. I'll work out
the difference in terms of the Computability price minus the PC
Connection price.
Software package             C ($)  PC ($)  Diff ($)
Windows 95 Upgrade              88      95        -7
Norton Anti-Virus               59      70       -11
McAfee ViruScan                 49      60       -11
First Aid 97 Deluxe             54      58        -4
Clean Sweep III                 37      37         0
Norton Utilities                68      75        -7
Netscape Navigator              45      40         5
MS Office Pro 97 Upgrade       300     310       -10
First Aid 97                    32      35        -3
Win Fax Pro                     95      95         0
(C = Computability, PC = PC Connection, Diff = C - PC)
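If you'd like to double-check the subtraction, here is a short Python sketch. The prices are copied straight from the table; the variable names (packages, computability, pc_connection, diffs) are just my own labels.

```python
# Prices in dollars, in the same order as the table above
packages = [
    "Windows 95 Upgrade", "Norton Anti-Virus", "McAfee ViruScan",
    "First Aid 97 Deluxe", "Clean Sweep III", "Norton Utilities",
    "Netscape Navigator", "MS Office Pro 97 Upgrade", "First Aid 97",
    "Win Fax Pro",
]
computability = [88, 59, 49, 54, 37, 68, 45, 300, 32, 95]
pc_connection = [95, 70, 60, 58, 37, 75, 40, 310, 35, 95]

# Difference for each package: Computability price minus PC Connection price
diffs = [c - p for c, p in zip(computability, pc_connection)]

# Print a simple two-column listing of package name and difference
for name, d in zip(packages, diffs):
    print(f"{name:<26}{d:>5}")
```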
Well that bit was easy... Next we're asked to just *look* at the
differences, and say whether we think that one company charges more
than the other. Well yes, look at all those minus signs - often with
quite high absolute values as well! In fact, there's only one item of
software (Netscape Navigator) for which PC Connection's price is lower
than that of Computability. On two items there's no difference in
price, but for all the remaining seven PC Connection are more
expensive. PC Connection certainly *seem* to charge more consistently,
but whether this is likely to be the case for *all* the software they
stock is a matter for a statistical test...
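The "count the minus signs" check can be written as a quick tally. This is only a sketch; the differences are the ones from the table (Computability minus PC Connection), so a negative value means PC Connection charged more.

```python
diffs = [-7, -11, -11, -4, 0, -7, 5, -10, -3, 0]

# Tally which company was dearer for each package
pc_dearer = sum(1 for d in diffs if d < 0)  # C - PC < 0: PC Connection charges more
c_dearer = sum(1 for d in diffs if d > 0)   # C - PC > 0: Computability charges more
same = sum(1 for d in diffs if d == 0)      # identical prices

print(pc_dearer, c_dearer, same)  # 7 1 2
```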
(b) OK, so we next need to find the average difference. That's easy -
we just need to add up all the differences and divide by the total
number of software packages (10). The differences add up to -$48, so
the mean difference is -$4.80. We also need the standard deviation. Now, we've got to be
really careful here, as what we're doing is trying to estimate the
standard deviation of the price difference for *all* the software that
these two companies stock. So on a calculator you have to use the
(n-1) formula to get the right answer, which is $5.37 (to 3
significant figures, or 3SF).
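Python's standard statistics module has both versions of the standard deviation, which makes the (n-1) versus n distinction easy to see. A minimal sketch using the differences from the table: stdev divides by (n-1), pstdev divides by n.

```python
import statistics

diffs = [-7, -11, -11, -4, 0, -7, 5, -10, -3, 0]

mean_diff = statistics.mean(diffs)   # sum of differences / n
sd_sample = statistics.stdev(diffs)  # divides by (n - 1): estimates the SD for *all* software
sd_pop = statistics.pstdev(diffs)    # divides by n: only describes these 10 items

print(f"mean = {mean_diff:.2f}")     # mean = -4.80
print(f"s (n-1) = {sd_sample:.2f}")  # s (n-1) = 5.37
print(f"sigma (n) = {sd_pop:.2f}")   # sigma (n) = 5.10
```

Using the "divide by n" button on a calculator would give the smaller 5.10, which would understate the spread for the software as a whole.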
(It's outside of the scope of this answer to go into *why* this (n-1)
formula is used, but a Google search on "sample standard deviation"
gave a number of good references (which include full formulae),
including:
http://www.quickmba.com/stats/standard-deviation/ )
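For reference, here is the (n-1) formula written out, where d_i is the i-th price difference and n = 10. The sum of squared deviations from the mean (259.6) is my own arithmetic from the table, so it's worth re-checking by hand.

```latex
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i = \frac{-48}{10} = -4.80
\qquad
s_d = \sqrt{\frac{\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}{n-1}}
    = \sqrt{\frac{259.6}{9}} \approx \sqrt{28.84} \approx 5.37
```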
(c) OK, next we need to define our hypotheses. First we ought to
consider whether we want a one-tailed test or a two-tailed test.
Well let's think about this for a moment. A one-tailed test would be
helpful if we were only interested in the differences being in a
particular *direction* - for example, if we were asked "Is PC
Connection more expensive than Computability?". In this case though,
we're asked to test whether the mean difference in price between the
companies is zero - i.e. whether the two companies' prices differ.
Because we don't mind *who* charges more, just whether *someone* does,
we need a two-tailed test.
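To see what choosing a two-tailed test means in practice, this sketch compares the critical t-values at the 0.05 level for 9 degrees of freedom (the figure worked out in part (d)). It assumes SciPy is installed; scipy.stats.t.ppf is SciPy's inverse cumulative distribution function for Student's t.

```python
from scipy import stats

alpha = 0.05
df = 9  # n - 1 for 10 pairs of prices

# Two-tailed: the 5% is split between both tails (2.5% in each)
t_crit_two = stats.t.ppf(1 - alpha / 2, df)
# One-tailed: all 5% sits in a single tail (its sign depends on the predicted direction)
t_crit_one = stats.t.ppf(1 - alpha, df)

print(f"two-tailed: +/-{t_crit_two:.3f}")  # two-tailed: +/-2.262
print(f"one-tailed: {t_crit_one:.3f}")     # one-tailed: 1.833
```

The two-tailed cut-off is further out, which is the price we pay for not saying in advance which company we expect to be dearer.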
OK. So our null hypothesis, H0 = "That the mean difference in software
price between Computability and PC Connection is zero - i.e. that they
do not differ in price."
And our experimental (or alternative) hypothesis, H1 = "That the mean difference in
software price between Computability and PC Connection is not zero -
i.e. that they differ in price."
(d) Last bit - we're almost done! OK, normally for a t-test we'd have
to make sure that the differences were normally distributed, but
we're told that we can assume that bit. So now we need to calculate
our value of 't'. Well t = (the mean of the differences) / (the
standard error of the mean). "What's the standard error of the mean?"
I hear you ask. Well that's the standard deviation that we've already
worked out, divided by the square root of the number of pairs (10). It
estimates the standard deviation that we'd get if we took lots and lots
of batches of 10 software packages and worked out the average price
difference for each batch: the standard error of the mean is the
standard deviation of *all* of those batch averages considered together.
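The "lots and lots of batches" idea can be checked by simulation. This sketch assumes, purely for illustration, that price differences are normally distributed with a standard deviation of 5.37 (our sample estimate); the mean is set to 0 because it doesn't affect the spread. The seed and the number of batches are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
sigma = 5.37            # ASSUMED population SD of the price differences
n = 10                  # packages per batch
batches = 20_000        # arbitrary number of simulated batches

# Draw many batches of 10 differences and record each batch's average
batch_means = [
    statistics.fmean(rng.gauss(0, sigma) for _ in range(n))
    for _ in range(batches)
]

simulated_se = statistics.stdev(batch_means)  # spread of the batch averages
formula_se = sigma / n ** 0.5                 # standard error from the formula

print(f"simulated SD of batch means: {simulated_se:.2f}")
print(f"sigma / sqrt(n):             {formula_se:.2f}")  # 1.70
```

The two figures should agree closely, which is exactly what "the standard deviation of all of these averages" means.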
For a symbolic version of the formula for calculating t, take a look
at:
http://www.dianthus.co.uk/statistics/student.htm#Paired
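Here is the same calculation as a small Python sketch, using the differences from the table: the standard error is the (n-1) standard deviation divided by sqrt(n), and t is the mean difference divided by that.

```python
import math
import statistics

diffs = [-7, -11, -11, -4, 0, -7, 5, -10, -3, 0]
n = len(diffs)

mean_diff = statistics.mean(diffs)  # -4.8
sd = statistics.stdev(diffs)        # (n-1) sample SD, about 5.37
se = sd / math.sqrt(n)              # standard error of the mean
t = mean_diff / se                  # paired t statistic

print(f"SE = {se:.2f}, t = {t:.2f}")  # SE = 1.70, t = -2.83
```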
Anyway, plugging in our numbers gives t = -4.80 / (5.37 / sqrt(10))
= -4.80 / 1.698 = -2.83 (3SF). Last of all we need to know whether
this t-value lies outside our critical range for the 0.05 level of
significance. To do this, we also need the number of degrees of
freedom in your data, which is n - 1, where n is the number of data
pairs: 10 - 1 = 9.
Consulting a table of critical values for the t distribution, I see
that the critical range for a 2-tailed t-test at the 0.05 level of
significance with 9 degrees of freedom lies between -2.262 and 2.262.
These tables are also available online at:
http://www.psychstat.smsu.edu/introbook/tdist.htm
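If you don't have a printed table handy, the critical value can be computed directly. A sketch assuming SciPy is installed; the t-value of -2.83 is the one calculated above.

```python
from scipy import stats

n_pairs = 10
df = n_pairs - 1                 # 9 degrees of freedom
t_value = -2.83                  # paired t statistic from the calculation above
t_crit = stats.t.ppf(0.975, df)  # two-tailed cut-off at the 0.05 level

outside = abs(t_value) > t_crit
print(f"critical range: -{t_crit:.3f} to {t_crit:.3f}")  # critical range: -2.262 to 2.262
print("outside critical range" if outside else "inside critical range")
```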
Our value of t lies outside this critical range, i.e. a t-value this
extreme would turn up *less* than 5% of the time purely by chance if
the two companies really charged the same on average. We can
therefore reject H0 and conclude, at the 0.05 level of significance
using a 2-tailed paired t-test, that there *is* a mean difference in
the price of software for the two companies. Since the mean difference
is negative, it's PC Connection that tends to charge more. Cool eh?
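As an alternative to the critical-value table, you can compute the two-tailed p-value itself. A sketch assuming SciPy is installed; scipy.stats.t.sf is the upper-tail probability, doubled here because the test is two-tailed. For our t of about -2.826 with 9 degrees of freedom it should land between 0.01 and 0.02, consistent with the table lookup.

```python
from scipy import stats

t_value = -2.826  # paired t statistic from part (d), before rounding to 3SF
df = 9

# Two-tailed p-value: chance of a |t| at least this large if H0 were true
p_two = 2 * stats.t.sf(abs(t_value), df)
print(f"p = {p_two:.3f}")
```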
If you want to carry out further tests, there are a few online t-test
calculators out there, like this one at:
http://www.physics.csbsju.edu/stats/Paired_t-test_NROW_form.html
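If you'd rather check things offline, SciPy's scipy.stats.ttest_rel runs a paired t-test directly and is two-tailed by default. This sketch assumes SciPy is installed. Argument order matters: the test works on the first list minus the second, so here that's Computability minus PC Connection, matching the table.

```python
from scipy import stats

computability = [88, 59, 49, 54, 37, 68, 45, 300, 32, 95]
pc_connection = [95, 70, 60, 58, 37, 75, 40, 310, 35, 95]

# Paired (related-samples) t-test on Computability minus PC Connection
result = stats.ttest_rel(computability, pc_connection)
print(f"t = {result.statistic:.2f}")  # t = -2.83
print(f"p = {result.pvalue:.3f}")
```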
I hope my answer helped you, and anyone else reading this, to gain a
more thorough knowledge of what t-tests are and how to carry them out.
If you want to read up more though, most basic statistics books go
into this sort of thing very well.
Best wishes,
stuartwoozle-ga