Hi, chanchai-ga:
This type of problem is treated by the branch of mathematics
called "statistical inference". Given the data that you present, we
are asked to evaluate whether it suggests that there is "a
relationship between a car having defective brakes and whether it was
purchased from a dealer or a private owner".
What is often done in a situation like this is to apply the
(Pearson's) chi-square test. The idea is to consider the chance that
_if_ there were essentially no distinction between cars purchased from
a dealer and cars purchased from a private owner, we'd get results
which depart from a mutual "average" incidence of defective brakes at
least as much as the observed results do.
Before we dive into the relatively mindless details of computing this
chance, it's important to realize that this probability is strictly
speaking _not_ the same as the chance that there is no "relationship"
between the type of sale and the likelihood of defective brakes. More
formally we would draw up the definition of conditional probabilities
and bring in Bayes' formula as a way of rigorously bridging the gulf
between "probability of getting results like this, given no difference
in populations" and "probability of no difference in populations,
given that we got results like this".
Let's just leave it at that though, and proceed to do the chi-square
test. Here we have the simplest of cases, the "two outcome"
situation. Either a car has defective brakes or not. Assuming that
the two groups, dealer sales and private sales, are actually samples
of a common population ("the null hypothesis" meaning no real
distinction between them), we would expect samples of different sizes
to randomly depart from the perfect average according to a binomial
model. When the samples are as large as they are here, it's a quite
practical simplification to use a continuous model, the normal
distribution, for the sample averages instead.
The chi-square test is most easily carried out by hand if we total
both the rows and columns in the 2x2 table that you've already
provided:
                    Defect.   Not Defect.   Totals by Group
Dealer                  931          2723              3654
Private                1690          3498              5188
Totals by Outcome      2621          6221              8842
where the lower right hand corner, the "grand total", is the sum
either of the group totals or the outcome totals.
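If you'd like to double-check that bookkeeping by machine, here is a short
sketch in Python (my choice of language for illustration; the four data
values are the ones you provided):

```python
# Observed 2x2 table: rows are groups (dealer, private),
# columns are outcomes (defective, not defective).
observed = [[931, 2723],   # dealer sales
            [1690, 3498]]  # private sales

row_totals = [sum(row) for row in observed]        # totals by group
col_totals = [sum(col) for col in zip(*observed)]  # totals by outcome
grand_total = sum(row_totals)                      # the "grand total"

print(row_totals)   # [3654, 5188]
print(col_totals)   # [2621, 6221]
print(grand_total)  # 8842
```

Note that the grand total comes out the same whichever set of totals you
sum, just as described above.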
We next use these totals to find an "expected" value for each of the
original four entries, based on the "null hypothesis". That is, if
the two samples (groups) are drawn from a common population, then by
merging them together we get the best estimate available of their
common fraction of defective and nondefective brakes.
Here we see that of the grand total of 8842 cars sold, we have
altogether these fractions of the two outcomes, rounded appropriately:
cars with defective brakes: 2621/8842 = 0.2964
cars with nondefective brakes: 6221/8842 = 0.7036
Now apply these two outcome fractions to the respective group's sample
sizes. In the dealer category we have 3654 cars and in private owner
sales, 5188. Multiply each sample size by the fractions above, and we
get the estimated "expected" value for all four entries.
                   Expected      Expected
                    Defect.   Not Defect.   Totals by Group
Dealer                 1083          2571              3654
Private                1538          3650              5188
Totals by Outcome      2621          6221              8842
Here I've done a bit of rounding to keep the numbers "nice". The
actual proportions would of course produce decimal fractions, but the
large numbers of observations make it practical for our purposes to
round to whole numbers.
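The same recipe (each cell's expected value is its row total times its
column total, divided by the grand total) can be checked without any hand
rounding; a sketch in Python:

```python
observed = [[931, 2723], [1690, 3498]]  # dealer row, private row
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count in each cell under the null hypothesis:
# (total for its group) x (fraction with that outcome overall),
# i.e. row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e) for e in row])
# [1083, 2571]
# [1538, 3650]
```

Rounded to whole cars, these agree with the table above.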
Next we find the four differences between observed values O and
expected values E in each category, O - E. For example, in the upper
left hand corner, we observed 931 cars sold by dealers with defective
brakes, but (assuming an average 0.2964 fraction of all cars have
these) we expected 1083. The observed value is less than the
expected, so O - E in this entry is:
O - E = 931 - 1083 = -152
The undercount here must be exactly offset by an overcount in the
complementary category of observed minus expected cars sold by dealers
without defective brakes, and also in private owner sales of cars that
turned out to _have_ defective brakes. And each of those overcounts
must be offset by an undercount in the observed minus expected cars
without defective brakes sold by private owners:
                      O - E         O - E
                    Defect.   Not Defect.   Totals by Group
Dealer                 -152          +152                 0
Private                +152          -152                 0
Totals by Outcome         0             0                 0
As you can see from this example, the 2x2 (two outcomes, two groups)
format means that the O - E calculation only has to be done once!
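That symmetry is easy to verify in code: all four O - E entries share one
magnitude (about 152.14 before any rounding). A sketch, reusing the exact
expected values rather than the rounded ones:

```python
observed = [[931, 2723], [1690, 3498]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# O - E for each cell; in a 2x2 table all four entries have the same
# magnitude, alternating in sign as in the table above.
diffs = [[o - e for o, e in zip(orow, erow)]
         for orow, erow in zip(observed, expected)]
print(diffs)  # roughly [[-152.14, +152.14], [+152.14, -152.14]]
```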
Now Pearson's chi-square statistic is a single number that we're about
to calculate. Once we have it, we can look up in the appropriate
table to find out how likely it is that the number would be as big as
it is, if the "null hypothesis" holds. As to the interpretation of
that, hold on for just one more minute...
The chi-square statistic is the sum of the four ratios (O - E)^2 / E,
one ratio for each of the four categories. That is, in our problem,
152 squared is:
152^2 = 23104
and we'd have these terms:
(23104/1083) + (23104/2571) + (23104/1538) + (23104/3650)
which works out to roughly 51.67. Admittedly we've rounded here and
there a little to keep the numbers whole up to the end, but I'll point
you to a Web page calculator here:
[GraphPad QuickCalcs: Analyze a 2x2 Contingency Table]
http://www.graphpad.com/quickcalcs/Contingency1.cfm
where you can just enter your four original data values and have the
crunching done for you, and their value will be just slightly higher
(51.767 for chi-square without Yates' correction).
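Putting the whole calculation together without intermediate rounding
reproduces the calculator's figure; a sketch in Python:

```python
observed = [[931, 2723], [1690, 3498]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson's chi-square statistic: sum of (O - E)^2 / E over all cells.
chi_square = sum((o - e) ** 2 / e
                 for orow, erow in zip(observed, expected)
                 for o, e in zip(orow, erow))
print(round(chi_square, 3))  # 51.767
```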
Now the interpretation of this statistic depends on what is called the
"degrees of freedom" in the measurements. This is a fancy term, to
be sure, but for a contingency table it boils down to (number of
outcomes - 1) times (number of groups - 1). If (as here) there are two
possible outcomes (defective vs. nondefective brakes) and two groups,
then the degrees of freedom ('df') is 1 x 1 = 1.
Here's a table that gives "cut-off" values for the chi-square
statistic with varying degrees of freedom (df) and "levels of
significance" (which amount to probabilities such as mentioned
earlier):
[Chi-square Table]
http://www.ento.vt.edu/~sharov/PopEcol/tables/chisq.html
In that table we see that, for df = 1, the significance level P =
0.001 corresponds to a chi-square value of 10.83. Our value, 51.67 or
so, is much bigger... and so one would say the observations are
"statistically significant" at the 0.001 level.
In fact the chi-square value we got is so big, that it's statistically
significant even at the 0.0001 level. But what does all of that mean?
Well, if the two given samples had been drawn from a common
population, then (based on certain normal distribution approximations
implied by the use of Pearson's chi-square statistic) it is estimated
that a value as large as 51.67 would occur "by chance" less than once
in ten thousand times; hence "significant" at the 0.0001 level.
Actually I didn't find a chi-square table that really gives cut-offs
comparable to the value 51.67. The highest cut-off (for one degree of
freedom) that I found cited was for significance level 0.000001 (one
in one million), and even that was only 23.94, less than half the
result we have.
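In fact, for one degree of freedom no table is needed: the chi-square
tail probability has the closed form P = erfc(sqrt(x/2)), since a
chi-square variable with df = 1 is the square of a standard normal. A
sketch using only the Python standard library:

```python
import math

def chi_square_p_value_df1(x):
    # For one degree of freedom the chi-square variable is the square
    # of a standard normal, so P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2))

print(chi_square_p_value_df1(10.83))   # about 0.001, the table's cutoff
print(chi_square_p_value_df1(51.767))  # far smaller than 0.000001
```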
It's fair to say that the chance of getting the chi-square statistic
to be as big as 51.67 with those sample sizes is extremely small, esp.
in comparison to the levels of significance that are ordinarily used
in surveys and other "social science" applications (where 5% is
frequently used).
One should be wary of over-interpretation, however, when turning
this "math fact" around and concluding that _therefore_ it is likely
that there is a relationship between the type of car sale and the
defectiveness of brakes. This is where "mathematical rigor" requires
some additional knowledge, often of a kind impractical to obtain,
which is why most of us would be forgiven for throwing deduction out
the window and just saying, "since there's less than one chance in a
million this occurred by chance, chances are it wasn't by chance!"
Pearson's chi-square statistic worked very nicely in your case. With
the large number of observations available in your data, it left
little room for doubt that there really is a difference in the two
underlying groups (cars sold by dealers vs. cars sold by private
owners). Often one can only obtain much smaller numbers of
observations, either because of budgetary or opportunity limitations,
and then the appropriateness of an underlying (continuous) normal
approximation to a (discrete) binomial model comes into question.
A usual rule of thumb is not to use the chi-square test unless all
four entries in the 2x2 "contingency" table have expected values of at
least 5. Also there's a conservative "correction" (Yates' correction)
that is often applied with fairly small numbers (and one degree of
freedom).
Conservative in this context means making it harder to reject the
"null hypothesis" at some constant (low) level of significance, in
effect by handicapping the chi-square statistic toward smaller values.
For example, if the Yates' correction were applied to our data (see
the options in the calculator page linked above), then the statistic
would have turned out to be about 51.4 instead of 51.8 (still a
mind-bogglingly large result, in terms of statistical significance).
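The Yates-corrected statistic simply subtracts 0.5 from each |O - E|
before squaring; a sketch in Python:

```python
observed = [[931, 2723], [1690, 3498]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Yates' continuity correction: shrink each |O - E| by 0.5 before
# squaring, which can only make the statistic smaller.
chi_square_yates = sum((abs(o - e) - 0.5) ** 2 / e
                       for orow, erow in zip(observed, expected)
                       for o, e in zip(orow, erow))
print(round(chi_square_yates, 1))  # 51.4
```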
One sometimes sees binomial model computations touted as being
"exact", as they are in some sense. Certainly when there are only a
handful of observations to work with, the binomial model gives the
best feel for how likely the numbers could have occurred "randomly".
Yates' correction can then be viewed as a shortcut to nudge the
chi-square statistic toward agreement with the binomial model's levels
of significance in an intermediate range of sample sizes. By the
time all four entries have expected values of 30 or more, one would
be on pretty safe ground using Pearson's chi-square
test without a correction; of course the advent of computers has
really spoiled us all with the luxury of analyzing computations to far
more decimal places than our data really justify!
regards, mathtalk-ga