Q: I need a function to compare classifier confusion matrices in discrete problems ( No Answer,   3 Comments )
Subject: I need a function to compare classifier confusion matrices in discrete problems
Category: Science > Math
Asked by: mariane0-ga
List Price: $40.00
Posted: 26 May 2005 08:58 PDT
Expires: 25 Jun 2005 08:58 PDT
Question ID: 525894
This is a follow-up to question :
"I need a function to compare classifier accuracies in discrete problems "

The answer provided by rincewind-ga was perfect, but I would
now need to go more deeply into this problem. Please read the
previous question as I'm picking up where the last one left off.

A confusion matrix is a table of classifier results:

       P     N     <- Predicted class of the examples
P      TP    FN
N      FP    TN
actual class of the examples

TP means True Positives. It is often expressed as a frequency: the
number of accurately classified positive examples / the total number of
examples.

FP, False Positives, is the frequency with which negative examples have
been (mis)classified as positive examples. FN, False Negatives, is the
frequency with which positive examples have been (mis)classified as
negative examples, and TN is the negatives' counterpart of TP.

Very many measures of classifier performance build up from these 4
numbers.

So, how would you reduce bigger square matrices, representing N classes
problems, to these 4 numbers?

Should we distinguish between examples of class A misclassified as
belonging to class B and examples of class A misclassified as belonging
to class C in the 3 class confusion matrix:

       A     B     C     <- Predicted class of the examples
A      AA    AB    AC
B      BA    BB    BC
C      CA    CB    CC
actual class of the examples

Where XY means the number of examples of class X classified as
belonging to class Y over the total number of examples.

Any ideas welcome if they are coherent with rincewind-ga's answer :-).
Please explain step by step if you go into calculus or something.


Clarification of Question by mariane0-ga on 26 May 2005 09:44 PDT
It may be useful to consider grouping the results as one class vs. 
all the others. 

In this case, if we consider class A vs (B and C)

       A     B     C     <- Predicted class of the examples
A      AA    AB    AC
B      BA    BB    BC
C      CA    CB    CC
actual class of the examples

TP <- AA
FP <- BA and CA
FN <- AB and AC
TN <- BB and CC (and possibly BC and CB, which have "correctly" been 
classified as not belonging to class A). 
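
The mapping above can be sketched in a few lines of Python. This is only an illustration: the function name and the example counts are made up, and the matrix is assumed to have actual classes in rows and predicted classes in columns, as in the tables above. BC and CB are counted as TN, which is the "possibly" option just mentioned.

```python
def collapse_one_vs_rest(matrix, k):
    """Collapse an NxN confusion matrix into (TP, FN, FP, TN) for class k
    versus all other classes. Rows are actual classes, columns predicted."""
    n = len(matrix)
    tp = matrix[k][k]
    # Actual class k, predicted as something else.
    fn = sum(matrix[k][j] for j in range(n) if j != k)
    # Actual class is something else, predicted as k.
    fp = sum(matrix[i][k] for i in range(n) if i != k)
    # Neither actual nor predicted is k (includes confusions such as BC, CB).
    tn = sum(matrix[i][j]
             for i in range(n) for j in range(n)
             if i != k and j != k)
    return tp, fn, fp, tn


# Hypothetical counts for classes A, B, C (not taken from any real dataset).
counts = [
    [50, 3, 2],   # actual A
    [4, 40, 6],   # actual B
    [1, 5, 39],   # actual C
]

print(collapse_one_vs_rest(counts, 0))  # (50, 5, 5, 90)
```

Dividing each of the four numbers by the total number of examples gives the frequencies used elsewhere in this question.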


Clarification of Question by mariane0-ga on 28 May 2005 11:01 PDT
The random classifier answers at random, with every class equally likely.

2 class values:
pos = number of positive examples
p2 = probability of properly classifying a positive example = 1/2
q2 = probability of misclassifying a positive example = 1 - p2 = 1/2
P2(X) = probability of properly classifying X of the pos positive examples
P2(X) = Combin(pos, X) * p2^X * q2^(pos-X)

N class values:
pos = number of examples of class A = number of positive examples
pN = probability of properly classifying an example of class A = 1/N
qN = probability of misclassifying an example of class A = 1 - pN = (N-1)/N
PN(X) = probability of properly classifying X of the examples of class A
PN(X) = Combin(pos, X) * pN^X * qN^(pos-X)

with Combin(pos, X) = pos! / (X! * (pos-X)!) 

So a possibility might be to find a relationship between these two 
binomial distributions. 

We know:


We want: a way to calculate P2(X) from PN(X) for all X
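
For reference, here is a small sketch that evaluates P2(X) and PN(X) as defined above, so the two distributions can at least be compared numerically. The function name `binom_pmf` and the choice of pos = 10 and N = 3 are illustrative assumptions only.

```python
from math import comb


def binom_pmf(x, pos, p):
    """Probability of properly classifying exactly x of pos examples when
    each one is classified correctly with probability p, independently."""
    return comb(pos, x) * p**x * (1 - p)**(pos - x)


pos = 10  # hypothetical number of examples of class A
for n_classes in (2, 3):
    p = 1 / n_classes  # random classifier, all classes equally likely
    probs = [binom_pmf(x, pos, p) for x in range(pos + 1)]
    print(n_classes, [round(v, 4) for v in probs])
```

Each printed list sums to 1. The N = 3 list puts more of its weight on small X, since a random guess is right less often when there are more classes.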

PS Are you still around mathtalk-ga? I'm sure this would be 
easy for you :-). Please help. 


Clarification of Question by mariane0-ga on 28 May 2005 11:14 PDT
I'm not saying I don't want somebody else to answer, just that 
it was mathtalk-ga who gave me this formula in the first place. 
And said it was "simple". 

"Simple" I find rather encouraging, it's "trivial" which I resent ;-). 


Clarification of Question by mariane0-ga on 30 May 2005 09:28 PDT
PS I realise that in my way of calculating this I lose even 
true/false information about classes other than A. To be more 
precise, I consider that examples of class B classified as C 
and examples of class C classified as B are correct, in that 
they don't belong to class A and have "accurately" been classified 
as not belonging to class A.

Request for Question Clarification by mathtalk-ga on 25 Jun 2005 07:56 PDT
Hi, mariane0-ga:

I'm not sure if you are still interested in this Question, and I was
reluctant to post an Answer for which you would be charged if, as
seemed might be the case, you have lost the ability to log into your
Google Answers account.

Since the Question is about to expire, I'll post a link for you to an
article which discusses the issue of classifier comparison in a
multiple outcome context:

[Methods for Multi-Category Cancer Diagnosis from Gene Expression Data:
  A Comprehensive Evaluation to Inform Decision Support System Development]

See in particular the Sections titled "Performance metrics" and
"Statistical comparison among classifiers".  As you will see, the
first metric used in this study is accuracy, ie. percentage of correct
classifications.  However as the authors note (and you intuit, and I
was elaborating on), this measure of performance is sensitive to the
overall distributions of the true outcomes.  To give a trivial
example, since a small number of people have hearts on the right side
of the body, an accurate classifier could be obtained by predicting
the heart is on the left in every case, without using any input
information.  But if it were important to identify that small subset,
accuracy would not be a suitable measure of performance.
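
A toy version of that example, with made-up numbers (99 "left" cases and 1 "right" case, chosen only for illustration), shows how a classifier that ignores its input can score a high accuracy while never finding the rare class:

```python
def accuracy(actual, predicted):
    """Fraction of examples whose predicted label equals the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def recall(actual, predicted, label):
    """Fraction of the examples of class `label` that were found."""
    relevant = [p for a, p in zip(actual, predicted) if a == label]
    return sum(p == label for p in relevant) / len(relevant)


# Invented data: the classifier always answers "left".
actual = ["left"] * 99 + ["right"]
predicted = ["left"] * 100

print(accuracy(actual, predicted))         # 0.99
print(recall(actual, predicted, "right"))  # 0.0
```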

For this reason the authors also use a second metric called "relative
classifier information" (RCI) which gives one approach to correcting
for the number and distribution of categories:

"RCI is an entropy-based measure that quantifies how much the
uncertainty of a decision problem is reduced by a classifier relative
to classifying using the priors [16]." [Reference given to paper by V.
Sindwani et al]
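
As a rough sketch of what such an entropy-based measure can look like, the code below computes (prior uncertainty - uncertainty remaining after seeing the classifier's output) / prior uncertainty from a confusion matrix. This follows my reading of the quoted description; the exact formula and normalization used in the cited papers should be checked there, and the function names are invented for this example.

```python
from math import log2


def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * log2(p) for p in probs if p > 0)


def relative_classifier_information(matrix):
    """matrix[i][j] = number of examples of actual class i predicted as j.
    Assumes at least two actual classes occur, so the prior entropy is > 0."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Uncertainty when classifying from the class priors alone.
    h_prior = entropy([sum(row) / total for row in matrix])
    # Uncertainty about the actual class once the prediction is known,
    # weighted by how often each prediction is made.
    h_after = 0.0
    for j in range(n):
        column = [matrix[i][j] for i in range(n)]
        column_total = sum(column)
        if column_total > 0:
            weight = column_total / total
            h_after += weight * entropy([c / column_total for c in column])
    return (h_prior - h_after) / h_prior


print(relative_classifier_information([[5, 0], [0, 5]]))   # perfect classifier
print(relative_classifier_information([[99, 0], [1, 0]]))  # always predicts class 0
```

Under this definition the perfect classifier scores 1 and the always-the-same-answer classifier scores 0, even though the latter has 99% accuracy.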

They also reference some other multi-category comparison methodologies
which they decided not to pursue in this study.

Hope this helps!  You should still be able to post Comments on this
thread after the Question expires if you would like further

regards, mathtalk-ga
There is no answer at this time.

Subject: Re: I need a function to compare classifier confusion matrices in discrete probl
From: mathtalk-ga on 29 May 2005 21:58 PDT
Hi, Mary:

I'm glad I used the word "simple" then, in my Answer long ago in Nov.
2003, but perhaps a clarification of my meaning is called for.

I've tried to review the recent Questions posted by you, esp. the
detailed Comment posted by Lars (rincewind-ga, hooray for Terry
Pratchett fans!).

The present Question concerns relating the probabilities P2(X) and
PN(X).  I think there is a thread of thought here leading all the way
back to my earlier Answer, and I feel it best to try and reconstruct
the chain as best I can.

The phrase I used then, "a simple binomial distribution", certainly
refers to a context in which the outcome is "binary".  I guess
"trivial" should be reserved for those cases in which only one outcome
is possible!  When more than two outcomes are possible, repeated
independent trials (assuming a fixed underlying probability for the
outcomes) could be said to have a "multinomial distribution".

Now it's true that the multinomial (N-outcome) way of looking at
"classifier" results might be simplified to a binary framework of just
recording "right" or "wrong" results.  In general the N outcomes are
not equally likely, but there is a "Bayesian" approach for describing
the probabilities of "right vs. wrong" in terms of the conditional
probabilities of N-outcomes given a particular "class" of test data.

But let's back up a bit and try to pin down better what is really
sought.  After all your Subject line says, "I need a function to
compare classifier confusion matrices in discrete problems".  In that
regard I would suggest that a statistic which attempts to "compare"
all classifiers needs to take into account the underlying
probabilities of the actual outcomes.

In that connection I would choose as the "benchmark" not a random
classifier which picks all the outcomes with equal chances (without
regard to the attributes or "input variables" of the test data), but
rather one which yields the outcomes according to their natural

To get past that point, at least temporarily, let's assume for the
sake of discussion that the N outcomes do occur in reality with equal
probability, so that the random classifier as Mary has stipulated it
would agree with my frame of comparison.

The main point to be made is that the expedient of recording only the
rightness or wrongness of a classifier's output leads only to a 2xN
tabulation of results, not a 2x2 tabulation.  The confusion matrix
presumes that the "real" situation is binary, and that the truth (or
falsity) of the classifier can then be equated to "true positive" or
"true negative" (resp. "false positive" or "false negative").

Suppose the "real" outcomes are A, B, or C as in the illustration.  We
could indeed summarize a classifier's results (on a suitably broad test
set) as:

  True A   True B   True C
  False A  False B  False C

By further restricting attention to cases as A or non-A, we could summarize as:

  True A   True non-A
  False A  False non-A

Even if we correctly simulate the proper frequencies of A,B,C in our
test data, it is hard to avoid the prospective conclusion that we are
throwing away information about the classifier's performance by
restricting attention in this way to A vs. non-A discrimination.
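
To make the 2xN tabulation concrete, here is a minimal sketch. It reads "True A" as "the classifier said A and was right" and "False A" as "the classifier said A and was wrong"; that reading, the function name and the counts are assumptions made for illustration.

```python
def right_wrong_by_prediction(matrix, labels):
    """For each predicted class, count how often that prediction was right
    and how often it was wrong. Rows are actual classes, columns predicted."""
    n = len(labels)
    table = {}
    for j, label in enumerate(labels):
        predicted_total = sum(matrix[i][j] for i in range(n))
        right = matrix[j][j]
        table[label] = (right, predicted_total - right)
    return table


# Hypothetical counts for classes A, B, C.
counts = [
    [50, 3, 2],   # actual A
    [4, 40, 6],   # actual B
    [1, 5, 39],   # actual C
]

for label, (right, wrong) in right_wrong_by_prediction(counts, "ABC").items():
    print(f"True {label}: {right}   False {label}: {wrong}")
```

The table no longer says which wrong class was involved in each error, which is the information loss discussed above.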

Perhaps this is precisely what Mary wants to do here, and I'm game to
provide the necessary formulas from probability to link the NxN
matrices to 2x2 matrices under the appropriate simplifying
assumptions.  I just thought it best to drag out the rationale (or at
least the enthusiasm!) for doing so.

regards, mathtalk-ga
Subject: Re: I need a function to compare classifier confusion matrices in discrete probl
From: mariane0-ga on 30 May 2005 09:23 PDT
Hi mathtalk, nice to hear from you again. 

> a statistic which attempts to "compare"
all classifiers needs to take into account the underlying
probabilities of the actual outcomes 

They are unknown... All we know is the number of examples 
of such a class in our training dataset. This can be used as 
a substitution, I guess. 

I'm actually trying to write this comparison function, here's 
the justification: 

Any N-classes classification problem can be transformed into several
2-classes problems. When the classes can't be ordered this is usually
done by classifying one class vs. the N-1 others, and repeating the
process for each class, but there are many other, dataset-dependent,
possibilities. When the classes can be ordered, even if the order is
empirical, it is usually done by classifying the top-most class vs.
the others, then the two top-most classes together vs. the others,
etc. It is quite possible that the classifier which worked best on the
original dataset won't be the same as the one which gets the best
results on the two-classes problems.
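
The two ways of splitting described in this paragraph can be sketched as follows. The helper names and the class labels are made up for the example, and each split is returned as a (positive group, negative group) pair.

```python
def one_vs_rest_splits(classes):
    """Unordered classes: each class in turn against all the others."""
    return [({c}, set(classes) - {c}) for c in classes]


def ordered_splits(classes_top_first):
    """Ordered classes: the top class vs. the rest, then the two top
    classes together vs. the rest, and so on."""
    return [
        (set(classes_top_first[:k]), set(classes_top_first[k:]))
        for k in range(1, len(classes_top_first))
    ]


def binarize(labels, positive_group):
    """Relabel examples as True (in the positive group) or False."""
    return [label in positive_group for label in labels]


print(one_vs_rest_splits(["A", "B", "C"]))
print(ordered_splits(["high", "medium", "low"]))
print(binarize(["A", "B", "A", "C"], {"A"}))  # [True, False, True, False]
```

An N-class problem gives N one-vs-rest splits but only N-1 ordered splits.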

There are several reasons why someone would do that. A researcher may
wish to experiment with several classifiers on a 3-class-values
dataset while one of these classifiers is only built to handle
2-class-values problems. A data miner may wish to focus upon a
particular class. A user may find large trees difficult to read and so
reduce the number of class values in order to reduce the size of the
induced decision tree. Supposing he were using C4.5 on the iris
dataset, he would notice that grouping the virginicas with the
versicolors reduced the number of leaves from 5 to 2 and increased the
accuracy from 96% to 99%, and he would then wonder which tree was

Anybody presenting this kind of results now ends up saying something
like "on the two-classes problems the average accuracy was 81%, while
on the 5-classes problems it was only 75%. Of course, a 5-classes
problem is intrinsically more difficult than a two-classes problem".
But he can't say on which dataset his classifier worked best while
taking into account this increased difficulty, which is a bit
unsatisfying for him (and for his audience too, if they notice).

As you noticed, this has been bugging me for a while. I used rincewind's 
formula to compare accuracies, but if I want to go any further I need a way 
to reduce an NxN confusion matrix into a 2x2 confusion matrix, because 
many measures of classifier performance are based upon formulas with TP, 
TN, FP and FN. 

Let us then consider class A vs. the others. 

Now, using frequencies (these numbers over the total number of examples), 
I can use Rincewind's formula to know TP + TN (= the accuracy on 2 classes). 

Value of the accuracy on the original 3-classes problem = "Acc3".
Value of the corresponding accuracy on the 2-classes problem = "Acc2".

To be consistent, the error must of course be 1 - the accuracy, 
so (FP + FN) = 1 - Acc2. 

The contribution of class A towards accuracy should be scaled:
TP = AA * Acc2 / Acc3 and TN = Acc2 - TP.

The proportion of examples of class A vs. the total number of examples
should not be modified. This is a descriptor of the data set, not a
classifier result.

So TP + FN = AA + AB + AC -> FN = AA + AB + AC - TP
Finally, the sum of frequencies being 1,
FP = 1 - (TP + TN + FN). 
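
Written out as code, the four formulas above look like this. It is only a sketch of the scheme as described: Acc2 is taken as an input (it comes from Rincewind's formula, which is not reproduced in this thread), and the function name, the example frequencies and the value 0.90 are hypothetical.

```python
def scaled_two_class_matrix(freq, k, acc2):
    """freq: NxN matrix of frequencies summing to 1 (rows actual, columns
    predicted). k: index of the class treated as positive. acc2: equivalent
    two-class accuracy, supplied by the caller.
    Returns (TP, FN, FP, TN) as frequencies."""
    n = len(freq)
    acc_n = sum(freq[i][i] for i in range(n))  # accuracy on the N classes
    tp = freq[k][k] * acc2 / acc_n             # class k's share, rescaled
    tn = acc2 - tp                             # so that TP + TN = Acc2
    fn = sum(freq[k]) - tp                     # keeps the class k proportion
    fp = 1 - (tp + tn + fn)                    # frequencies sum to 1
    return tp, fn, fp, tn


# Hypothetical 3-class frequencies (counts out of 150 examples).
counts = [[50, 3, 2], [4, 40, 6], [1, 5, 39]]
freq = [[c / 150 for c in row] for row in counts]

tp, fn, fp, tn = scaled_two_class_matrix(freq, 0, acc2=0.90)
print(tp, fn, fp, tn)
```

Note that nothing in these formulas keeps FN or FP non-negative: if the rescaled TP exceeds the proportion of class A examples, FN comes out below zero, so the inputs need checking.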

This is as far as I got, BUT how on earth does this correspond to multinomial 

You see, I can understand Rincewind's formula, but I don't understand 
yours. I mean, I can use it, but I wouldn't know how to derive it. 

So I'm wondering whether my way of calculating this makes any sense from 
a statistical point of view? 

I'm hoping that a way of relating the probabilities P2(X) and
PN(X) would somehow "prove" that Rincewind's formula is the right 
choice... ? 

As for throwing away information, yes, well, I think this can't be 
helped: there's no way a 2x2 matrix can contain as much information 
as an NxN matrix, but I'm hoping that it would be a valid answer 
to this problem to repeat the calculation for all classes (taken 
one by one vs. all the others) and to average the results... 
What do you think?  
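
A sketch of that averaging idea, using the plain one-vs-rest collapse (not the rescaled version) and two common measures built from TP, FN, FP and TN. All names and counts are illustrative assumptions.

```python
def one_vs_rest(matrix, k):
    """(TP, FN, FP, TN) for class k vs. the rest; rows actual, cols predicted."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp
    fp = sum(matrix[i][k] for i in range(n)) - tp
    tn = total - tp - fn - fp
    return tp, fn, fp, tn


def sensitivity(tp, fn, fp, tn):
    return tp / (tp + fn)


def specificity(tp, fn, fp, tn):
    return tn / (tn + fp)


def average_over_classes(matrix, measure):
    """Compute the measure for each class vs. the rest, then average."""
    n = len(matrix)
    return sum(measure(*one_vs_rest(matrix, k)) for k in range(n)) / n


# Hypothetical counts for classes A, B, C.
counts = [[50, 3, 2], [4, 40, 6], [1, 5, 39]]

print(average_over_classes(counts, sensitivity))
print(average_over_classes(counts, specificity))
```

This unweighted mean gives every class the same weight regardless of its size. Weighting each class by its number of examples is the other obvious choice.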

> I'm game to
provide the necessary formulas from probability to link the NxN
matrices to 2x2 matrices under the appropriate simplifying
assumptions

Please do so. If you can think of any way to connect this link with 
Rincewind's formula, all the better. If not, I'll just try a few 
numerical examples to see how the results of both transformations 
compare to each other. 


Subject: Re: I need a function to compare classifier confusion matrices in discrete probl
From: mathtalk-ga on 25 Jun 2005 08:10 PDT
The formulas for RCI (relative classifier information) as based on a
confusion matrix Q are summarized in the last two pages of this

[Comparison of Machine Learning Algorithms on Different Microarray Data Sets]

best wishes, mathtalk-ga
