This question is about David Gondek and Thomas Hofmann's paper
"Non-redundant clustering with conditional ensembles", which is
available at http://www.cs.brown.edu/~dcg/publications.html.
I understand the broad objective of the paper, which is to find an
'orthogonal' or very different clustering of a dataset given an
existing clustering. However, I am confused about how each technique
is used and what each one does.
I have traced my confusion back to their earlier paper,
"Conditional information bottleneck clustering", which is probably a
good place to start.
I understand the general theory of "entropy", "conditional entropy"
and "mutual information". However, I do not understand the concept of
the "information bottleneck", or how it achieves what it claims to
achieve in the context of data clustering.
Please provide as much information as possible to help me understand
the techniques behind "non-redundant clustering with conditional
ensembles". In particular, the topics below confuse me more than others:
- What is the information bottleneck method, and how does it achieve
what it achieves? I have read the paper but do not understand how the
key equations work. Please provide details on these.
- What exactly is conditional mutual information, and what does it
look like in clustering?
- What is "generalized mutual information"? I do not understand what
the Havrda-Charvat structural α-entropy refers to.
- It would be great if answers came with examples as well as the
equations that are essential to understanding this technique!
Please note that I have read through "Information Bottleneck Method"
by Tishby, Pereira and Bialek, "Extracting relevant structures with
side information" by Chechik and Tishby, "Non-redundant data
clustering" by Gondek and Hofmann, "Combining multiple weak
clusterings" by Topchy, Jain and Punch, and the other relevant papers
they reference, so feel free to refer to those papers!
Thanks in advance.