Mutual information is a term used in information theory. Information
theory, developed by Claude Shannon at Bell Labs in the 1940s
( http://www.lucent.com/minds/infotheory/ ), has applications
particularly in machine learning (as well as pattern classification,
artificial intelligence, statistics, and the like) and in
communication (including coding and cryptography). See
http://www.math.psu.edu/gunesch/Entropy/infcode.html for various links
on information theory; see, for example,
http://131.111.48.24/pub/mackay/info-theory/course.html for a short
course.
The standard textbook on information theory is Elements of Information
Theory by Thomas Cover and Joy Thomas.
In order to understand the notion of mutual information, two
preliminary concepts must first be understood: entropy and conditional
entropy.
Of the many links on the net to these topics, one of the clearest
expositions seems to be the course notes to the Stanford course in
information theory at: http://www.stanford.edu/class/ee376a/ . The
general introduction is in lecture 1 at
http://www.stanford.edu/class/ee376a/handouts/lect01.pdf and mutual
information is defined, with examples, at:
http://www.stanford.edu/class/ee376a/handouts/lect02.pdf . (You will
need a PDF viewer to access these files; see
http://www.adobe.com/products/acrobat/readstep2.html .)
Rather than recap the entire first two lectures, I will give a very
quick bird's eye view here:
Entropy is a measure of the uncertainty of a random variable. For a
discrete random variable X, the entropy H(X) is the average number of
bits per symbol needed to represent a long string of observations
drawn from X. The more uniform the distribution of X is, the more
information any particular observation gives us, and the higher the
entropy.
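To make the definition concrete, here is a minimal sketch, in Python
(a language chosen here just for illustration), of the usual entropy
formula H(X) = - sum over x of p(x) log2 p(x); the example
distributions are made up:

import math

def entropy(probs):
    # Entropy in bits of a discrete distribution given as probabilities.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin has one full bit of entropy; a biased coin has less.
print(entropy([0.5, 0.5]))   # 1.0
print(entropy([0.9, 0.1]))   # roughly 0.47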
The conditional entropy of two random variables X and Y, written
H(X|Y), is a measure of the uncertainty remaining in X once Y is
known. Once Y is known, we can represent X with, on average, H(X|Y)
bits.
The difference, H(X) - H(X|Y), can therefore be interpreted as a
measure of how much information Y gives about X on average.
This difference is the mutual information of X and Y, usually written
I(X;Y).
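As a small worked sketch of the I(X;Y) = H(X) - H(X|Y) definition, the
following Python fragment computes H(X), H(X|Y) and their difference
from a made-up joint distribution over two binary variables (the joint
table is purely illustrative):

import math

def H(probs):
    # Entropy in bits of a discrete distribution.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Made-up joint distribution p(x,y), keyed by (x, y).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginals p(x) and p(y).
px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0) + p
    py[y] = py.get(y, 0) + p

# H(X|Y) is the average over y of the entropy of X given Y = y.
HX = H(px.values())
HXgY = sum(p_y * H([joint[(x, y)] / p_y for x in px if (x, y) in joint])
           for y, p_y in py.items())

print("I(X;Y) =", HX - HXgY)   # about 0.28 bits for this table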
An example of the use of mutual information in the interpretation of
MRI images of breasts is at:
http://www-ipg.umds.ac.uk/d.rueckert/research/breast/breast.html . In
this case, the researcher was interested in the amount of information
one image, A, gives about a second image, B.
Search strategy:
"information theory"
"mutual information"
"mutual information" example.
OTHER LINKS:
A short course on information theory is here:
http://www.inference.phy.cam.ac.uk/mackay/info-theory/course.html
Primer for biologists: http://www.lecb.ncifcrf.gov/~toms/paper/primer/
Mathematical definitions:
http://cgm.cs.mcgill.ca/~soss/cs644/projects/simon/Entropy.html

Clarification of Answer by rbnn-ga on 23 Sep 2002 13:37 PDT
Intuitively speaking, the mutual information of two random variables
is the amount of information they have in common
( http://www.stanford.edu/class/ee376a/handouts/lect02.pdf , page 4).
The formula for mutual information *is*:
I(X;Y) = H(X) - H(X | Y)
where H(X) is the entropy of X and H(X|Y) is the entropy of X given Y.
It is possible, though, to expand this out in terms of more
fundamental definitions, as shown on page 3 of the reference:
http://www.stanford.edu/class/ee376a/handouts/lect02.pdf .
Suppose we have a random variable X that takes value x with
probability p(x), and a random variable Y that takes value y with
probability p(y).
Let p(x,y) be the probability that X=x and Y=y.
Then the mutual information of X and Y is the sum, over all pairs
(x, y), of
p(x,y) log( p(x,y) / (p(x) p(y)) )
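Here is the same calculation written directly from that sum, again in
Python and again with a made-up joint distribution; it gives the same
number as the H(X) - H(X|Y) form sketched earlier:

import math

def mutual_information(joint):
    # I(X;Y) in bits from a joint distribution given as a dict keyed by (x, y).
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print(mutual_information(joint))   # about 0.28 bits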
Intuitively speaking, applications of mutual information would include
situations where we are interested in finding out how much information
two items have in common.
Here are a couple of examples:
The thesis "Alignment by Maximization of mutual information"
http://citeseer.nj.nec.com/cache/papers/cs/9621/http:zSzzSzwww.ai.mit.eduzSz~violazSzresearchzSzpublicationszSzPHD-thesis.pdf/viola95alignment.pdf
describes how we can use mutual information to determine the correct
alignment of an image. The correct alignment will be the alignment
that maximizes the mutual information between the two images.
Suppose you are given two photographs of an object and you want to
determine if they are photographs of the same object. The mutual
information formula might be used (albeit with additional mathematical
massaging) to determine if these objects have a lot of information in
common, in which case they might be the same object.
Consider an assembly line that has a camera that takes pictures of
parts coming down the assembly line and then feeds the image to a
robot that has to put the part into the correct orientation. Mutual
information can be used to help determine the correct orientation.
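One common way to estimate the mutual information between two images,
for alignment problems like the ones above, is from a joint histogram
of their pixel intensities. The following is only a rough sketch of
that idea (Python with numpy, intensities quantized into a small
number of bins); it is not the exact procedure of the thesis cited
above:

import numpy as np

def image_mutual_information(img_a, img_b, bins=32):
    # Estimate I(A;B) in bits from the joint intensity histogram of two
    # equally sized grayscale images (2-D arrays).
    hist, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pxy = hist / hist.sum()       # joint distribution p(a, b)
    px = pxy.sum(axis=1)          # marginal p(a)
    py = pxy.sum(axis=0)          # marginal p(b)
    nz = pxy > 0                  # skip empty cells to avoid log(0)
    return np.sum(pxy[nz] * np.log2(pxy[nz] / np.outer(px, py)[nz]))

# Two hypothetical images; an alignment search would transform one image
# (shift, rotate, and so on) and keep the pose with the largest value.
a = np.random.rand(64, 64)
b = a + 0.1 * np.random.rand(64, 64)
print(image_mutual_information(a, b))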
In biology, we might be interested in how similar two sequences of DNA
are, to determine, for instance, if they represent the same gene. The
mutual information formula might be involved here.
http://www.smi.stanford.edu/projects/helix/psb00/grosse.pdf "Average
mutual information of coding and noncoding DNA".
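As a toy illustration of the DNA idea (and not the calculation from
the paper above), this sketch estimates the mutual information between
corresponding positions of two aligned sequences from the empirical
joint distribution of their symbols:

import math
from collections import Counter

def sequence_mutual_information(seq_a, seq_b):
    # I(A;B) in bits from symbols at corresponding positions of two
    # aligned, equal-length sequences.
    n = len(seq_a)
    joint = Counter(zip(seq_a, seq_b))
    pa, pb = Counter(seq_a), Counter(seq_b)
    return sum((c / n) * math.log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
               for (a, b), c in joint.items())

# Made-up sequences: identical sequences share the most information,
# unrelated ones share close to none.
print(sequence_mutual_information("ACGTACGTAC", "ACGTACGTAC"))   # about 1.97
print(sequence_mutual_information("ACGTACGTAC", "TTTTTTTTTT"))   # 0.0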
However, I am not sure I would call mutual information a "technology",
as your question phrases it; it is just a mathematical definition.