Hi,
http://www.clearforest.com/
claims to do this, but many experts feel that statistical analysis is
inherently limited, and that at some point grunt librarian work has to
be done by hand. It really depends upon the degree of precision
you're looking for. Clay Shirky discusses ways to fool your users into
doing manual classification in his essays on "folksonomies," which
discuss the ways in which users of online photo applications such as
Flickr tag their own data.
I believe the delay in receiving an answer is because what you are
asking for is an artificial intelligence application that doesn't yet
exist.
The ability to extract actual concepts (as opposed to unique or
repeated combinations of words) falls into "Skynet" territory, i.e. a
machine that has self-awareness.
That said, Steven Berlin Johnson has used DevonThink (for the Mac)
with some success to identify related topics within his own body of
notes, and the discussions surrounding Johnson's essay on DevonThink
will link to similar Windows applications. I'm also interested in
this topic, as I would like, at some point, to parse student essays
and generate concept maps using something like Graphviz, a free
graph-drawing program.
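To give a rough idea of what I mean by essays-to-Graphviz, here is a
minimal sketch in Python. It is not a real concept extractor: it only
links words that show up in the same sentence more than once, which is
exactly the "combinations of words" approach described above. The
function name, the stopword list, and the length cutoff are all my own
assumptions for illustration. The output is plain DOT text, which
Graphviz can render.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"that", "this", "with", "from", "have", "they", "their", "which"}

def cooccurrence_dot(text, min_count=2):
    """Return Graphviz DOT text linking words that share a sentence."""
    pairs = Counter()
    for sentence in re.split(r"[.!?]+", text):
        # Keep each distinct word once per sentence; drop short words and stopwords.
        terms = sorted({
            w for w in re.findall(r"[a-z]+", sentence.lower())
            if len(w) > 3 and w not in STOPWORDS
        })
        pairs.update(combinations(terms, 2))

    lines = ["graph concepts {"]
    for (a, b), n in sorted(pairs.items()):
        if n >= min_count:
            # Edge label is the number of sentences the two words share.
            lines.append(f'  "{a}" -- "{b}" [label="{n}"];')
    lines.append("}")
    return "\n".join(lines)

essay = "Memory shapes identity. Identity depends on memory. Culture shapes language."
print(cooccurrence_dot(essay))
```

Saving the output to a file such as essay.dot and running
"dot -Tpng essay.dot -o essay.png" should draw the map. Whether the
resulting links count as "concepts" is, of course, the hard part.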
As an aside, Amazon now generates "SIPs," or "Statistically Improbable
Phrases," for the books it sells that have been digitized.
Looking at Malcolm Gladwell's "Blink," for example, reveals a number of
cool buzzwords such as "adaptive unconscious." I think this problem
falls into the realm of computational linguistics, if that helps any.
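I don't know how Amazon actually computes its SIPs, but the general
idea of a "statistically improbable" phrase can be sketched as: find
two-word phrases that are much more frequent in one text than in some
background text. The sketch below is only my guess at that idea. The
function names, the add-one smoothing, and the "appears at least
twice" cutoff are assumptions, not anything Amazon has published.

```python
import math
import re
from collections import Counter

def bigrams(text):
    """Split text into lowercase words and return adjacent word pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return list(zip(words, words[1:]))

def improbable_phrases(document, background, top_n=3):
    """Rank the document's repeated bigrams by how unusual they are
    compared with the background text (log ratio of frequencies)."""
    doc_counts = Counter(bigrams(document))
    bg_counts = Counter(bigrams(background))
    doc_total = sum(doc_counts.values())
    bg_total = sum(bg_counts.values())
    vocab = len(set(doc_counts) | set(bg_counts))

    scores = {}
    for phrase, count in doc_counts.items():
        if count < 2:
            continue  # ignore one-off phrases
        p_doc = count / doc_total
        # Add-one smoothing so phrases unseen in the background still get a score.
        p_bg = (bg_counts[phrase] + 1) / (bg_total + vocab)
        scores[phrase] = math.log(p_doc / p_bg)

    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [" ".join(phrase) for phrase, _ in ranked[:top_n]]

book = ("the adaptive unconscious works fast and the adaptive unconscious "
        "is a quiet part of the mind and it is a fact")
everything_else = ("it is a fact that the dog is a pet and the cat is a pet "
                   "and it is a fact of life")
print(improbable_phrases(book, everything_else, top_n=2))
```

With a real background corpus (say, every other book in the catalog)
common phrases like "is a" sink to the bottom and the oddball phrases
float to the top. Note that this still finds unusual word
combinations, not concepts.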
You might also try posting this query at ask.metafilter.com as they
are a bright but rowdy group.
References:
http://tokerud.typepad.com/blog/2005/02/knowledger_kill.html
http://www.stevenberlinjohnson.com/movabletype/archives/000230.html
These guys have a white paper addressing extracting concepts from
plain text. It doesn't point to an application but might include
enough related terminology to help you narrow your search:
http://garage.cps.msu.edu/papers/GARAGe98-07-02.pdf
Another good article:
http://www.intelligententerprise.com/031210/619decision1_2.jhtml
ClearForest reviewed:
http://www.vnunet.com/features/1150440
The University of West Florida claims to be doing some "automated
concept mapping" which looks kinda cool:
http://www.ihmc.us/users/acanas/Publications/AAAI99CmapsCBR/AAAI99CmapsCBR.html
My guess is that ClearForest products cost about as much as a black
market adoption. Good luck. Oh and feel free to google and email me
if you find anyone else working on or talking about this problem. A
search for "Markzilla" turns up my crappy blog and email address.