amiramiramir-ga,
For starters, there are no SIMPLE algorithms for the type of phrase
extraction you've asked about.
Amazon's SIP feature -- and similar systems -- are highly
sophisticated text recognition systems that identify known words and
catalog unknown words, "tokenize" words to create relations, assign
parts of speech, analyze word combinations, and crunch a lot of
statistics to isolate key phrases of interest, while discarding the
oft-repeated but not very useful phrases (such as: "in other words")
which don't contribute much of substance to a given text.
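
To give a rough feel for the "crunch a lot of statistics" step, here is a toy Python sketch. It is not Amazon's method or ANNIE's, just an illustration under my own assumptions: phrases are plain two-word sequences, a phrase is "interesting" if it shows up far more often in your document than in some background text, and add-one smoothing keeps unseen phrases from dividing by zero. The function names (tokenize, bigrams, improbable_phrases) and the sample texts are made up for the example.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep runs of letters/apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())


def bigrams(tokens):
    # Pair each word with the word that follows it.
    return list(zip(tokens, tokens[1:]))


def improbable_phrases(document, background, top_n=5, min_count=2):
    # Count two-word phrases in the document and in the background text.
    doc_counts = Counter(bigrams(tokenize(document)))
    bg_counts = Counter(bigrams(tokenize(background)))
    doc_total = sum(doc_counts.values()) or 1
    bg_total = sum(bg_counts.values()) or 1
    # Number of distinct phrases seen anywhere, used for add-one smoothing.
    vocab = len(set(doc_counts) | set(bg_counts))

    scored = []
    for phrase, count in doc_counts.items():
        if count < min_count:
            continue  # ignore phrases that appear only once
        doc_rate = count / doc_total
        bg_rate = (bg_counts[phrase] + 1) / (bg_total + vocab)
        # A high ratio means "common here, rare in general".
        scored.append((doc_rate / bg_rate, " ".join(phrase)))

    scored.sort(reverse=True)
    return [phrase for _, phrase in scored[:top_n]]


if __name__ == "__main__":
    document = ("The flux capacitor hums. In other words, the flux "
                "capacitor works. In other words, it runs.")
    background = ("In other words, nothing happened. In other words, "
                  "it was fine. In other words, we waited.")
    print(improbable_phrases(document, background, top_n=2))
```

On this tiny sample the two "flux capacitor" fragments come out on top, while the pieces of "in other words" rank lower because they are just as common in the background text. A real system would need a much larger background corpus, longer phrases, and the part-of-speech analysis mentioned above.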
That said, there is one pretty good software system available that you
can play around with a bit for phrase extraction:
http://www.aktors.org/technologies/annie/
ANNIE
Open Source Information Extraction from The University of Sheffield
There is a lot of information linked to at this site, including a
download of the software, full documentation, and tutorials.
Under the "Try a Demonstration" heading, there is also a link to an
online demonstration of ANNIE which may be a good place to get started
playing around with its capabilities.
I trust ANNIE will give you some of the key capabilities you were asking about.
However, if there's anything else you need, just let me know by
posting a Request for Clarification, and I'm at your service.
Have fun.
pafalafa-ga
search strategy -- Used bookmarked site for lexical analysis and tools.