Q: Keyword/Statistically Improbable Phrase extraction algorithms? (Answered, 1 Comment)
Subject: Keyword/Statistically Improbable Phrase extraction algorithms?
Category: Computers > Algorithms
Asked by: amiramiramir-ga
List Price: $50.00
Posted: 01 Sep 2006 09:13 PDT
Expires: 01 Oct 2006 09:13 PDT
Question ID: 761400
What are some simple algorithms for keyword and phrase extraction?

I am trying to understand how Amazon's SIP and Yahoo's Term Extraction
features are implemented. I'd like to find an algorithm with no
training requirements, preferably statistical and amenable to
near-real-time execution. A Python implementation would be nice.
Subject: Re: Keyword/Statistically Improbable Phrase extraction algorithms?
Answered By: pafalafa-ga on 22 Sep 2006 16:54 PDT

For starters, there are no SIMPLE algorithms for the type of phrase
extraction you've asked about.

Amazon's SIP feature -- and similar systems -- are highly
sophisticated text recognition systems that identify known words and
catalog unknown words, "tokenize" words to create relations, assign
parts of speech, analyze word combinations, and crunch a lot of
statistics to isolate key phrases of interest, while discarding the
oft-repeated but not very useful phrases (such as: "in other words")
which don't contribute much of substance to a given text.
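
To make the statistical part of that description concrete, here is a rough
Python sketch of one way to score phrases: count word pairs in the text of
interest, compare them against counts from a general background text, and
rank highest the pairs that are unusually frequent in the document. This is
only an illustration of the general idea, not Amazon's or Yahoo's actual
method. The function names, the add-one smoothing, and the scoring formula
are all assumptions chosen for simplicity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every contiguous n-word phrase as a string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def improbable_phrases(document, background, n=2, top=5, min_count=2):
    """Rank phrases that are frequent in `document` but rare in `background`.

    Hypothetical helper for illustration; not a reconstruction of any
    production system.
    """
    doc_counts = Counter(ngrams(tokenize(document), n))
    bg_counts = Counter(ngrams(tokenize(background), n))
    doc_total = sum(doc_counts.values())
    bg_total = sum(bg_counts.values())
    vocab = len(set(doc_counts) | set(bg_counts))

    scored = []
    for phrase, count in doc_counts.items():
        if count < min_count:
            continue  # ignore phrases seen too rarely to judge
        # Add-one smoothing so phrases unseen in the background
        # do not cause a division by zero.
        p_doc = (count + 1) / (doc_total + vocab)
        p_bg = (bg_counts[phrase] + 1) / (bg_total + vocab)
        # Weight the log ratio by the raw count so repeated phrases rank higher.
        scored.append((count * math.log(p_doc / p_bg), phrase))

    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [phrase for _, phrase in scored[:top]]


if __name__ == "__main__":
    doc = ("The flux capacitor hums. In other words the flux capacitor "
           "works. In other words it works.")
    bg = ("In other words this is common. In other words that is common. "
          "In other words again.")
    print(improbable_phrases(doc, bg, top=2))
```

In this toy run, "in other words" appears often in both texts, so its word
pairs score low, while pairs that occur only in the document rise to the
top. A real system would need a much larger background corpus and would
likely add the part-of-speech and tokenization steps mentioned above.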

That said, there is one pretty good software system available that you
can play around with a bit for phrase extraction:
Open Source Information Extraction from The University of Sheffield

There is a lot of information linked to at this site, including a
download of the software, full documentation, and tutorials.

Under the "Try a Demonstration" heading, there is also a link to an
Online demonstration of ANNIE which may be a good place to get started
playing around with its capabilities.

I trust ANNIE will give you some of the key capabilities you were asking about.  

However, if there's anything else you need, just let me know by
posting a Request for Clarification, and I'm at your service.

Have fun.


search strategy -- Used bookmarked site for lexical analysis and tools.
Subject: Re: Keyword/Statistically Improbable Phrase extraction algorithms?
From: rakesh_arky_ambati-ga on 04 Sep 2006 11:19 PDT
You will find this talk interesting.