Google Answers
Q: Text/Language processing & Categorization (No Answer, 2 Comments)
Question  
Subject: Text/Language processing & Categorization
Category: Computers
Asked by: knowledgeseeker-ga
List Price: $10.00
Posted: 27 Jul 2003 12:10 PDT
Expires: 26 Aug 2003 12:10 PDT
Question ID: 235698
Does anyone know of, or can anyone help me find, software or an algorithm
that would take a short piece of text as input and output the set of
categories that text belongs to? A relevance index for each category
would also help.

What I am looking for is a piece of code or an algorithm that would
classify a short piece of text and possibly also extract relevant
keywords.

Example 1: "What is the best French restaurant in New York City?"

The output would be: Categories: Restaurant, French; or Category:
Restaurant, Sub-Category: French.

Example 2: "Give me a list of available books in Visual Basic"

The output would be: Category: Books, Sub-Category: Visual Basic.

It would be nice if the database of categories and subcategories used
for classification were expandable.

I'm looking for something that would let me programmatically select
the relevant Category/Sub-Category when posting a question here on
Google Answers, based on the question's text!

Thank You.
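Neither comment below posted code, so here is a minimal keyword-matching sketch in Python of what the question describes: an expandable two-level category table plus a crude relevance index. Everything in it is an assumption for illustration: the TAXONOMY table and its keywords, the helpers terms, add_category and classify, and the choice to use the count of matched keywords as "relevance". The matched keywords double as the extracted keywords.

```python
import re
from typing import Optional

# Hypothetical two-level category table:
#   category -> (category keywords, {subcategory: subcategory keywords})
# It is plain data, so it can be extended at run time or loaded from a file.
TAXONOMY = {
    "Restaurant": ({"restaurant", "restaurants", "dining"},
                   {"French": {"french", "bistro"}, "Italian": {"italian", "pizza"}}),
    "Books": ({"book", "books"},
              {"Visual Basic": {"visual basic", "vb"}, "Java": {"java"}}),
}


def terms(text: str) -> set[str]:
    """Lowercased words plus adjacent word pairs, so 'visual basic' can match."""
    words = re.findall(r"[a-z0-9+#]+", text.lower())
    return set(words) | {f"{a} {b}" for a, b in zip(words, words[1:])}


def add_category(category: str, keywords: set[str],
                 subcategories: Optional[dict[str, set[str]]] = None) -> None:
    """Add a new category, or extend an existing one, with extra keywords."""
    cat_keywords, subs = TAXONOMY.setdefault(category, (set(), {}))
    cat_keywords.update(k.lower() for k in keywords)
    for sub, sub_keywords in (subcategories or {}).items():
        subs.setdefault(sub, set()).update(k.lower() for k in sub_keywords)


def classify(text: str) -> list[tuple[str, Optional[str], int, list[str]]]:
    """Return (category, subcategory, relevance, matched keywords), best first.

    Relevance here is simply the number of matched keywords.
    """
    found = terms(text)
    results = []
    for category, (cat_keywords, subs) in TAXONOMY.items():
        cat_hits = cat_keywords & found
        if not cat_hits:
            continue  # require at least one category-level keyword
        # Pick the subcategory with the most keyword hits (if any).
        best_sub, sub_hits = None, set()
        for sub, sub_keywords in subs.items():
            hits = sub_keywords & found
            if len(hits) > len(sub_hits):
                best_sub, sub_hits = sub, hits
        matched = sorted(cat_hits | sub_hits)
        results.append((category, best_sub, len(matched), matched))
    return sorted(results, key=lambda r: r[2], reverse=True)
```

For the two examples above, classify returns Restaurant/French and Books/Visual Basic. Expanding the database is a data change rather than a code change: after add_category("Cars", {"car", "cars"}, {"American": {"america", "american"}}), the question "What is the most popular car made in America?" comes back as Cars/American. The obvious drawback is that every keyword list has to be maintained by hand.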
Answer  
There is no answer at this time.

Comments  
Subject: Re: Text/Language processing & Categorization
From: hailstorm-ga on 25 Aug 2003 23:33 PDT
 
knowledgeseeker,

This is a very interesting question that I would have liked to
answer. However, though simple to state, this type of problem is very
difficult to solve reliably in software, since it requires the
"intelligence" to pick out the important information in an
English-language statement.

One thought I had was to use the Google API to extract directory
classification information through a Google query.  My results for
your two sample questions, plus one more of my own creation were:

What is the best French restaurant in New York City?
  -  Top/Regional/North_America/United_States/New_York/Localities/N/New_York_City/Manhattan/Business_and_Economy/Restaurants_and_Bars/Guides_and_Directories

Give me a list of available books in Visual Basic
  -  Top/Computers/Programming/Languages/Visual_Basic/Resources

What is the most popular car made in America?
  -  Top/Arts/Literature/Genres/Cyberpunk

The first two queries provide all the classification information we
want; in fact, too much, but it may be possible to whittle that down
with further programming. Unfortunately, the result for the third
query is completely wrong.

So if we can't rely on Google and its multi-petabyte storehouse of
information, it may prove difficult to find any solution that
adequately addresses your needs.
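The comment above does not show the "further programming" step. One crude way to whittle a directory path down, sketched here in Python, is to drop structural or generic nodes and keep the deepest few that remain. The GENERIC_SEGMENTS set and the path_to_categories helper are hypothetical, and no amount of trimming can repair a wrong result like the one for the third query.

```python
# Hypothetical set of directory nodes that carry little category information
# on their own (structural or generic nodes); tune it for your own needs.
GENERIC_SEGMENTS = {"Top", "Regional", "Localities", "Business_and_Economy",
                    "Resources", "Guides_and_Directories"}


def path_to_categories(path: str, depth: int = 2) -> list[str]:
    """Whittle a directory path down to its last `depth` informative nodes."""
    segments = [
        s for s in path.split("/")
        # Drop generic nodes and one-letter alphabetical index nodes such as "N".
        if s and s not in GENERIC_SEGMENTS and len(s) > 1
    ]
    return [s.replace("_", " ") for s in segments[-depth:]]
```

On the two directory paths shown above, this yields ["Manhattan", "Restaurants and Bars"] and ["Languages", "Visual Basic"]; with depth=1 only the deepest node is kept.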
Subject: Re: Text/Language processing & Categorization
From: bio-ga on 26 Aug 2003 11:35 PDT
 
Hi Knowledgeseeker,

I think a Bayesian classification algorithm could be designed for
this task, but you would first have to "train" it (possibly using all
the past questions asked on Google Answers).

Search Google for "bayesian text classification" for more information.

Bio
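A minimal sketch of the approach suggested above: a multinomial naive Bayes classifier with add-one (Laplace) smoothing, in Python. The stopword list, the class and method names, and the six training questions are invented for illustration; a real system would train on past questions labeled with the categories they were posted under. The posterior probability returned for each category can serve as the relevance index the question asks for.

```python
import math
import re
from collections import Counter, defaultdict

# A small, hypothetical stopword list; function words carry little category signal.
STOPWORDS = {"a", "an", "the", "is", "in", "of", "for", "to", "what", "which",
             "where", "how", "do", "i", "my", "me", "can", "give"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split into word tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9+#]+", text.lower()) if w not in STOPWORDS]


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.doc_counts = Counter()              # category -> number of training questions
        self.word_counts = defaultdict(Counter)  # category -> word -> occurrences
        self.vocab = set()                       # every word seen in training

    def train(self, text: str, category: str) -> None:
        words = tokenize(text)
        self.doc_counts[category] += 1
        self.word_counts[category].update(words)
        self.vocab.update(words)

    def rank(self, text: str) -> list[tuple[str, float]]:
        """Return (category, posterior probability) pairs, most probable first."""
        words = [w for w in tokenize(text) if w in self.vocab]  # ignore unseen words
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            denominator = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for w in words:
                score += math.log((counts[w] + 1) / denominator)  # smoothed log likelihood
            log_scores[category] = score
        # Convert log scores to probabilities; subtracting the max avoids underflow.
        best = max(log_scores.values())
        weights = {c: math.exp(s - best) for c, s in log_scores.items()}
        total = sum(weights.values())
        return sorted(((c, w / total) for c, w in weights.items()),
                      key=lambda pair: pair[1], reverse=True)


# Invented training examples, for illustration only; real training data would be
# past questions together with the categories they were posted under.
TRAINING = [
    ("Where can I find a good Italian restaurant in Chicago?", "Restaurants"),
    ("Best restaurant for sushi in San Francisco", "Restaurants"),
    ("Recommend a book for learning Visual Basic", "Books"),
    ("Which books explain Java programming well?", "Books"),
    ("How do I change the oil in my car?", "Cars"),
    ("What is the most reliable used car?", "Cars"),
]

classifier = NaiveBayesClassifier()
for question, category in TRAINING:
    classifier.train(question, category)
```

With this toy data, classifier.rank("Give me a list of available books in Visual Basic") puts Books first; that says nothing about accuracy on real data. Subcategories could be handled by, for example, training a second classifier within each category, or by using combined labels such as "Books/Visual Basic".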
