Q: Automatic content categorization ( No Answer,   2 Comments )
Question  
Subject: Automatic content categorization
Category: Computers
Asked by: david7777-ga
List Price: $75.00
Posted: 09 Nov 2006 08:29 PST
Expires: 29 Nov 2006 00:41 PST
Question ID: 781356
I'm in the process of putting a print dictionary online. To make it
more valuable to users, I'd like to indicate a category for each term
(or to be precise, a sub-category, since the entire dictionary is
specific to one broad subject.) Since there are over 10,000 terms, to
do this manually would be very time-consuming. Is there a way that I
can totally or mostly automate the categorization of the terms based
on the text of their definitions?
(Typical definitions are 1-4 sentences long.) For example, the software could
look at which terms mention other terms in their definitions, or which terms
discuss similar subjects. (Alternatively, if there's some source
that has already classified all (or nearly all) words by category,
that would presumably include all the words in our dictionary and we
could theoretically use that as a starting point.) I have a good idea
of what the sub-categories should be (probably about 50-60). I did
some Google searches looking for software that would enable me to do
this but didn't find anything very good.
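The first idea in the question, seeing which terms mention other terms in their definitions, can be computed directly from the dictionary text. The sketch below is an illustration only: it assumes the dictionary is already loaded as a Python dict mapping each headword to its definition, and it uses simple whole-word, case-insensitive matching (no stemming or plural handling). The function name find_cross_references and the sample entries are hypothetical.

```python
import re


def find_cross_references(dictionary: dict[str, str]) -> dict[str, set[str]]:
    """Return, for each headword, the set of other headwords its definition mentions.

    Assumption: a headword "mentions" another if the other headword appears
    as a whole word (case-insensitive) in its definition text.
    """
    patterns = {
        term: re.compile(r"\b" + re.escape(term.lower()) + r"\b")
        for term in dictionary
    }
    refs: dict[str, set[str]] = {}
    for term, definition in dictionary.items():
        text = definition.lower()
        refs[term] = {
            other
            for other, pattern in patterns.items()
            if other != term and pattern.search(text)
        }
    return refs


# Hypothetical sample entries, for illustration only.
SAMPLE = {
    "bond": "A debt security that pays a fixed coupon.",
    "coupon": "The periodic interest payment on a bond.",
    "yield": "The return on a bond, accounting for its coupon and price.",
    "equity": "Ownership interest in a company.",
}

if __name__ == "__main__":
    for term, mentioned in find_cross_references(SAMPLE).items():
        print(term, sorted(mentioned))
```

For 10,000 terms this compares every definition against every headword, which is slow but still feasible as a one-off batch job; the resulting reference graph is a possible input to the scoring and clustering ideas in the comments below.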
Answer  
There is no answer at this time.

The following answer was rejected by the asker (they received a refund for the question).
Subject: Re: Automatic content categorization
Answered By: easterangel-ga on 09 Nov 2006 18:46 PST
 
Hi! Thanks for the question.

Automated text categorization or taxonomy software could help with
your project. I found only a handful of packages that could meet your
requirements. The following automated text categorization software
may help you categorize the terms in such a dictionary.

Wordstat
http://www.provalisresearch.com/wordstat/WordstatFeatures.html

Inxight Software
http://www.inxight.com/products/

Autonomy
http://www.autonomy.com/content/Products/Taxonomy/index.en.html

Data Harmony
http://www.dataharmony.com/products/tm.htm

Entrieva
http://www.entrieva.com/entrieva/semiotagger.htm

------------------------------
You may also be interested in taxonomy software for thesaurus products.

a.k.a. Classification Software
http://a-k-a.com.au/aka_classification/

Multisystems
http://www.multites.com/

Term Tree
http://www.termtree.com.au/


Search terms used:
text taxonomy thesaurus dictionary
automated text taxonomy categorizer 
  
 
I hope this helps you in your research. Before rating this
answer, please ask for a clarification if you have a question or
need further information.
                                                          
                              
Regards,                              
Easterangel-ga                              
Google Answers Researcher

Request for Answer Clarification by david7777-ga on 15 Nov 2006 14:53 PST
This is not what I was looking for. I had already done Google searches
and found taxonomy software (which only took 10-15 minutes). I was
looking for another solution because even the best-known
taxonomies (e.g. Yahoo Directory and Open Search Project) did not have
the words in our dictionary.

Clarification of Answer by easterangel-ga on 15 Nov 2006 15:33 PST
Hi!

Your original question did not mention that you had already
looked at taxonomies. It is difficult for researchers to know what you
have or have not searched for unless you mention it at the outset.
Researchers can only work with what is presented in the original
question.

Regards,
Easterangel

Request for Answer Clarification by david7777-ga on 15 Nov 2006 15:43 PST
I had said "I did
some Google searches looking for software that would enable me to do
this but didn't find anything very good," which is a reference to
taxonomy software (I assumed that was implied). Anyway, taxonomy
software cannot do what we need done. I spoke to a couple of the companies
in your list and they said they cannot handle this (or at the
very least it will require a lot of human involvement). Plus, the
research you did is not worth $75 in my opinion.

Clarification of Answer by easterangel-ga on 15 Nov 2006 15:56 PST
Hi!

Before posting an answer, I researched this thoroughly, and since
you had not specifically ruled out taxonomy software, these are
the results. Please also be advised that researchers generally find
answers by searching the web, and this is what this service offers.

Anyway, I would really like to help you, so can you tell me of a
piece of software that comes close to what you need? A specific example
would be good.

Thanks.

Request for Answer Clarification by david7777-ga on 16 Nov 2006 06:15 PST
Hi.

That's the thing: I don't know of software that does this the way we
want it done (which means without any human involvement). That's why I
asked the question. Taxonomy software requires a lot of human
involvement.

Yes, I expect researchers to answer questions by using the internet, but
I would expect them to go deeper instead of just giving me some links,
especially for $75.

And if they don't know something, they should just say so.

Clarification of Answer by easterangel-ga on 16 Nov 2006 14:10 PST
Hi again.

I apologize that my answer did not satisfy your requirements. You can
ask for a refund of your fee. Please read the following links about a
refund.

"Google Answers: Frequently Asked Questions: What if I don't like my
answer? Can I get a refund?"
http://answers.google.com/answers/faq.html#refund

"Google Answers Refund or Repost Request"
http://answers.google.com/answers/refundrequest
Reason this answer was rejected by david7777-ga:
Please note that this is my second submission of this refund
request form. Additionally, I sent 2 emails without receiving a
reply. It has been over a week since I submitted my request.

Please credit my account as soon as possible and please send me a
confirmation email.

With regards to the reason for asking for a refund, I outlined the
reasons in my first submission of the request. You can also look at my
communication with the researcher who suggested that I ask for a
refund too.

Basically, the research done by this person was not worth $75 (I'd be
happy to pay up to $15, actually). All this person did was run a few Google
searches and give me some links. I had already mentioned that I had
done internet searches myself and found nothing useful. I was looking
for a well-researched solution, not links that I have to
use to do my own research. Additionally, I called 2-3 of the
companies I was given links for, and their reps told me that their
solutions could not do what I needed done.

Comments  
Subject: Re: Automatic content categorization
From: singbat-ga on 23 Nov 2006 05:48 PST
 
you are looking for text clustering software.  try the open source
Apache Lucene Project as a base indexing engine (and a very good one
with fairly low entry cost in terms of skills required) in combination
with the open source Carrot2 clustering framework.

Carrot2 sits on top of Lucene and other indexes and automatically
groups results of queries.  that is, at least, the standard usage.

however, you can download and configure Carrot2 to read your corpus
and cluster your definitions (and thereby your words) with some modest
programming effort.  the developers of Carrot2 do offer technical
development support for a fee, though the software itself is free.

my company is currently building a framework using these components to
cluster the text of medical records automatically.  early results are
promising.  there should be substantial similarities to your
situation, though the business domain is clearly different.

there are other clustering tools available, many are commercial and
may require less technical help to set up.  the underlying algorithms
are well-known and available, typically in computer science textbooks.
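As a rough illustration of one of those textbook algorithms (not Carrot2's own clustering method, and not its API), the sketch below turns each definition into a TF-IDF vector and groups the vectors with spherical k-means (k-means using cosine similarity and normalised centroids), seeded deterministically with farthest-first initialisation. It is pure standard-library Python, and all function names are hypothetical. For the dictionary in the question, k would be roughly the 50-60 planned sub-categories, and the resulting clusters would still need a human to name them.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; no stemming or stop-word removal (simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build L2-normalised TF-IDF vectors, stored as {word: weight} dicts."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(word for toks in tokenized for word in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {w: c * math.log(n / df[w]) for w, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({w: v / norm for w, v in vec.items()})
    return vectors


def cosine(a, b):
    # Vectors are normalised, so the dot product equals cosine similarity.
    return sum(v * b.get(w, 0.0) for w, v in a.items())


def farthest_first(vectors, k):
    """Deterministic seeding: start with the first vector, then repeatedly add
    the vector least similar to any centroid chosen so far."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        nxt = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(dict(nxt))
    return centroids


def spherical_kmeans(vectors, k, iterations=20):
    """Return a cluster label (0..k-1) for each vector."""
    centroids = farthest_first(vectors, k)
    labels = []
    for _ in range(iterations):
        new_labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if not members:
                continue  # keep the previous centroid for an empty cluster
            summed = Counter()
            for v in members:
                summed.update(v)
            norm = math.sqrt(sum(x * x for x in summed.values())) or 1.0
            centroids[c] = {w: x / norm for w, x in summed.items()}
    return labels
```

On four hypothetical one-line definitions, `spherical_kmeans(tfidf_vectors(["bond coupon interest payment", "bond interest rate coupon", "stock dividend shares equity", "equity shares dividend payout"]), k=2)` separates the two bond definitions from the two equity definitions. A real corpus of 10,000 definitions would benefit from stop-word removal and stemming, which an indexing engine such as Lucene already provides.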

good luck!
Subject: Re: Automatic content categorization
From: funtick-ga on 23 Nov 2006 11:07 PST
 
Apache Lucene is great, and it has a SOLR subproject which can simplify the task.

You have a Term, and a Definition which may include references to other Terms.

By calculating scores for the Terms referenced in each Definition, you may
automatically define its Category (the referenced Terms with the highest scores).
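The comment does not say how the scores are calculated, so the sketch below assumes one simple interpretation: a Term's score is the number of other Definitions that reference it, so heavily referenced "hub" Terms act as Category labels. It takes a reference map ({term: set of referenced terms}) as input; the function names and sample data are hypothetical, and this is plain Python rather than anything Lucene or Solr provides.

```python
from collections import Counter


def reference_scores(refs):
    """Score each Term by how many Definitions reference it (in-link count)."""
    return Counter(t for targets in refs.values() for t in targets)


def category_by_top_reference(refs):
    """Assign each Term the highest-scoring Term its Definition references.

    Ties are broken alphabetically; Terms that reference nothing get None.
    """
    scores = reference_scores(refs)
    categories = {}
    for term, targets in refs.items():
        if targets:
            # max() returns the first maximum, so sorting gives alphabetical tie-breaks
            categories[term] = max(sorted(targets), key=lambda t: scores[t])
        else:
            categories[term] = None
    return categories


# Hypothetical reference map: "security" is referenced by two definitions.
REFS = {
    "bond": {"security"},
    "stock": {"security"},
    "coupon": {"bond"},
    "security": set(),
}

if __name__ == "__main__":
    print(category_by_top_reference(REFS))
```

Here "bond" and "stock" fall under "security" and "coupon" falls under "bond". In practice, the candidate Category Terms would probably need to be restricted to the asker's list of 50-60 sub-categories rather than any headword.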

The SOLR project has a feature called 'Faceted Browsing': a Term may be
contained in several different Categories, and such a design might be
attractive in some cases (when we don't need a strict tree of categories
and subcategories).
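To make the multi-Category idea concrete, the sketch below assumes each Term already has a score per Category (for example, cosine similarity to a cluster centroid; the scores shown are hypothetical). Each Term then keeps every Category above a threshold, capped at a few, and the per-Category counts resemble the counts a faceted browser displays. This illustrates the data model only. It is not Solr's faceting API, which computes such counts over indexed fields at query time.

```python
from collections import Counter


def assign_facets(term_scores, threshold=0.5, max_facets=3):
    """Map {term: {category: score}} to {term: [categories]}.

    A Term keeps up to max_facets Categories whose score is >= threshold,
    ordered by descending score (ties broken by Category name).
    """
    facets = {}
    for term, scores in term_scores.items():
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        facets[term] = [cat for cat, s in ranked[:max_facets] if s >= threshold]
    return facets


def facet_counts(facets):
    """Count how many Terms fall under each Category."""
    return Counter(cat for cats in facets.values() for cat in cats)


# Hypothetical per-Category scores, for illustration only.
SCORES = {
    "yield": {"bonds": 0.9, "equities": 0.6, "tax": 0.1},
    "coupon": {"bonds": 0.95, "tax": 0.2},
}

if __name__ == "__main__":
    f = assign_facets(SCORES)
    print(f)
    print(facet_counts(f))
```

Here "yield" appears under both "bonds" and "equities" while "coupon" appears only under "bonds". The threshold and cap are arbitrary and would need tuning against a hand-checked sample of terms.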

Apache Lucene is great.

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.