I'm looking for a quote, preferably from an industry analyst or,
failing that, from a software review. The quote should talk about how
users or information technology personnel are unhappy with the amount
of training data that has to be labeled (manually sorted into
categories) in text classification systems that use supervised
machine learning. Examples of companies that sell general-purpose
text classification systems are ClearForest, Verity, Inxight, and
Teragram, but there are many, many others. Applications of text
classification include sorting documents into taxonomies (for document
management, web directories, intranets, etc.), alerting people to news
stories, spam & porn filtering, etc.