Google Answers Logo
View Question
Q: categorization ( Answered 5 out of 5 stars,   1 Comment )
Subject: categorization
Category: Computers
Asked by: mcsemorgan-ga
List Price: $100.00
Posted: 05 Jul 2003 04:20 PDT
Expires: 04 Aug 2003 04:20 PDT
Question ID: 225316
I would like to know following methods that when to apply a method,
how they compare in performance. give me the answer such a discussion.
"for example, few documents need to be classify, they are not
multi-modality and simple, so we can apply which method, and why we
choose this this!!"
1.Rocchio's Algorithm
2.Naive Bayes
3.K-Nearest Neighbor(KNN)
4.Decision Tree
5.Support Vector Machine(SVM)
6.Voted classification
7.Latent Semantic Analysis
Subject: Re: categorization
Answered By: leader-ga on 25 Jul 2003 00:05 PDT
Rated:5 out of 5 stars
Hello mcsemorgan-ga:

After a detailed research, I located the type of papers and documents
that will help you with your research. I have provided a brief
description of every algorithm and some resources for the basic
information but as you need more details on the appropriate usage and
the usefulness of these classifiers, I did extensive research to find
the kind of papers that describe such comparisons and the
effectiveness/problems of these algorithms.

AGAIN, As far as I can assume you may already have knowledge of the
algorithms, therefore instead of Repeating the information or writing
my own review, I tried hard to find the type of papers that can help
you with your research.

You may ignore a brief description of each algorithm and go directly
to the 'DOCUMENTS & PAPERS' section or you may find the resources with
the algorithms very useful.

A great resources for such type of papers is

****************BRIEF DESCRIPTION*********************

Rocchio’s Alogrithem

The Ricchio's Algorithm is based on the Relevancy Feedback Algorithms
for Document Relevancy. Relevancy Feedback Models are an effective way
of modifying and expanding user queries (such as search engines).

Ricchio's Algorithm is one of the earliest methods used for queries.
This algorithm is basesd on the idea that if the relevance for a query
is known, an optimal query vector will maximize the average
query-document similarity for relevant documents, and will
simultaneously minimize
query-document similarity for non relevant documents. 

If we look at the basic formula for the Ricchio's Algorithm, the
intuitive idea of Ricchio's Algorithm is to iteratively increase the
weights of those terms contained in labeled relevant documents while
penalizing the terms in the irrelevant documents.

Recent studies have shown that Ricchio's Algorithm has a poor
performance when the proportion of relevant documents in the whole
corpus is low.

Naive Bayes

Naive Byes algorithms are among the most successful known algorithms
for learning to classify text documents. It predicts by reading a set
of examples in attribute value-representation and than by using the
Bayes Theorem to estimate the posterior probabilities of all
qualifications. For each
instance of the example language a classification with the highest
posterier probability is chosen
as the prediction. 


Suppose your data consist of fruits, described by their color and
shape.  Bayesian classifiers operate by saying "If you see a fruit
that is red and round, which type of fruit is it most likely to be,
based on the observed data sample? In future, classify red and round
fruit as that type of fruit."

A difficulty arises when you have more than a few variables and
classes -- you would require an enormous number of observations
(records) to estimate these probabilities.

Naive Bayes classification gets around this problem by not requiring
that you have lots of observations for each possible combination of
the variables.  Rather, the variables are assumed to be independent of
 one another and, therefore the probability that a fruit that is red,
round, firm, 3" in diameter, etc. will be an apple can be calculated
from the independent probabilities that a fruit is red, that it is
round, that it is firm, that is 3" in diameter, etc.

In other words, Na´ve Bayes classifiers assume that the effect of an
variable value on a given class is independent of the values of other
variable. This assumption is called class conditional independence. It
is made to simplify the computation and in this sense considered to be

This assumption is a fairly strong assumption and is often not
applicable.  However, bias in estimating probabilities often may not
make a difference in practice -- it is the order of the probabilities,
not their exact values, that determine the classifications.

Studies comparing classification algorithms have found the Na´ve
Bayesian classifier to be comparable in performance with
classification trees and with neural network classifiers.  They have
also exhibited high accuracy and speed when applied to large

The problem with multinomial Naive Bayes is that when one class has
more training examples than another, Naive Bayes selects poor weights
for the decision boundary. This is due to an under-studied bias effect
that shrinks weights for classes with few training examples. ANother
systematic problem with Naive Bayes is that features are assumed to be
independent. As a result, even when the words are dependent, each word
contributes evidence individually. Thus the magnitude for the weights
for classes with strong word dependencies is larger than for classes
with weak word dependencies.

K-Nearest Neighbor

The goal of this clustering method is to simply separate the data
based on the assumed similarities between various classes. Thus, the
classes can be differentiated from one another by searching for
similarities between the data provided.

The K-Nearest Neighbor is suitable for data streams. KNN does not
build a classifier in advance. When a new sample arrives, KNN finds
the K neighbors nearest to the new samples from the training space
based on some suitable similarity or distance metric.

KNN is a good choice when simplicity and accuracy are the predominant
issues. KNN can be superior when a  resident, trained and tested
classifiers has a short useful lifespan, such as in the case with the
data streams where new data is added rapidly and the training set is
ever changing. KNN does not rely on prior probabilities, and it is
computationally efficient. The main computation is the sorting of the
training documents in order to find out the K nearest neighbors for
the test document.

K-Nearest Neighbor is useful when their are less than 20 attributes
per instance, there is lots of training data, training is very fast,
learning complex target functions and don't want to loose information.
The disadvantages of using such a function is that it is slow in
sorting out queries and irrelevant attributes can fool the neighbor.

Decision Tree

The Decision Tree exploration engine, new to PolyAnalyst 4.1, helps
solve the task of classifying cases into multiple categories. Decision
Tree is PolyAnalyst's fastest algorithm when dealing with large
amounts of  attributes. Decision Tree report provides an easily
interpreted decision tree diagram and a predicted versus real table.

Problems to Solve: 

Classification of cases into multiple categories 

Target Attributes: 

Categorical or Boolean (Yes/No) attribute 

Output Format: 

Classification statistics 

Predicted versus Real table (confusion matrix) 

Decision Tree diagram 

Optimal Number of Records: 

Minimum of 100 records 

Maximum of 5,000,000 records 

Preprocessing Suggested: 

Summary Statistics - to deselect attributes that contain to many
values to provide any useful insight to the exploration engine.

Underlying Algorithms: 

Information Gain splitting criteria 

Shannon information theory and statistical significance tests. 

The Data Used: 

Decision Tree works on data of any type. The DT algorithm is
well-poised for analyzing very large databases  because it does not
require loading all the data in machine main memory simultaneously.
PolyAnalyst takes a  full advantage of this feature by implementing
incremental DT learning with the help of the OLE DB for Data Mining

The DT algorithm calculation time scales very well (grows only
linearly) with increasing number of data columns. At the same time, it
grows more than linearly with the growing number of data records - as
N*log(N), where N is the number of records. Yet, for data of about
100,000 records, the DT algorithm is often the fastest exploration
algorithm of PolyAnalyst.

Problems to Solve: 

Decision Tree algorithm helps solving the task of classifying cases
into multiple categories. In many cases, this is the fastest, as well
as easily interpreted machine learning algorithm of PolyAnalyst. The
DT algorithm provides intuitive rules for solving a great variety of
classification tasks ranging from predicting buyers/non-buyers in
database marketing, to automatically diagnosing patient in medicine,
and to determining customer attrition causes
in banking and insurance. 

Target Attribute: 

The target attribute of a Decision Tree exploration must be of a
Boolean (yes/no) or categorical data type.

When to Use This Algorithm: 

The Decision Tree exploration engine is used for task such as
classifying records or predicting outcomes. You should use decision
trees when you goal is to assign your records to a few broad
categories. Decision Trees provide easily understood rules that can
help you identify the best fields for further exploration.

The Output: 

The Decision Tree report starts of by giving measures resulting from
the decision tree. These measures are the Number of non-terminal
nodes, Number of leaves, and depth of the constructed tree. Next, the
report provides classification statistics on the decision tree.


The drawback of this algorithm is that large number of gini indices
have to be computed at each node of the decision tree. In order to
decide which attribute is to be split at each node, the gini indices
have to be computed for all the attributes and for each successive
pair of values for all patterns which have not been classified


Support Vector Machine

A support vector machine is a supervised learning algorithm developed
over the past decade by Vapnik and others (Vapnik, Statistical
Learning Theory, 1998). The algorithm addresses the general problem of
learning to discriminate between positive and negative members of a
given class of n-dimensional vectors. For example, if you have a
series of mRNA expression level measurements for each of a large
number of genes, the SVM can learn to answer questions such as, ``Does
the given gene belong to functional class X?'' where X is some
category such as ``ribosomal genes'' or ``sugar and carbohydrate
transporters.'' If you have a collection of digitized images of
handwritten digits, the SVM can say whether a
given image is, say, the number 9.

The SVM algorithm operates by mapping the given training set into a
possibly high-dimensional feature space and attempting to locate in
that space a plane that separates the positive from the negative
examples. Having found such a plane, the SVM can then predict the
classification of an unlabeled example by mapping it into the feature
space and asking on which side of the separating plane the example
lies. Much of the SVM's power comes from its criterion for selecting a
separating plane when many candidates planes exist: the SVM chooses
the plane that maintains a maximum margin from any point in the
training set. Statistical learning theory suggests that, for some
classes of well-behaved data, the choice of the maximum margin
hyperplane will lead to maximal generalization when predicting the
classification of previously unseen examples (Vapnik, Statistical
Learning Theory, 1998). The SVM algorithm can also be extended to cope
with noise in the training set and with multiple classes (Cristianini
and Shawe-Taylor, An Introduction to Support Vector Machines, 2000).

Say that we have a training data set containing n examples, each of
which is a vector of m numbers. These vectors may be thought of as
points in an m-dimensional space. In theory, a simple way to build a
binary classifier is to construct a hyperplane (i.e., a plane in a
space with more than three dimensions) separating class members
(positive examples) from non-members (negative examples) in this
space. Unfortunately, most real-world problems involve non-separable
data for which there does not exist a hyperplane that successfully
separates the positive from the negative examples. One solution to the
inseparability problem is to map the data into a higher-dimensional
space and define a separating hyperplane there. This
higher-dimensional space is called the feature space, as opposed to
the input space occupied by the training examples. With an
appropriately chosen feature space of sufficient dimensionality, any
consistent training set can be made separable. However, translating
the training set into a higher-dimensional space incurs both
computational and learning-theoretic costs. Furthermore, artificially
separating the data in this way exposes the learning system to the
risk of finding trivial solutions that overfit the data.

SVMs elegantly sidestep both difficulties. They avoid overfitting by
choosing the maximum margin separating hyperplane from among the many
that can separate the positive from negative examples in the feature
space. Also, the decision function for classifying points with respect
to the hyperplane only involves dot products between points in the
feature space. Because the algorithm that finds a separating
hyperplane in the feature space can be stated entirely in terms of
vectors in the input space and dot products in the feature space, a
support vector machine can locate the hyperplane without ever
representing the space explicitly, simply by defining a function,
called a kernel function, that plays the role of the dot product in
the feature space. This technique avoids the computational burden of
explicitly representing the feature vectors.

Latent Semantic Analysis

Latent Semantic Analysis (LSA) analyzes word-word, word-passage, and
passage-passage relationships. There's a good relationship between
what LSA extracts and what people understand. LSA doesn't use
first-hand knowledge of the world, but extracts from "episodes" of
word content. LSA doesn't use word order or logic relationships. LSA
uses unitary expressions of meaning instead of relationships between
successive words. A word is a kind of average of meaning through all

Dimensionality is reduced to approximate human cognition. LSA is a
theory of knowledge representation. Dimensionality reduction solves
the problem of "insufficient evidence" or "poverty of the stimulus".
LSA uses a matrix decomposition algorithm to reduce the
dimensionality. Ideally, the dimension of the reconstruction equals
that of the passages. Results show that the meaning similarities are
close to that of humans, LSA's rate of knowledge acquisition
approximates that of humans, and LSA depends on the dimensionality.

LSA can be use to test theories of human cognition. LSA skips over the
order of words to capture relationships in word choice. LSA uses a
pre-processing step for word correlation over many passages and
contexts. LSA uses a very large number of relationships. Theories of
human cognition cannot be settled by theoretical and philosophical

LSA is an automatic mathematical algorithm for extracting
relationships in word usage in passages. It doesn't use dictionaries,
external knowledge, or grammar rules. First, represent words as a
matrix, each row is a word, and each column is a text passage or
context. Next, do a preliminary transformation of the matrix. Each
word frequency in the Next, LSA applies singular value decomposition
(SVD) to the matrix, which is factor analysis. The decomposition
results in dimensionality reduction. Extract words from the passage
into a word matrix, do a linear decomposition of the matrix, then
reduce the dimensionality of the matrix. The LSA matrix adds words not
in the passage, like human minor knowledge acquisition. LSA is
intuitively sensible, with a three-fourths gain in total comprehension
vocabulary inferred from knowledge about words not in the passage or
paragraph. Human children have a rapid growth of vocabulary and
knowledge. Humans draw conclusions from missing data. Reducing the
dimensionality of representation is useful when the representation
matches the data. The data should not be perfectly regenerated. The
similarity of dimensionality reduction is the cosine between vectors.

Before the SVD is computed, LSA does a data preprocessing matrix data
transformation. Save the log word frequency, and the entropy for each
row and column of the word. Weight each word occurrence by an estimate
of its importance in the passage. Knowing a word provides information
about the passage it appeared in. Matrix transformations are used in
information retrieval and human cognition models. A web site provides
LSA based word or passage vectors, similarities between words and
words, words and passages, and passages and passages. LSA is able to
model human conceptual knowledge.
LSA links information retrieval and human semantic memory. Latent
Semantic Indexing (LSI), like LSA, was tested against pre-examined
documents. Direct comparisons were muddied by preprocessing words. LSA
does synonym tests, since most near neighbors are related by the
cosine. LSA vectors were created from many passages. LSA captures
synonymity by knowledge
of captured vocabularies. TOEFL vocabulary simulates human performance
between word choice. LSA errors were compared to student errors. The
role of dimension reduction was analyzed. LSA simulates word sorting
and word relationships. Subject-matter knowledge and sematic priming
are in LSA. Predictive learning and text comprehension for humans.


Voted Classification


************DOCUMENTS & PAPERS***************************


The research compares the effectiveness of five different text
algorithms (Rocchio's Algorithm, Decision Tree, Naive Bayes, Bayes
Nets & Support Vector Machines) in terms of learning speed, real time
classification speed and classification accuracy.

The report discusses a variety of ways of evaluating the effectiveness
of text categorization systems.

This is a very detailed study of machine learning in Automated Text
Classification, different algorithms and issue pertaining to document
representation, classifier construction and classifier evaluation.

This document describes some of the basics of text categorization and
how to access the quality of each.

The paper surveys the machine learning text classification algorithms
and measure their efficiency to predict the best classification

This is a thesis written by one of the doctorate candidates of the
PAce University. It not only clearly defines different sets of  text
categorization algorithms but also compare and contrast them for
different datasets so as to help system designers chose an appropriate
algorithm given their system constraints and application domains. (see
page 37 of the thesis which is Page 47 of the acrobat reader).

This paper reports a controlled study with statistical significant
tests on five text categorization methods and their robustness in
dealing with different situations.

This paper focuses on a comparative evaluation of a wide-range of text
categorization methods, including previously published results on the
Reuters corpus and new results of additional experiments. A controlled
study using three classifiers, kNN, LLSF and WORD, was conducted to
examine the impact of configuration variations in five versions of
Reuters on the observed performance of classifiers.
Analysis and empirical evidence suggest that the evaluation results on
some versions of Reuters were significantly affected by the inclusion
of a large portion of unlabelled documents, making those results
difficult to interpret and leading to considerable confusions in the
literature. Using the results evaluated on the other versions of
Reuters which exclude the unlabelled documents, the performance of
twelve methods are compared directly or indirectly. For indirect
comparisons, kNN, LLSF and WORD were used as baselines, since they 
were evaluated on all versions of Reuters that exclude the unlabelled
documents. As a global observation, kNN, LLSF and neural network
method had the best performance, except for a Naive Bayes approach,
the other algorithms performed relatively well.


The paper describes four machine learning techniques that are common
with the Text categorization and then evaluate their performance

Power Point presentation of five text classifiers (Read the

Hope this will help you. I will try to post more resources if I come
across any. Please clarify if you have any questions regarding this
research or if you need more help.


Request for Answer Clarification by mcsemorgan-ga on 27 Jul 2003 14:42 PDT
Thanks for your research, they are quite detailed. But could you try
to answer my question??
Actually, I have read many references that you researched before. And
I would like you to show me which method in which condition would be
best; we can accord this result to apply each method. Could you try to
list them item by item??

Clarification of Answer by leader-ga on 27 Jul 2003 17:43 PDT
Hello mcsemorgan-ga:

Thanks for the clarification. I will start working on your
clarification tommorow 28th July (EST). I will keep you informed of
the progress.

Best Regards,

Request for Answer Clarification by mcsemorgan-ga on 28 Jul 2003 08:59 PDT
I will pay you more, if the answer is nice!! 
Could you also tell me how they compare in performance(I mean how can
you proof which method in which condition would be best; we can accord
this result to apply each method)!!

Clarification of Answer by leader-ga on 01 Aug 2003 07:52 PDT
Thanks. I have been working on your answer. I will post my answer on
3rd August OR let you know of the progress. Please let me know if you
need it urgently.


Request for Answer Clarification by mcsemorgan-ga on 01 Aug 2003 11:25 PDT
yes, I need it very urgent. If it is possible for you, hope you can
give me the answer as soon as possible.

Clarification of Answer by leader-ga on 03 Aug 2003 10:08 PDT
Thnaks for waiting. I am nearly done with the answer. I will post it
on Aug 4 i.e tomorrow.


Request for Answer Clarification by mcsemorgan-ga on 05 Aug 2003 02:20 PDT
could you post some answer first?? waiting your answer...

Clarification of Answer by leader-ga on 05 Aug 2003 08:05 PDT
Sorry for the delay. I have researched several articles and compiled a
summary of the advantages and disadvantages of the models. I am unable
to find any article that specifically relates to the comparison of
each model and than prove for various situations, which is the best
for each. I am still working on it.



Vector Space Model
The vector space model, is the most popular model and is claimed to be
the best performing model for general document collections.

Advantages and disadvantages: 
(+) The model is relatively easy to understand. 
(+) The model can be applied without modification on new collections.
(+) The model gives a very good retrieval quality in the SMART
(-) It is not (yet) clear what happens if the model is applied to
documents that are longer and more complicated than the short
documents (abstracts) for which it was developed.
(-) There is no sound theoretical basis for the assumption that
documents that resemble the query, are relevant for the question.

Rocchio’s Algorithm
It is one of the earliest algorithm. It s main advantage over the
others is that it is easy to learn and implement although the
classification accuracy may not be of the highest quality.

Advantages & Disadvantages
(+)Easy to implement
(+)Very fast learner 
(+)Relevance feedback mechanism 
(-)Low classification accuracy 
(-)Linear combination too simple for classification
(-)Constant and are empirical 

Latent Semantic Analysis
LSA was originally developed for the task of information retrieval but
in recent years have produced remarkably human like abilities in a
variety of language tasks and is the best algorithm used for
non-English retrieval systems.

Advantages and Disadvantages
	(+) Best IR for non-English document retrieval
	(-) Disregard syntax
	(-) Disregards morphology
	(-) Computationally expensive

Decision Tree Model
Decision Tree is the fastest algorithm when dealing with large amounts
of attributes because it depends on a root system and remove problem

Advantages and Disadvantages 
(+)Easy to understand
(+)Easy to generate rules
(+)Reduce problem complexity 
(+)Training time is relatively expensive
(-)A document is only connected with one branch
(-)Once a mistake is made at a higher level, any sub tree is wrong 
(-)Does not handle continuous variable well
(-)May suffer from over fitting

Compared to other text categorization methods such as Bayesian
classifier, kNN does not rely on prior probabilities, and it is
computationally efficient. The main computation is the sorting of
training documents in order to find the k nearest neighbors for the
test document.

Advantages and Disadvantages
(+)Conceptually simple, yet able to form complex 
(+)Decision boundaries 
(+)Can work with relatively little information 
(+)Combine naturally with human problem-solving 
(+)Learning is simple 
(-)Classification time is long
(-)Difficult to find optimal value of k 

Na´ve Bayes
Naive Bayes classifier is a reasonable approach to many user modeling
problems, given its advantages of quick learning and low computational

Advantage and Disadvantages
(+)Work well on numeric and textual data
(+)Easy to implement and computation comparing with other algorithms 
(-)Conditional independence assumption is violated by real-world data,
perform very poorly when features are highly correlated
(-)Does not consider frequency of word occurrences

Voted Classification 
One of the most effective algorithm as it uses weights to calculate
the relevancy of the classified text.

Advantages and Disadvantages
(+)Surprisingly effective 
(+)Robust to noise
(+)Decrease the overfitting effect 
(-)Require more calculation and memory


Na´ve Bayes Advantages & Uses

Rocchio’s Algorithm

Latent Semantic Analysis - 13k

Decision Tree Model

K-Nearest Neighbor

Vector Space Model

Thanks for the clarifications. I am still trying to find the relevant
documents or papers.


Request for Answer Clarification by mcsemorgan-ga on 06 Aug 2003 00:21 PDT
Thanks for your answer. I think you have work hard on it, but they are
not something that I needed.
I know something that I wanted are difficult to search. Hope you can
still working on it, and give me the answer as soon as possible.

Request for Answer Clarification by mcsemorgan-ga on 10 Aug 2003 11:58 PDT
Could you give me more answers that I wanted before tuesday?? I need
it very urgent, otherwise I have to looking for other peopele.

Clarification of Answer by leader-ga on 10 Aug 2003 19:48 PDT
I am working on your request. I will post my answer before the deadline. Thanks.

Clarification of Answer by leader-ga on 11 Aug 2003 20:48 PDT
Hello mcsemorgan-ga:

My research revealed some additional papers that you may like. The
first four of these papers compare different text categorization
methods for a certain study. Please let me know if these are the kind
of papers that you will consider. Thanks for being patient with me.
(Although I have checked but If a link doesn’t work, please cut and
paste the address in your browser window).

Following are the papers:

A re-examination of text categorization method
This paper reports a controlled study with statistical significance
tests on five text categorization methods: the Support
Vector Machines (SVM), a k-Nearest Neighbor (kNN) classifier,
a neural network (NNet) approach, the Linear Leastsquares
Fit (LLSF) mapping and a NaiveBayes (NB) classifier.
We focus on the robustness of these methods in dealing
with a skewed category distribution, and their performance
as function of the training-set category frequency. Our results
show that SVM, kNN and LLSF significantly outperform
NNet and NB when the number of positive training
instances per category are small (less than ten), and that all
the methods perform comparably when the categories are
sufficiently common (over 300 instances).

A comparison of several classification algorithms for Word and Sense
based text categorization

An evaluation of Statistical approaches to text categorization
This paper is a comparative study of text categorization methods.
Fourteen methods are investigated, based on
previously published results and newly obtained results from
additional experiments. Corpus biases in commonly
used document collections are examined using the performance of three
classifiers. Problems in previously published
experiments are analyzed, and the results of flawed experiments are
excluded from the cross-method evaluation. As
a result, eleven out of the fourteen methods are remained. A
k-nearest neighbor (kNN) classifier was chosen for
the performance baseline on several collections; on each collection,
the performance scores of other methods were
normalized using the score of kNN. This provides a common basis for a
global observation on methods whose results
are only available on individual collections. Widrow-Hoff, k-nearest
neighbor, neural networks and the Linear Least
Squares Fit mapping are the top-performing classifiers, while the
Rocchio approaches had rela...

Feature Preparation in Text Classification
We report text categorization accuracy for different types of features
and different types of feature weights. The comparison of these
classifiers shows that stemmed or un-stemmed single words as features
give better classifier performance compared with other types of

***************Other Useful Papers******************

Summarization as feature selection of text categorization
The paper addresses the problem of evaluating the effectiveness of
summarization techniques for the task of document categorization. It
is argued that for a large class of automatic categorization
algorithms, extraction-based document categorization can be viewed as
a particular form of feature selection performed on the full text of
the document and, in this context, its impact can be compared with
state-of-the-art feature selection techniques especially devised to
provide good categorization performance. Such a framework provides for
a better assessment of the expected performance of a categorizer if
the compression rate of the summarizer is known

Improving text categorization methods for event tracking
Automated tracking of events from chronologically ordered
document streams is a new challenge for statistical
text classification. Existing learning techniques
must be adapted or improved in order to effectively
handle difficult situations where the number of positive
training instances per event is extremely small,
the majority of training documents are unlabelled, and
most of the events have a short duration in time. We
adapted several supervised text categorization methods,
specifically several new variants of the k-Nearest Neighbor
(kNN) algorithm and a Rocchio approach, to track
events. All of these methods showed significant improvement
(up to 71% reduction in weighted error rates)
over the performance of the original kNN algorithm on
TDT benchmark collections, making kNN among the
top-performing systems in the recent TDT3 official evaluation.
Furthermore, by combining these methods, we
significantly reduced the variance in performance of our
event tracking system over different ...

SVM out performers other text categorization methods for Web searches
This paper explores the use of Support Vector Machines SVMs for
learning text classifiers from examples It analyzes the particular
properties of learning with text data and identifies why SVMs are
appropriate for this task Empirical results support the theoretical
findings SVMs achieve substantial improvements over the currently best
performing methods and behave robustly over a variety of different
learning tasks Furthermore they are fully automatic eliminating the
need for manual parameter tuning.

Using text categorization techniques for intrusion detection (KNN)


Request for Answer Clarification by mcsemorgan-ga on 17 Aug 2003 08:11 PDT
In the advantages and disadvantages part, you made research on Vector
Space Model but what I needed is Support Vector Machine. Did you make
a wrong research or just type wrong??

Clarification of Answer by leader-ga on 17 Aug 2003 10:04 PDT
Hello again:

I am sorry. Yes, I did a mistake. Instead of posting the results for
SVM, I accidentely posted the results for this model. I have started
doing research on SVM and will post my results anytime tonight or
tommorow before afternoon.

Thanks again for giving me a chance.


Request for Answer Clarification by mcsemorgan-ga on 17 Aug 2003 10:52 PDT
Hope you can post the answer as soon as possible, because I need the
answer very urgent.

Request for Answer Clarification by mcsemorgan-ga on 20 Aug 2003 03:33 PDT
how is your search going?? i need your answer very urgent.

Request for Answer Clarification by mcsemorgan-ga on 23 Aug 2003 12:41 PDT
I didn't get any answer from you!! Are you still doing research??

Clarification of Answer by leader-ga on 25 Aug 2003 15:49 PDT

Reduced computational complexity due to discrete kernel function
Faster training times
Good classification accuracy, 
Fast to classify new instances

There is currently no on-line learning version of
the SVM. The SVM is trained on data normally
using a batch processing method.

Predictions from a SVM are not
Probabilistic . In regression the SVM outputs a point estimate, and
in classification, a `hard' binary decision.

The initial SVM model was designed to separate
two classes. In other words there wasn't originally
a SVM that was designed for multiple class

The requirement to estimate a trade-o parameter and the need to 
utilize `Mercer' kernel functions.

REFERERENCE instances/81d91eaa-da14013dc6  

Sorry! I had an emergency situation and couldn't reply. Please refer
to the reference papers and you can have a very good idea of the
advantages and disadvantages related to SVM.


Request for Answer Clarification by mcsemorgan-ga on 28 Aug 2003 12:48 PDT
I have some questions in the advantages and disadvantages part:
1.In LSA,I don’t understand why it would work better for non-English
text classification and best IR for non-English document retrieval??Do
you have a reference for this statement?
2.In Rocchio's Algorithm, what is "Constant and are empirical"??
3.In KNN, what is "Decision boundaries"??
4.In Naive Bayes, Work well on numeric and textual data. What is the
relevance for text classification ?

Clarification of Answer by leader-ga on 28 Aug 2003 19:49 PDT
Thanks again. I will try to elaborate or provide you with reference. I
will write shortly.

Request for Answer Clarification by mcsemorgan-ga on 31 Aug 2003 09:52 PDT
Please give me the answer before monday, they are very urgent for me.
Thank you!!

Clarification of Answer by leader-ga on 01 Sep 2003 09:59 PDT

Following is the explanation:

In KNN, a decision boundary is the space or boundary, which separates
different class samples from each other in the data.
For Explanation see: ANF figure 4.

When working with numeric and textual data, a difficulty arises when
you have more than a few variables and classes -- you would require an
enormous number of observations (records) to estimate these
probabilities. Naive Bayes classification gets around this problem by
not requiring that you have lots of observations for each possible
combination of the variables. Rather, the variables are assumed to be
independent of one another

Latent Semantic Analysis is a model of language learning, based on the
exposure to texts. It predicts to which extent semantic similarities
between words are learned from reading texts.
Please type in the Keyword, ‘LSA AND Non English text retrieval’ for a
variety of topics on this issue.


Request for Answer Clarification by mcsemorgan-ga on 02 Sep 2003 04:13 PDT
Some questions I'm still not understand.
1.In LSA,I don’t understand your meaning why it would work better for
non-English text classification and best IR for non-English document
Why is non English??
2.In Rocchio's Algorithm, you haven't answer me what is "Constant and
are empirical"??
3.I would like to apply these methods(KNN,SVM,Decision Tree, LSA,
Rocchio Algorithm, Naive Bayes, Voted Classification) which would be
used if for instance:
* the content is very dynamic and changes frequently. For instance
news messages.
* the amount of documents is very high, for instance news or the whole
* the amount of documents is smal: eg a company's intranet
* the documents are very heterogeneous (eg the web)
* the documents are very similar (eg a specialized company's
intranet), but the classification needs to be accurate
Could you tell me which method has preference above the others,
depending on the conditions
Please give me the answer before tomorrow, I need them very urgent.

Request for Answer Clarification by mcsemorgan-ga on 27 Sep 2003 17:48 PDT
I would like to thank leader-ga, his answer was very well. 
If you can did so well, I will thank you too.

Clarification of Answer by leader-ga on 28 Sep 2003 12:51 PDT
Thanks for your comments mcsemorgan-ga. I wish you a very good luck in
the future. I will also like to thank ragingacadam-ga for pointing out
to his answer. I hope it helps mcsemorgan-ga.

mcsemorgan-ga rated this answer:5 out of 5 stars
Great answer

Subject: Re: categorization
From: ragingacademic-ga on 27 Sep 2003 11:31 PDT
leader-ga, please see...




Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at with the question ID listed above. Thank you.
Search Google Answers for
Google Answers  

Google Home - Answers FAQ - Terms of Service - Privacy Policy