Q: Automatic news classification ( No Answer,   0 Comments )
Question  
Subject: Automatic news classification
Category: Computers
Asked by: mcsemorgan-ga
List Price: $50.00
Posted: 01 Jun 2003 15:47 PDT
Expires: 07 Jul 2003 14:48 PDT
Question ID: 211680
I would like to write a paper on the topic "Automatic web news
classification". The paper should discuss a system that fetches news
data and classifies and clusters news content automatically. Such a
system would reduce the human effort involved and increase the
efficiency of the process, from getting the news to showing the
result. I need some references (papers, not programming) on this
topic!

For example, something like Google News. Google News uses what it
calls 'grouping technology' to classify and group similar news
items.

Request for Question Clarification by ragingacademic-ga on 02 Jun 2003 12:02 PDT
mcsemorgan -

Hello again.

In this case, are you seeking *** references *** - or are you looking
for pointers to written papers again?

thanks,
ragingacademic

Clarification of Question by mcsemorgan-ga on 02 Jun 2003 12:14 PDT
This time I am looking for references.

Thanks!!
Answer  
There is no answer at this time.

The following answer was rejected by the asker (they received a refund for the question).
Subject: Re: Automatic news classification
Answered By: ragingacademic-ga on 02 Jun 2003 17:37 PDT
 
Dear mcsemorgan,

Thanks for your question.  First, let me request that if any of the
following is unclear or if you require any further research – please
don’t hesitate to ask me for a clarification.

You requested references related to automated news classification on
the Web.

I’ve provided some references below – they are all available on the
Web.  Are you also interested in references that you may be able to
access through your library?

Please review the following references and let me know if we are in
the right direction or not – will be glad to provide you with
additional references to help you complete your project successfully
as needed.


Bikini: User Adaptive News Classification in the World Wide Web
Dorfer, Eilurt, Mentrup, Muller, Rolf, Rollinger, Sievertsen, Trenkamp
Institute for Semantic Information Processing, University of
Osnabruck, Germany

http://citeseer.nj.nec.com/cache/papers/cs/22129/http:zSzzSzwww.dfki.dezSz~raferzSzum01-ml4um-wszSzpaperszSzMM-etal.pdf/bikini-user-adaptive-news.pdf


Extracting Semistructured Information from the Web (1997)
J. Hammer, H. Garcia-Molina, J. Cho, R. Aranha, A. Crespo
Proceedings of the Workshop on Management of Semistructured Data

“We describe a configurable tool for extracting semistructured data
from a set of HTML pages and for converting the extracted information
into database objects.”

http://citeseer.nj.nec.com/cache/papers/cs/1719/http:zSzzSzwww-db.stanford.eduzSzpubzSzpaperszSzextract.pdf/hammer97extracting.pdf


Knowledge Discovery for Automatic Query Expansion on the World Wide
Web
Mathias Géry, M. Hatem Haddad

“The World-Wide Web is an enormous, distributed, and heterogeneous
information space. Currently, with the growth of available data,
finding interesting information is difficult. Search engines like
AltaVista are useful, but their results are not always satisfactory.
In this paper, we present a method called Knowledge Discovery on the
Web for extracting connections between terms.”

http://citeseer.nj.nec.com/cachedpage/508826/1


User Modeling for Adaptive News Access
DANIEL BILLSUS and MICHAEL J. PAZZANI

“We present a framework for adaptive news access, based on machine
learning techniques specially designed for this task. First, we focus
on the system's general functionality and system architecture. We then
describe the interface and design of two deployed news agents that are
part of the described architecture.”

http://citeseer.nj.nec.com/cache/papers/cs/27103/http:zSzzSzwww.ics.uci.eduzSz~pazzanizSzPublicationszSzBillsusA.pdf/billsus00user.pdf


A Learning Agent for Wireless News Access (2000)
Daniel Billsus, Michael J. Pazzani, James Chen
Intelligent User Interfaces

“We describe a user interface for wireless information devices,
specifically designed to facilitate learning about users' individual
interests in daily news stories. User feedback is collected
unobtrusively to form the basis for a content-based machine-learning
algorithm. As a result, the described system can adapt to users'
individual interests, reduce the amount of information that needs to
be transmitted, and help users access relevant information with
minimal effort.”

http://citeseer.nj.nec.com/cache/papers/cs/15634/http:zSzzSzlieber.www.media.mit.eduzSzpeoplezSzlieberzSzIUIzSzBillsuszSzBillsus.pdf/billsus00learning.pdf


A Personal News Agent that Talks, Learns and Explains (1999)
Daniel Billsus and Michael J. Pazzani
Proceedings of the Third International Conference on Autonomous Agents
(Agents'99)

“Most work on intelligent information agents has thus far focused on
systems that are accessible through the World Wide Web. As demanding
schedules prohibit people from continuous access to their computers,
there is a clear demand for information systems that do not require
workstation access or graphical user interfaces. We present a personal
news agent that is designed to become part of an intelligent,
IP-enabled radio, which uses synthesized speech to read news stories
to a user.”

http://citeseer.nj.nec.com/cache/papers/cs/5681/http:zSzzSzwww.ics.uci.eduzSz~dbillsuszSzpaperszSzagents99-news.pdf/billsus99personal.pdf


A Hybrid User Model for News Story Classification (1999)
Daniel Billsus, Michael J. Pazzani

“We present an intelligent agent designed to compile a daily news
program for individual users. Based on feedback from the user, the
system automatically adapts to the user's preferences and interests.
In this paper we focus on the system's user modeling component.”

http://citeseer.nj.nec.com/cache/papers/cs/9177/http:zSzzSzwww.ics.uci.eduzSz~dbillsuszSzpaperszSzum99.pdf/billsus99hybrid.pdf


Automated Online News Classification with Personalization
Chan, Sun and Lim
(from Google cache)

http://216.239.33.100/search?q=cache:iZ5Tu-y-k68J:mandolin.cais.ntu.edu.sg/~sunaixin/paper/sun_icadl01.pdf+%22news+classification%22&hl=en&ie=UTF-8


Self-organizing classification on the Reuters news corpus

Stefan Wermter
The Informatics Centre
School of CET
University of Sunderland
St. Peter’s Way, Sunderland SR6 0DD
United Kingdom
Stefan.wermter@sunderland.ac.uk

Chihli Hung
The Informatics Centre
School of CET
University of Sunderland
St. Peter’s Way, Sunderland SR6 0DD
United Kingdom
Chihli.hung@sunderland.ac.uk

http://www.his.sunderland.ac.uk/ps/coling-232.pdf


Using Structured Self-Organizing Maps in News Integration Websites
Ivan Perelomov, Arnulfo P. Azcarraga, Jonathan Tan and Tat Seng Chua
PRIS Group, School of Computing, National University of Singapore,
Singapore 117543

http://www2002.org/CDROM/poster/105.pdf


Theme-based Retrieval of Web News
Nuno Maria, Mário J. Silva
DI/FCUL
Faculdade de Ciências
Universidade de Lisboa
Campo Grande, Lisboa
Portugal

http://xldb.fc.ul.pt/ariadne/documentos/p64webdb2000.pdf


Modular Preference Moore Machines in News Mining Agents
Stefan Wermter and Garen Arevian
University of Sunderland
The Informatics Centre, SCET
St. Peter’s Campus, St Peter’s Way
Sunderland SR6 0DD, United Kingdom

http://www.his.sunderland.ac.uk/ps/nafips5.pdf


I hope this response adequately addresses your request.  Please let me
know if you are in need of additional information concerning this
query.

Thanks,
ragingacademic-ga


Search Strategy:

"news classification"

Request for Answer Clarification by mcsemorgan-ga on 05 Jun 2003 05:15 PDT
Could you search for some references that use the HTML format to make
the classification result more accurate, and that focus on fetching
news data and classifying and clustering news content automatically?

Request for Answer Clarification by mcsemorgan-ga on 08 Jun 2003 11:37 PDT
Could you give me more information, as I asked?

Clarification of Answer by ragingacademic-ga on 08 Jun 2003 16:13 PDT
mcsemorgan -

My sincere apologies, you'll have a clarification shortly!!

thanks,
ragingacademic

Clarification of Answer by ragingacademic-ga on 09 Jun 2003 14:05 PDT
Dear mcsemorgan -

You requested a clarification on my previous response.

Please note that all of the references provided previously are highly
relevant to your search.

Here are some additional references you may find valuable - 

Google discusses its news grouping approach here - 

http://news.google.com/help/about_news_search.html

"The Push for News Returns" - excellent article from Wired that
discusses the history of news classification on the Web (briefly...),
touching on a variety of news grouping and clustering technologies:

http://www.wired.com/news/business/0,1367,51112,00.html

And a good article on Google News from SearchEngineWatch -

http://www.searchenginewatch.com/searchday/article.php/2160891


Newsblaster
-----------

Columbia University is testing a service similar to Google News,
called Newsblaster -

http://www1.cs.columbia.edu/nlp/newsblaster/

And...there are quite a few papers written by Columbia academics on
the subject; these papers can be found here -

http://www1.cs.columbia.edu/nlp/newsblaster/papers/index.html

Here is information about Newsblaster -

http://www1.cs.columbia.edu/nlp/newsblaster/faq.html

And a good article on Newsblaster -

http://www.ojr.org/ojr/technology/1015015422.php


NewsinEssence
-------------

Yet another similar service from the University of Michigan is
NewsinEssence -

http://www.newsinessence.com/nie.cgi

Links to team, presentations, papers etc. relating to clustering and
NewsinEssence -

http://www.newsinessence.com/docs/about.html

Another good and relevant Wired article, "Separating News from Noise"
-

http://www.wired.com/news/culture/0,1284,43444,00.html


Links to some other AI generated news sources - 

http://www.aaai.org/AITopics/html/morenews.html


Here's a good search to run on Google for additional information -

"google news" newsinessence newsblaster


Please let me know if this satisfies your request - best bet for
academic-quality references is to follow through on the Columbia and U
of MI links.

thanks!
ragingacademic

Request for Answer Clarification by mcsemorgan-ga on 09 Jun 2003 15:31 PDT
Sorry! I think you may not have understood what I wanted. There are
many different methods used in news classification. Most news
websites lack an automated process; they need human effort to
classify or select the news on their news pages. Therefore, I would
like to find some references that were written to solve these
problems, and whose contents are about fetching news data and
classifying and clustering news content automatically.
The method uses the "HTML format" to make the result of the
classification more accurate and "automatic".

Clarification of Answer by ragingacademic-ga on 09 Jun 2003 17:02 PDT
mcsemorgan -

Thanks for your additional request for clarification.

I am doing my best to help you out here, and I've pointed you to many
excellent sources that do exactly what you request thus far.  Have you
had a chance to review some of these ???

For example - 

Automated Online News Classification with Personalization 
Chan, Sun and Lim 
(from Google cache) 
 
http://216.239.33.100/search?q=cache:iZ5Tu-y-k68J:mandolin.cais.ntu.edu.sg/~sunaixin/paper/sun_icadl01.pdf+%22news+classification%22&hl=en&ie=UTF-8

Discusses precisely what you are looking for (as do countless other
articles I have referenced).
 
I'm not sure what you do for a living, or if you're still a student -
and therefore where your sphere of expertise lies - but HTML is merely
a *** markup language *** that tells a browser how to display text and
images on your monitor (or some other form factor).  HTML *** DOES NOT
*** and cannot automate, optimize or otherwise improve on the process
of news classification.  News clustering programs will typically be
written in C or Java and will run off of a server, and not on the
client-side - although perhaps some primitive news analysis and
classification can be done client-side using Javascript or Perl.

Here is yet another article that discusses exactly what you describe -

An Investigation of Linguistic Features and Clustering Algorithms
for Topical Document Clustering
Vasileios Hatzivassiloglou, Luis Gravano, Ankineedu Maganti
Department of Computer Science
Columbia University

http://www1.cs.columbia.edu/~gravano/Papers/2000/sigir00.pdf

But, again, note that none of this can be accomplished using HTML -
HTML can only be leveraged to display the results of such
categorizations using dynamically generated pages.

If you would like me to help you work on your idea, I could do a bit
more of that in this thread - but any significant additional research
would require a new thread.

thanks much and please let me know how I can further assist you -
ragingacademic

Request for Answer Clarification by mcsemorgan-ga on 10 Jun 2003 08:21 PDT
Your example, "Automated Online News Classification with
Personalization", is quite relevant to my requirement, but it uses an
SVM. What I want is a method that uses HTML features to make the
classification result more accurate. For example, I have read a paper
that uses HTML features to make the classification result more
accurate, and uses the K-means algorithm to make the clustering
process more efficient. I would like you to search for articles like
this.
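
To make the request above concrete, here is a minimal sketch of the
kind of approach being described: words are weighted by the HTML tag
they appear in, and the resulting vectors are grouped with a small
K-means loop. It is not taken from any of the papers in this thread.
The tag weights, the function names (html_to_vector, kmeans) and the
farthest-first initialisation are all assumptions chosen for
illustration.

```python
import math
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights: words in <title> and headings count more than body text.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "h2": 2.0}


class WeightedTextParser(HTMLParser):
    """Collects word counts, weighting each word by its enclosing tags."""

    def __init__(self):
        super().__init__()
        self.stack = []          # currently open tags
        self.counts = Counter()  # word -> weighted count

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching open tag (tolerates sloppy HTML).
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Use the highest weight among the open tags; plain text counts as 1.0.
        weight = max((TAG_WEIGHTS.get(t, 1.0) for t in self.stack), default=1.0)
        for word in data.lower().split():
            word = word.strip(".,;:!?\"'()")
            if word:
                self.counts[word] += weight


def html_to_vector(html):
    """Turn one HTML page into a tag-weighted bag of words."""
    parser = WeightedTextParser()
    parser.feed(html)
    parser.close()
    return parser.counts


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def kmeans(vectors, k, iterations=10):
    """Cluster sparse vectors by cosine similarity; returns one label per vector."""
    # Farthest-first initialisation (an assumption) keeps the sketch deterministic.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        )
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each page to its most similar centroid.
        labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        # Recompute each centroid as the sum of its members
        # (scale does not matter for cosine similarity).
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[c] = total
    return labels
```

Calling kmeans([html_to_vector(page) for page in pages], k) returns a
cluster label for each page. Whether title and heading weights help in
practice would need to be measured on real news pages.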

Clarification of Answer by ragingacademic-ga on 10 Jun 2003 11:16 PDT
mcsemorgan -

Hello again.  Here are some papers that integrate the following
concepts -
+ clustering
+ Web documents (news and other - it's all content, right?)
+ HTML
+ K-means or other similar algorithms.

I hope you will find this sufficient.


Concept Indexing - A Fast Dimensionality Reduction Algorithm with
Applications to Document Retrieval & Categorization
George Karypis and Eui-Hong (Sam) Han
University of Minnesota, Department of Computer Science / Army HPC
Research Center

"In recent years, we have seen a tremendous growth in the volume of
text documents available on the Internet, digital libraries, news
sources, and company-wide intranets. This has led to an increased
interest in developing methods that can efficiently categorize and
retrieve relevant information."

http://www-users.cs.umn.edu/~karypis/publications/Papers/PDF/ci.pdf


Collection of citations on partitioning-based clustering for the Web
with links to relevant articles -

http://citeseer.nj.nec.com/context/839195/9105


Correlation-based Document Clustering using Web Logs
Zhong Su, Qiang Yang, Hongjiang Zhang, Xiaowei Xu, Yuhen Hu

"A problem facing information retrieval on the web is
how to effectively cluster large amounts of web documents.
One approach is to cluster the documents based on
information provided only by users’ usage logs and not
by the content of the documents. A major advantage of
this approach is that the relevancy information is
objectively reflected by the usage logs; frequent
simultaneous visits to two seemingly unrelated documents
should indicate that they are in fact closely related."

http://ifsc.ualr.edu/xwxu/publications/hicsscluster.pdf
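
As a rough illustration of the idea in the abstract above (documents
that are often visited together are probably related), the sketch
below counts co-visits per session and groups documents whose co-visit
count reaches a threshold. It is not the algorithm from the paper. The
session format, the function names and the min_shared_sessions
threshold are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def covisit_counts(sessions):
    """Count, for each pair of documents, how many sessions visited both."""
    counts = Counter()
    for pages in sessions:
        for a, b in combinations(sorted(set(pages)), 2):
            counts[(a, b)] += 1
    return counts


def cluster_by_covisits(sessions, min_shared_sessions=2):
    """Group documents linked by at least `min_shared_sessions` co-visits."""
    parent = {}  # union-find forest: document -> parent document

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Register every document so unvisited-together pages form singletons.
    for pages in sessions:
        for page in pages:
            find(page)

    # Merge the groups of any two documents that are co-visited often enough.
    for (a, b), n in covisit_counts(sessions).items():
        if n >= min_shared_sessions:
            parent[find(a)] = find(b)

    groups = {}
    for page in list(parent):
        groups.setdefault(find(page), set()).add(page)
    return sorted(sorted(group) for group in groups.values())
```

Each session here is simply a list of the page names one visitor
requested. Real server logs would first have to be split into
sessions, which this sketch does not attempt.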


Automatic Personalization Based on Web Usage Mining 
  
Bamshad Mobasher 
Dept. of Computer Science, DePaul University, Chicago, IL 
mobasher@cs.depaul.edu 

Robert Cooley, Jaideep Srivastava 
Dept. of Computer Science, University of Minnesota, Minneapolis, MN 
cooley@cs.umn.edu, srivasta@cs.umn.edu

http://maya.cs.depaul.edu/~mobasher/personalization/


Clustering Hypertext with Applications to Web Searching
Dharmendra Modha and Scott Spangler

"Clustering separates unrelated documents and groups related
documents, and is useful for discrimination, disambiguation,
summarization, organization, and navigation of unstructured
collections of hypertext documents. We propose a novel clustering
algorithm that clusters hypertext documents using words
(contained in the document), out-links (from the document),
and in-links (to the document). The algorithm automatically
determines the relative importance of words, out-links, and
in-links for a given collection of hypertext documents."

http://www.almaden.ibm.com/cs/people/dmodha/toric.pdf
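
To show what combining words, out-links and in-links might look like,
here is a small sketch that scores two hypertext documents by a
weighted sum of three cosine similarities. It is only an illustration
of the idea in the abstract. The paper says its algorithm determines
the relative importance automatically, whereas the weights below are
fixed, hypothetical values, and the document layout (a dict with
"words", "out_links" and "in_links" lists) is an assumption.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def combined_similarity(doc_a, doc_b, weights=(0.5, 0.25, 0.25)):
    """Weighted sum of cosine similarities over words, out-links and in-links.

    `weights` is a hypothetical fixed setting, not a learned one.
    """
    parts = ("words", "out_links", "in_links")
    return sum(
        w * cosine(Counter(doc_a[p]), Counter(doc_b[p]))
        for w, p in zip(weights, parts)
    )
```

A score like this could be plugged into any similarity-based
clustering routine, with the weights tuned per collection.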


Also, run the following search on Google (exactly as is below) -

"k-means algorithm" clustering Web news html .pdf

Will lead you to several dozen additional articles.

I hope this clarification now satisfies your request.

thanks,
ragingacademic
Reason this answer was rejected by mcsemorgan-ga:
The answer is not what I wanted. He worked hard on the research, but
the references were not useful to me.

Comments  
There are no comments at this time.
