Dear mcsemorgan,
Thanks for your question. If any of the following is unclear, or if
you require further research, please don't hesitate to ask me for a
clarification.
You requested references related to automated news classification on
the Web.
I've provided some references below; they are all available on the
Web. Are you also interested in references that you may be able to
access through your library?
Please review the following references and let me know whether we are
heading in the right direction. I will be glad to provide additional
references as needed to help you complete your project successfully.
Bikini: User Adaptive News Classification in the World Wide Web
Dorfer, Eilurt, Mentrup, Muller, Rolf, Rollinger, Sievertsen, Trenkamp
Institute for Semantic Information Processing, University of
Osnabruck, Germany
http://citeseer.nj.nec.com/cache/papers/cs/22129/http:zSzzSzwww.dfki.dezSz~raferzSzum01-ml4um-wszSzpaperszSzMM-etal.pdf/bikini-user-adaptive-news.pdf
Extracting Semistructured Information from the Web (1997)
J. Hammer, H. Garcia-Molina, J. Cho, R. Aranha, A. Crespo
Proceedings of the Workshop on Management of Semistructured Data
We describe a configurable tool for extracting semistructured data
from a set of HTML pages and for converting the extracted information
into database objects.
http://citeseer.nj.nec.com/cache/papers/cs/1719/http:zSzzSzwww-db.stanford.eduzSzpubzSzpaperszSzextract.pdf/hammer97extracting.pdf
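To give a feel for what extracting data from HTML pages involves, here
is a minimal Python sketch using the standard library's `html.parser`
module. It is not the configurable tool described in the paper; it
simply assumes, hypothetically, that a news site marks each headline up
as `<h2 class="headline">`, and collects those headlines into a list. A
real site would need its own extraction rules.

```python
from html.parser import HTMLParser


class HeadlineExtractor(HTMLParser):
    """Collect the text of every <h2 class="headline"> element.

    The tag and class names are hypothetical; a real page needs its own rules.
    """

    def __init__(self):
        super().__init__()
        self.in_headline = False
        self.headlines = []
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "h2" and ("class", "headline") in attrs:
            self.in_headline = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "h2" and self.in_headline:
            self.headlines.append("".join(self._buffer).strip())
            self.in_headline = False

    def handle_data(self, data):
        # Text may arrive in several chunks (e.g. around nested <b> tags).
        if self.in_headline:
            self._buffer.append(data)


def extract_headlines(html_text):
    parser = HeadlineExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.headlines


page = """
<html><body>
  <h2 class="headline">Markets rally on rate news</h2>
  <p>Story text...</p>
  <h2 class="headline">Storm hits <b>coast</b></h2>
  <h2>Not a headline</h2>
</body></html>
"""
print(extract_headlines(page))  # ['Markets rally on rate news', 'Storm hits coast']
```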
Knowledge Discovery for Automatic Query Expansion on the World Wide Web
Mathias Géry, M. Hatem Haddad
The World-Wide Web is an enormous, distributed, and heterogeneous
information space. Currently, with the growth of available data,
finding interesting information is difficult. Search engines like
AltaVista are useful, but their results are not always satisfactory.
In this paper, we present a method called Knowledge Discovery on the
Web for extracting connections between terms.
http://citeseer.nj.nec.com/cachedpage/508826/1
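The paper's Knowledge Discovery method is not reproduced here. As a
much simpler illustration of extracting connections between terms for
query expansion, the sketch below counts how many documents contain
each pair of terms and appends each query term's most frequent partner
to the query. The whitespace tokenizer, the `per_term` parameter and
the toy collection are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def term_cooccurrence(documents):
    # Count, for every pair of distinct terms, how many documents contain both.
    co = Counter()
    for doc in documents:
        terms = sorted(set(doc.lower().split()))
        co.update(combinations(terms, 2))
    return co


def expand_query(query, documents, per_term=1):
    """Append each query term's most strongly co-occurring terms (hypothetical rule)."""
    co = term_cooccurrence(documents)
    expanded = list(query)
    for term in query:
        related = Counter()
        for (a, b), count in co.items():
            if a == term and b not in query:
                related[b] = count
            elif b == term and a not in query:
                related[a] = count
        for candidate, _ in related.most_common(per_term):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


collection = [
    "jaguar car engine speed",
    "jaguar car dealer",
    "jaguar cat jungle",
]
print(expand_query(["jaguar"], collection))  # ['jaguar', 'car']
```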
User Modeling for Adaptive News Access
Daniel Billsus and Michael J. Pazzani
We present a framework for adaptive news access, based on machine
learning techniques specially designed for this task. First, we focus
on the system's general functionality and system architecture. We then
describe the interface and design of two deployed news agents that are
part of the described architecture.
http://citeseer.nj.nec.com/cache/papers/cs/27103/http:zSzzSzwww.ics.uci.eduzSz~pazzanizSzPublicationszSzBillsusA.pdf/billsus00user.pdf
A Learning Agent for Wireless News Access (2000)
Daniel Billsus, Michael J. Pazzani, James Chen
Intelligent User Interfaces
We describe a user interface for wireless information devices,
specifically designed to facilitate learning about users' individual
interests in daily news stories. User feedback is collected
unobtrusively to form the basis for a content-based machine-learning
algorithm. As a result, the described system can adapt to users'
individual interests, reduce the amount of information that needs to
be transmitted, and help users access relevant information with
minimal effort.
http://citeseer.nj.nec.com/cache/papers/cs/15634/http:zSzzSzlieber.www.media.mit.eduzSzpeoplezSzlieberzSzIUIzSzBillsuszSzBillsus.pdf/billsus00learning.pdf
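As a rough illustration of a content-based learner driven by
unobtrusive feedback, the sketch below keeps word counts for stories a
user showed interest in and for stories they skipped, then ranks new
stories by cosine similarity to each profile. This is a generic
technique, not the algorithm from the paper; the `FeedbackProfile`
class, the crude tokenizer and the notion of "interested" are all
hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic tokens only (a deliberately crude tokenizer).
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class FeedbackProfile:
    """Hypothetical content-based profile built from implicit user feedback."""

    def __init__(self):
        self.liked = Counter()
        self.skipped = Counter()

    def record(self, story_text, interested):
        # 'interested' stands in for unobtrusive signals, e.g. the user reading
        # a story to the end versus skipping it (an assumption of this sketch).
        target = self.liked if interested else self.skipped
        target.update(tokenize(story_text))

    def score(self, story_text):
        # Positive scores mean the story looks more like liked stories.
        vec = Counter(tokenize(story_text))
        return cosine(vec, self.liked) - cosine(vec, self.skipped)


profile = FeedbackProfile()
profile.record("Team wins championship final in overtime", interested=True)
profile.record("Parliament debates new tax bill", interested=False)

new_stories = ["Final score: team wins again", "Tax bill passes parliament"]
ranked = sorted(new_stories, key=profile.score, reverse=True)
print(ranked)  # ['Final score: team wins again', 'Tax bill passes parliament']
```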
A Personal News Agent that Talks, Learns and Explains (1999)
Daniel Billsus and Michael J. Pazzani
Proceedings of the Third International Conference on Autonomous Agents
(Agents'99)
Most work on intelligent information agents has thus far focused on
systems that are accessible through the World Wide Web. As demanding
schedules prohibit people from continuous access to their computers,
there is a clear demand for information systems that do not require
workstation access or graphical user interfaces. We present a personal
news agent that is designed to become part of an intelligent,
IP-enabled radio, which uses synthesized speech to read news stories
to a user.
http://citeseer.nj.nec.com/cache/papers/cs/5681/http:zSzzSzwww.ics.uci.eduzSz~dbillsuszSzpaperszSzagents99-news.pdf/billsus99personal.pdf
A Hybrid User Model for News Story Classification (1999)
Daniel Billsus, Michael J. Pazzani
We present an intelligent agent designed to compile a daily news
program for individual users. Based on feedback from the user, the
system automatically adapts to the user's preferences and interests.
In this paper we focus on the system's user modeling component.
http://citeseer.nj.nec.com/cache/papers/cs/9177/http:zSzzSzwww.ics.uci.eduzSz~dbillsuszSzpaperszSzum99.pdf/billsus99hybrid.pdf
Automated Online News Classification with Personalization
Chan, Sun and Lim
(from Google cache)
http://216.239.33.100/search?q=cache:iZ5Tu-y-k68J:mandolin.cais.ntu.edu.sg/~sunaixin/paper/sun_icadl01.pdf+%22news+classification%22&hl=en&ie=UTF-8
Self-organizing classification on the Reuters news corpus
Stefan Wermter, Chihli Hung
The Informatics Centre, School of CET, University of Sunderland
St. Peters Way, Sunderland SR6 0DD, United Kingdom
Stefan.wermter@sunderland.ac.uk, Chihli.hung@sunderland.ac.uk
http://www.his.sunderland.ac.uk/ps/coling-232.pdf
Using Structured Self-Organizing Maps in News Integration Websites
Ivan Perelomov, Arnulfo P. Azcarraga, Jonathan Tan and Tat Seng Chua
PRIS Group, School of Computing, National University of Singapore,
Singapore 117543
http://www2002.org/CDROM/poster/105.pdf
Theme-based Retrieval of Web News
Nuno Maria, Mário J. Silva
DI/FCUL, Faculdade de Ciências, Universidade de Lisboa
Campo Grande, Lisboa, Portugal
http://xldb.fc.ul.pt/ariadne/documentos/p64webdb2000.pdf
Modular Preference Moore Machines in News Mining Agents
Stefan Wermter and Garen Arevian
University of Sunderland
The Informatics Centre, SCET
St. Peters Campus, St Peters Way
Sunderland SR6 0DD, United Kingdom
http://www.his.sunderland.ac.uk/ps/nafips5.pdf
I hope this response adequately addresses your request. Please let me
know if you are in need of additional information concerning this
query.
Thanks,
ragingacademic-ga
Search Strategy:
"news classification"
Clarification of Answer by ragingacademic-ga on 09 Jun 2003 17:02 PDT
mcsemorgan -
Thanks for your additional request for clarification.
I am doing my best to help you out here, and so far I've pointed you
to many excellent sources that address exactly what you requested.
Have you had a chance to review some of them?
For example -
Automated Online News Classification with Personalization
Chan, Sun and Lim
(from Google cache)
http://216.239.33.100/search?q=cache:iZ5Tu-y-k68J:mandolin.cais.ntu.edu.sg/~sunaixin/paper/sun_icadl01.pdf+%22news+classification%22&hl=en&ie=UTF-8
Discusses precisely what you are looking for (as do countless other
articles I have referenced).
I'm not sure what you do for a living, or whether you're still a
student (and therefore where your sphere of expertise lies), but HTML
is merely a *** markup language *** that tells a browser how to display
text and images on your monitor (or some other form factor). HTML
*** DOES NOT *** and cannot automate, optimize or otherwise improve on
the process of news classification. News clustering programs are
typically written in languages such as C, Java or Perl and run on a
server rather than on the client side, although some primitive news
analysis and classification could perhaps be done client-side using
JavaScript.
Here is yet another article that discusses exactly what you describe -
An Investigation of Linguistic Features and Clustering Algorithms
for Topical Document Clustering
Vasileios Hatzivassiloglou, Luis Gravano, Ankineedu Maganti
Department of Computer Science
Columbia University
http://www1.cs.columbia.edu/~gravano/Papers/2000/sigir00.pdf
But, again, note that none of this can be accomplished using HTML -
HTML can only be leveraged to display the results of such
categorizations using dynamically generated pages.
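To illustrate that division of labor, here is a minimal Python sketch
of the server-side step that turns the output of a clustering program
into an HTML page. The clustering is assumed to have already happened;
`render_clusters` and the page layout are hypothetical, and story text
is escaped with the standard library's `html.escape` so that characters
such as `&` and `<` display correctly.

```python
from html import escape


def render_clusters(clusters):
    """Render {cluster label: [headline, ...]} as a simple HTML page.

    The clustering itself happens elsewhere on the server; this function only
    turns its output into markup. The page layout is a hypothetical example.
    """
    parts = ["<html><body>"]
    for label, headlines in clusters.items():
        parts.append(f"<h2>{escape(label)}</h2>")
        parts.append("<ul>")
        for headline in headlines:
            # Escape story text so characters like < and & display literally.
            parts.append(f"<li>{escape(headline)}</li>")
        parts.append("</ul>")
    parts.append("</body></html>")
    return "\n".join(parts)


page = render_clusters({
    "Sports": ["Team wins final"],
    "Business": ["Profits up 5% at A&B Corp"],
})
print(page)
```

The browser then only displays this generated page; all of the
analysis stays on the server.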
If you would like me to help you work on your idea, I could do a bit
more of that in this thread - but any significant additional research
would require a new thread.
thanks much and please let me know how I can further assist you -
ragingacademic
Clarification of Answer by ragingacademic-ga on 10 Jun 2003 11:16 PDT
mcsemorgan -
Hello again. Here are some papers that integrate the following
concepts -
+ clustering
+ Web documents (news and other - it's all content, right?)
+ HTML
+ K-means or other similar algorithms.
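For a concrete picture of how those pieces fit together, here is a
minimal Python sketch that clusters short news stories with k-means
over TF-IDF vectors. It assumes the HTML markup has already been
stripped from the stories, uses a crude regular-expression tokenizer,
and seeds k-means with the first k stories for reproducibility; none of
these choices comes from the papers below.

```python
import math
import re
from collections import Counter


def tfidf_vectors(docs):
    # Tokenize crudely, then weight each term by tf * log(N / df).
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * math.log(n / df[t]) for t in vocab]
        # L2-normalize so long stories do not dominate the distances.
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=20):
    # Simplified initialization: the first k vectors become the centroids.
    # Practical code would use random restarts or k-means++ seeding.
    centroids = [list(v) for v in vectors[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no story changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


stories = [
    "Stock market shares fall",
    "Football team wins match",
    "Shares rally as market recovers",
    "Team coach praises football fans",
]
print(kmeans(tfidf_vectors(stories), k=2))  # [0, 1, 0, 1]
```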
I hope you will find this sufficient.
Concept Indexing - A Fast Dimensionality Reduction Algorithm with
Applications to Document Retrieval & Categorization
George Karypis and Eui-Hong (Sam) Han
University of Minnesota, Department of Computer Science / Army HPC
Research Center
"In recent years, we have seen a tremendous growth in the volume of
text documents available on the Internet, digital libraries, news
sources, and company-wide intranets. This has led to an increased
interest in developing methods that can efficiently categorize and
retrieve relevant information."
http://www-users.cs.umn.edu/~karypis/publications/Papers/PDF/ci.pdf
Collection of citations on partitioning-based clustering for the Web
with links to relevant articles -
http://citeseer.nj.nec.com/context/839195/9105
Correlation-based Document Clustering using Web Logs
Zhong Su, Qiang Yang, Hongjiang Zhang, Xiaowei Xu, Yuhen Hu
"A problem facing information retrieval on the web is how to
effectively cluster large amounts of web documents. One approach is to
cluster the documents based on information provided only by users'
usage logs and not by the content of the documents. A major advantage
of this approach is that the relevancy information is objectively
reflected by the usage logs; frequent simultaneous visits to two
seemingly unrelated documents should indicate that they are in fact
closely related."
http://ifsc.ualr.edu/xwxu/publications/hicsscluster.pdf
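Here is a minimal Python sketch of the co-visit idea quoted above,
under assumptions made for this illustration rather than taken from
the paper's algorithm: each log session is a list of pages one user
visited, two pages are scored by
`co(a, b) / sqrt(visits(a) * visits(b))`, and pages whose score meets
a chosen threshold are grouped together as connected components.

```python
import math
from collections import Counter
from itertools import combinations


def covisit_similarity(sessions):
    # Assumed measure: co(a, b) / sqrt(visits(a) * visits(b)), where co counts
    # sessions containing both pages and visits counts sessions containing each.
    visits = Counter()
    pair_counts = Counter()
    for session in sessions:
        pages = sorted(set(session))  # count each page once per session
        visits.update(pages)
        pair_counts.update(combinations(pages, 2))
    return {
        (a, b): n / math.sqrt(visits[a] * visits[b])
        for (a, b), n in pair_counts.items()
    }


def group_pages(sessions, threshold):
    # Link pages whose similarity meets the threshold, then return the
    # connected components (a simple union-find grouping).
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for session in sessions:
        for page in session:
            find(page)
    for (a, b), sim in covisit_similarity(sessions).items():
        if sim >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for page in parent:
        groups.setdefault(find(page), set()).add(page)
    return sorted(sorted(g) for g in groups.values())


logs = [
    ["/election", "/polls"],
    ["/election", "/polls", "/weather"],
    ["/recipes", "/wine"],
    ["/recipes", "/wine"],
    ["/weather"],
]
print(group_pages(logs, threshold=0.6))
# [['/election', '/polls'], ['/recipes', '/wine'], ['/weather']]
```

Note that no page content is read at all: "/election" and "/polls" end
up together purely because users keep visiting them in the same
session.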
Automatic Personalization Based on Web Usage Mining
Bamshad Mobasher
Dept. of Computer Science, DePaul University, Chicago, IL
mobasher@cs.depaul.edu
Robert Cooley, Jaideep Srivastava
Dept. of Computer Science, University of Minnesota, Minneapolis, MN
cooley@cs.umn.edu, srivasta@cs.umn.edu
http://maya.cs.depaul.edu/~mobasher/personalization/
Clustering Hypertext with Applications to Web Searching
Dharmendra Modha and Scott Spangler
"Clustering separates unrelated documents and groups related
documents, and is useful for discrimination, disambiguation,
summarization, organization, and navigation of unstructured
collections of hypertext documents. We propose a novel clustering
algorithm that clusters hypertext documents using words (contained in
the document), out-links (from the document), and in-links (to the
document). The algorithm automatically determines the relative
importance of words, out-links, and in-links for a given collection of
hypertext documents."
http://www.almaden.ibm.com/cs/people/dmodha/toric.pdf
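The paper's algorithm determines how much weight to give words,
out-links and in-links for each collection. The sketch below only shows
the underlying idea of a combined similarity, with fixed weights chosen
arbitrarily; the document representation (a dict of `Counter`s under
`"words"`, `"out"` and `"in"`) and the `hypertext_similarity` function
are hypothetical. A clustering algorithm could use a measure like this
in place of plain word similarity.

```python
import math
from collections import Counter


def cosine(a, b):
    # Cosine similarity between two sparse count vectors (0.0 if either is empty).
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hypertext_similarity(doc_a, doc_b, weights=(0.5, 0.25, 0.25)):
    # Weighted sum of word, out-link and in-link similarities.
    # The fixed weights are an assumption; the paper determines them per collection.
    w_words, w_out, w_in = weights
    return (
        w_words * cosine(doc_a["words"], doc_b["words"])
        + w_out * cosine(doc_a["out"], doc_b["out"])
        + w_in * cosine(doc_a["in"], doc_b["in"])
    )


# Hypothetical pages: word counts, pages they link to, pages linking to them.
page_a = {"words": Counter(election=1), "out": Counter({"a.com": 1}), "in": Counter(hub=1)}
page_b = {"words": Counter(election=1), "out": Counter({"b.com": 1}), "in": Counter(hub=1)}
page_c = {"words": Counter(recipe=1), "out": Counter({"a.com": 1}), "in": Counter()}

print(hypertext_similarity(page_a, page_b))  # 0.75
print(hypertext_similarity(page_a, page_c))  # 0.25
```

Here `page_a` and `page_c` share no words, yet still get a nonzero
score because they link to the same page.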
Also, run the following search on Google (exactly as written below) -
"k-means algorithm" clustering Web news html .pdf
It will lead you to several dozen additional articles.
I hope this clarification now satisfies your request.
thanks,
ragingacademic