Request for Question Clarification by leapinglizard-ga on 19 Apr 2006 22:43 PDT
Greg Reinacker, the author of the NewsGator aggregator, discussed this subject a few years ago.
Greg Reinacker's Weblog: Aggregators that automatically download web pages
http://www.rassoc.com/gregr/weblog/archive.aspx?post=688
I don't know of any RSS reader that offers such a feature. Part of the
difficulty with the automatic "storage of articles," as you call it,
is that web articles are not presented in any fixed format. A web link
in an RSS item points to a complicated document that consists of text
interspersed with many HTML tags, including references to images and
to other pages. The task of downloading the page and a reasonable
portion of its context is called scraping or slurping, and is quite
distinct from the task of managing RSS feeds.
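To make the distinction concrete, here is a rough sketch, in Python, of the scraping step such a reader would have to bolt on. It assumes the third-party feedparser library, the feed URL is only a placeholder, and it saves nothing but the raw HTML of each linked page; everything harder begins after that point.

# Rough sketch: walk an RSS feed and save the raw HTML of each linked
# article. Assumes the third-party feedparser library is installed; the
# feed URL is a placeholder, not a real feed.
import os
import re
import urllib.request

import feedparser

FEED_URL = "http://example.com/rss.xml"  # placeholder

feed = feedparser.parse(FEED_URL)
os.makedirs("articles", exist_ok=True)

for entry in feed.entries:
    link = entry.get("link")
    if not link:
        continue
    # Fetch the page the RSS item points to.
    with urllib.request.urlopen(link) as response:
        html = response.read()
    # Build a filesystem-safe file name from the item title.
    name = re.sub(r"[^A-Za-z0-9_-]+", "_", entry.get("title", "untitled"))[:80]
    with open(os.path.join("articles", name + ".html"), "wb") as f:
        f.write(html)

Note that this stores only the top-level HTML: no images, no stylesheets, no follow-on pages. Deciding how much more to pull down is where the real work lies, which brings me to the questions below.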
Given a link from an RSS item to a New York Times article, for
example, what would you want to have downloaded and stored on your
hard disk? Just the human-readable text of the article, or all of the
HTML markup? If you must have the full HTML, would you want every
graphical element decorating the page to be downloaded to your
computer as well? If the article consists of multiple pages, are you
content to have only the first page downloaded and stored, or do you
need an artificial-intelligence algorithm to find the remaining pages
and slurp those too?
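If the answer is "just the human-readable text," even that much is not trivial. Here is a rough sketch, again in Python and using only the standard library, of one way to discard the markup and keep the visible text. A real slurper would also have to recognize navigation menus, advertisements, and the "next page" links mentioned above.

# Rough sketch: reduce a downloaded HTML page to its visible text,
# keeping text nodes and skipping <script> and <style> blocks.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip = 0        # nesting depth inside <script>/<style>
        self.chunks = []     # collected text fragments

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip and data.strip():
            self.chunks.append(data.strip())

def page_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)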
leapinglizard