Request for Question Clarification by leapinglizard-ga on 19 Apr 2006 22:43 PDT
Greg Reinacker, the author of the NewsGator aggregator, discussed this subject a few years ago.
Greg Reinacker's Weblog: Aggregators that automatically download web pages
http://www.rassoc.com/gregr/weblog/archive.aspx?post=688
I don't know of any RSS reader that offers such a feature. Part of the
difficulty with the automatic "storage of articles," as you call it,
is that web articles are not presented in any fixed format. A web link
in an RSS item points to a complicated document that consists of text
interspersed with many HTML tags, including references to images and
to other pages. The task of downloading the page and a reasonable
portion of its context is called scraping or slurping, and is quite
distinct from the task of managing RSS feeds.
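To make the distinction concrete, here is a rough sketch, in Python, of the scraping step such a reader would have to bolt on. It assumes the third-party feedparser library, the feed URL is only a placeholder, and it saves nothing but the raw HTML of each linked page; everything harder begins after that point.

# Rough sketch: walk an RSS feed and save the raw HTML of each linked
# article. Assumes the third-party feedparser library is installed; the
# feed URL is a placeholder, not a real feed.
import os
import re
import urllib.request

import feedparser

FEED_URL = "http://example.com/rss.xml"  # placeholder

feed = feedparser.parse(FEED_URL)
os.makedirs("articles", exist_ok=True)

for entry in feed.entries:
    link = entry.get("link")
    if not link:
        continue
    # Fetch the page the RSS item points to.
    with urllib.request.urlopen(link) as response:
        html = response.read()
    # Build a filesystem-safe file name from the item title.
    name = re.sub(r"[^A-Za-z0-9_-]+", "_", entry.get("title", "untitled"))[:80]
    with open(os.path.join("articles", name + ".html"), "wb") as f:
        f.write(html)

Note that this stores only the top-level HTML: no images, no stylesheets, no follow-on pages. Deciding how much more to pull down is where the real work lies, which brings me to the questions below.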
Given a link from an RSS item to a New York Times article, for
example, what would you want to have downloaded and stored on your
hard disk? Just the human-readable text of the article, or all of the
HTML markup? If you must have the full HTML, would you want every
graphical element decorating the page to be downloaded to your
computer as well? If the article consists of multiple pages, are you
content to have only the first page downloaded and stored, or do you
need an artificial-intelligence algorithm to find the remaining pages
and slurp those too?
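If the answer is "just the human-readable text," even that much is not trivial. Here is a rough sketch, again in Python and using only the standard library, of one way to discard the markup and keep the visible text. A real slurper would also have to recognize navigation menus, advertisements, and the "next page" links mentioned above.

# Rough sketch: reduce a downloaded HTML page to its visible text,
# keeping text nodes and skipping <script> and <style> blocks.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip = 0        # nesting depth inside <script>/<style>
        self.chunks = []     # collected text fragments

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip and data.strip():
            self.chunks.append(data.strip())

def page_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)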
leapinglizard