Q: Information about Web Wrappers and Crawlers. ( No Answer,   2 Comments )
Question  
Subject: Information about Web Wrappers and Crawlers.
Category: Computers > Algorithms
Asked by: fabrirs-ga
List Price: $5.00
Posted: 20 Oct 2002 16:42 PDT
Expires: 19 Nov 2002 15:42 PST
Question ID: 85630
I'm doing research on Web wrappers and crawlers. Does anyone know of
links that could help with my research? I'm looking for papers that
show the state of the art in this field, and specifically about the
deep web, or hidden web, whether or not they use XML. I'm also looking
for sites that show possible applications we could build with this
technology. I'd like to know more about the technology itself, too.
Thanks.

Request for Question Clarification by sgtcory-ga on 20 Oct 2002 18:14 PDT
Hello fabrirs,

When you refer to XML, are you saying you wish to see information
about crawler/wrapper examples that retrieve data via indexing and
then store that data in an XML frameset for later use?

Thank you -

SgtCory

Clarification of Question by fabrirs-ga on 21 Oct 2002 10:59 PDT
In my previous research on crawlers/wrappers I saw some examples
that use XML and XSLT to navigate pages by following links. See:

[1] S. Raghavan and H. Garcia-Molina. Crawling the Hidden Web.
Technical Report 2000-36, Computer Science Department, Stanford
University, December 2000. Available at
http://dbpubs.stanford.edu/pub/2000-36

[2] Effective Web Data Extraction with Standard XML Technologies.
http://www10.org/cdrom/papers/102/

These papers present XML as the better way to extract structured
data from web data sources, but they were published in 2000. I'd like
to find the most recent papers in this area (2002), and I don't know
whether XML is still the best alternative. What I'm looking for is
the most recent techniques for building crawlers/wrappers. I hope you
can help me.
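
To make the idea of an XML-based wrapper concrete, here is a minimal
sketch, not the method of either paper above. It assumes the page has
already been cleaned into well-formed XHTML, and the sample page, the
"results" table id, the cell class names, and the extract_records
function are all hypothetical names invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML page. Real pages usually need to be
# tidied into XHTML before an XML parser will accept them.
SAMPLE_PAGE = """<html><body>
<table id="results">
  <tr><td class="title">Example Paper A</td><td class="year">2001</td></tr>
  <tr><td class="title">Example Paper B</td><td class="year">2002</td></tr>
</table>
</body></html>"""


def extract_records(xhtml):
    """Return one dict per table row, keyed by each cell's class attribute."""
    root = ET.fromstring(xhtml)
    records = []
    # ElementTree supports only a limited XPath subset, which is enough here.
    for row in root.findall(".//table[@id='results']/tr"):
        record = {cell.get("class"): cell.text for cell in row.findall("td")}
        records.append(record)
    return records


records = extract_records(SAMPLE_PAGE)
for record in records:
    print(record["title"], record["year"])
```

The path expression plays the role that an XPath or XSLT rule would
play in an XML-based wrapper: it names where the structured fields
live in the page, so the extraction logic stays separate from the
page markup.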

Thanks for your interest.
Bye.
Answer  
There is no answer at this time.

Comments  
Subject: Re: Information about Web Wrappers and Crawlers.
From: sfboy-ga on 01 Nov 2002 23:27 PST
 
The following research report describes the architecture of
a high performance web crawler (currently used by the Alta
Vista search engine), as well as some of the technical issues
faced by web crawlers in general:

High Performance Web Crawling
Marc Najork and Allan Heydon
ftp://gatekeeper.research.compaq.com/pub/DEC/SRC/research-reports/SRC-173.pdf
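
For orientation, the basic loop that any crawler builds on can be
sketched as below. This is only a toy breadth-first crawl, not the
architecture described in the report; the crawl function, the
LinkParser class, the injected fetch callable, and the in-memory
PAGES dictionary (used so the sketch runs without network access) are
all assumptions made for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from seed; fetch(url) returns HTML text or None."""
    frontier = deque([seed])  # URLs waiting to be downloaded
    seen = {seed}             # URLs already queued, to avoid repeats
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before comparing.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited


# Hypothetical in-memory "web" standing in for real HTTP fetches.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/a#top">A again</a>',
}
order = crawl("http://example.test/", PAGES.get)
print(order)
```

A production crawler adds much more on top of this loop, such as
politeness delays, robots.txt handling, and a frontier and seen-set
that do not fit in memory.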

You may also be interested in WebL, a web scripting language,
which can be used to (much less efficiently) crawl the web
and to easily process the results:

WebL Home Page
Hannes Marais
http://research.compaq.com/SRC/WebL/

Good luck!
Subject: Re: Information about Web Wrappers and Crawlers.
From: fabrirs-ga on 07 Nov 2002 05:12 PST
 
Thanks, the links that you sent were very useful for my research. But I
still need more information, mainly about wrappers. The search
continues. Bye.
