Q: writing web spider ( Answered 3 out of 5 stars,   3 Comments )
Subject: writing web spider
Category: Computers > Software
Asked by: googley-ga
List Price: $5.00
Posted: 14 Jul 2002 19:01 PDT
Expires: 13 Aug 2002 19:01 PDT
Question ID: 39614
I am developing a web spider which will check for broken images (image
tags which don't have corresponding images existing on the web server)
on a web site.
I want code in Visual Basic which will check for the existence of
images on a web server without creating an instance of a browser or any
browser control (since that takes time). I want some efficient code which
will send an HTTP request for a .jpg file and get a response from the
server which does not contain the file but specifies whether the file
exists or not. The code should be complete with error handling routines.
Subject: Re: writing web spider
Answered By: wengland-ga on 15 Jul 2002 11:12 PDT
Rated:3 out of 5 stars

While I am not a VB coder, I can point you to sample code that should
let your application do what you want.

The website has code and an article from their Spring 1998
"Getting Started with Visual Basic" magazine at:

The WinInet DLL (wininet.dll, provided by Microsoft) provides the calls
to directly query a web server to retrieve a document or check whether
it exists.  It gives Internet functionality such as HTTP and FTP to
any VB app.

This DLL provides a *ton* of connection methods and utilities.
The suggested one to use is InternetOpenUrl, which connects to a web
server and requests the file.  Note that the call can succeed even when
the server answers with an error such as 404, so read the HTTP status
code (for example with HttpQueryInfo) to confirm that the file exists.
This should fulfil your requirement.

The sample code provided in the article shows the exact steps to take
to make a connection and check for the existence of a file.

Sounds like a neat project; I hope you publish it when you are
finished.  I could use a tool like this.

Related Links

WinInet: Enable HTTP Communication in Windows-Based Client

VBinet.exe - samples of WinInet code in VB (Microsoft Knowledge Base Q185519)

Using WinInet Asynchronously in VB (Microsoft Knowledge Base Q189850)

Vbhttp.exe Demonstrates How to Use HTTP WinInet APIs in Visual Basic (Microsoft Knowledge Base Q259100)

Search terms:

wininet

visual basic (within results above)

vb http library
googley-ga rated this answer: 3 out of 5 stars
Thanks. Your answer helped point me in the right direction.
I have already got it done using the Internet Transfer Control.

Subject: Re: writing web spider
From: philip_lynx-ga on 14 Jul 2002 19:44 PDT
And all that for $5? Unless there is open source for that (which I
doubt, as you specifically require VB), good luck!
Subject: Re: writing web spider
From: iaint-ga on 15 Jul 2002 03:20 PDT
My knowledge of Visual Basic is insufficient to allow me to give you
the answer you requested, but I can give you some tips which may help
you (or someone else) look in the right direction.

All you need to do is open a TCP/IP connection to your target
webserver and then use the HTTP "HEAD" request to determine whether or
not your required file exists. The format of the HEAD command (and the
rest of HTTP/1.1) is fully covered by RFC 2616 but in essence all you
will need to send to the server is the following three lines:

HEAD /path/to/target/file.jpg HTTP/1.1
Host: hostname.of.target.webserver
Connection: Close

(followed by two CR/LF pairs)
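
As a rough illustration of the request described here, the following is a minimal sketch in Python rather than Visual Basic; the same steps apply with any socket library. The function names, the default port, and the timeout are assumptions made for the example. HTTP/1.1 also requires a Host header, so the sketch sends one.

```python
import socket


def build_head_request(host: str, path: str) -> bytes:
    """Return the raw bytes of a minimal HTTP/1.1 HEAD request."""
    lines = [
        f"HEAD {path} HTTP/1.1",
        f"Host: {host}",        # mandatory in HTTP/1.1
        "Connection: close",    # ask the server to close after replying
    ]
    # Every line ends with CR/LF; one extra empty line ends the request.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch_raw_response(host: str, path: str, port: int = 80,
                       timeout: float = 10.0) -> bytes:
    """Send the HEAD request and return everything the server sends back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_head_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:        # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Because the request asks for the connection to be closed, reading until the socket returns no more data is enough to collect the whole reply.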

You then need to capture the output from the server, which will likely
consist of between 5 and 10 lines of text. If the requested file is
accessible the server should return an 'HTTP/1.1 200 OK' response as
its first line; if not, you will most probably get 'HTTP/1.1 404'
(although if it exists but is not available for other reasons, other
statuses could occur. Consult the HTTP documentation for full details).
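
Interpreting the first line of the reply could look roughly like the sketch below (Python, for illustration only). The function names and the three result labels are made up for this example; only the 200 and 404 meanings come from the description above.

```python
def parse_status_code(response: bytes) -> int:
    """Extract the numeric status code from the first line of an HTTP reply."""
    status_line = response.split(b"\r\n", 1)[0].decode("iso-8859-1")
    parts = status_line.split(" ", 2)   # e.g. ["HTTP/1.1", "200", "OK"]
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError(f"not an HTTP status line: {status_line!r}")
    return int(parts[1])


def classify(status_code: int) -> str:
    """Map a status code to a simple verdict for a broken-image report."""
    if status_code == 200:
        return "ok"
    if status_code == 404:
        return "missing"
    # Redirects, access denied, server errors and so on need a closer look.
    return "other"
```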

Most computer languages make it fairly easy to use TCP/IP sockets,
often with library files or modules which can make it almost as simple
as writing to a local file. A quick Google search:

has revealed, amongst many others, the site

which should give you some tips and source code examples that will
help you continue your software development.
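
To show how much a library can hide, here is a hedged sketch using Python's standard http.client module instead of raw sockets. It is not the Visual Basic code asked for; the function names and the choice to treat any network error as "does not exist" are assumptions for the example.

```python
import http.client
from urllib.parse import urlsplit


def split_image_url(url: str):
    """Split a URL into (scheme, hostname, port, path-with-query)."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url!r}")
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.scheme, parts.hostname, parts.port, path


def image_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request for the URL is answered with 200."""
    scheme, host, port, path = split_image_url(url)
    if scheme == "https":
        conn = http.client.HTTPSConnection(host, port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        # Unreachable host, timeout, or malformed reply: report as broken.
        return False
    finally:
        conn.close()
```

A stricter checker might follow redirects or report network failures separately from missing files.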


HTTP 1.1 Specification: RFC 2616
Subject: Re: writing web spider
From: saulg-ga on 11 Sep 2002 07:46 PDT
Hi googley-ga 
I've been writing VB code since 1994 (VB2) and have previously written
a spider that downloaded some 25,000 pages automatically (took some 18

I believe that the program should first parse the links and then
attempt to fetch & report errors when resources are not available.
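
The parse-first, fetch-second approach could be sketched as follows (Python, purely illustrative). The class and function names are hypothetical, and the existence check is passed in as a function so that any method of testing a URL can be plugged in.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect the absolute URL of every <img src=...> on a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(urljoin(self.base_url, src))


def find_image_urls(html: str, base_url: str) -> list:
    """Step 1: parse the page and resolve each image link against its URL."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.image_urls


def report_broken(image_urls, exists) -> list:
    """Step 2: return the URLs for which the exists(url) check fails."""
    return [url for url in image_urls if not exists(url)]
```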

I wouldn't mind having a go if you're still interested.

