Subject: writing web spider
Category: Computers > Software
Asked by: googley-ga
List Price: $5.00
Posted: 14 Jul 2002 19:01 PDT
Expires: 13 Aug 2002 19:01 PDT
Question ID: 39614
I am developing a web spider that will check a web site for broken images (image tags that don't have a corresponding image file on the web server). I want Visual Basic code that checks whether images exist on the web server without creating an instance of a browser or any browser control (since that takes time). I want efficient code that sends an HTTP request for a .jpg file and gets a response from the server that does not contain the file but indicates whether or not the file exists. The code should be complete with error-handling routines.

Subject: Re: writing web spider
Answered By: wengland-ga on 15 Jul 2002 11:12 PDT
Greetings! While I am not a VB coder, I can point you to sample code that will make your application do what you want. The DevX.com website has code and an article from their Spring 1998 "Getting Started with Visual Basic" magazine at:
http://www.devx.com/free/codelib/view.asp?id=342155

The WinInet DLL (provided by Microsoft) exposes calls that query a web server directly to retrieve a document or check whether it exists. It gives full Internet functionality to any VB application and provides a *ton* of connection methods and utilities. The suggested one to use is InternetOpenUrl, which connects to the web server and requests the file; the HTTP status code of the reply (which WinInet exposes through HttpQueryInfo) then tells you whether the file exists. This should fulfil your requirement. The sample code in the article shows the exact steps for making a connection and checking for the existence of a file.

Sounds like a neat project; I hope you publish it when you are finished. I could use a tool like this.

Related Links
WinInet: Enable HTTP Communication in Windows-Based Client Applications
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnmag01/html/USEMON.asp
VBinet.exe - samples of WinInet code in VB
http://support.microsoft.com/default.aspx?scid=kb;EN-US;q185519
Using WinInet Asynchronously in VB
http://support.microsoft.com/default.aspx?scid=kb;EN-US;q189850
Vbhttp.exe Demonstrates How to Use HTTP WinInet APIs in Visual Basic
http://support.microsoft.com/default.aspx?scid=kb;EN-US;q259100

Search terms
wininet @ microsoft.com
http://search.microsoft.com/default.asp?boolean=ALL&nq=NEW&so=RECCNT&ig=01&ig=02&ig=03&ig=04&ig=05&ig=06&ig=07&ig=08&ig=09&ig=10&i=00&i=01&i=02&i=03&i=04&i=05&i=06&i=07&i=08&i=09&qu=wininet
visual basic (within results above) @ microsoft.com
http://search.microsoft.com/Default.asp?so=RECCNT&boolean=ALL&siteid=us&p=1&nq=WITHIN&fqu=%2522WININET%2522&qu=wininet&qu=visual+basic&nso=RECCNT&ig=1&ig=2&ig=3&ig=4&ig=5&ig=6&ig=7&ig=8&ig=9&ig=10&i=00&i=01&i=02&i=03&i=04&i=05&i=06&i=07&i=08&i=09
vb http library @ google.com
http://www.google.com/search?q=vb+http+library
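The WinInet route is specific to VB on Windows. As a rough illustration of the same "ask the server, don't download the image" idea, here is a sketch in Python (not VB, and not WinInet) using the standard http.client module. It sends an HTTP HEAD request so the server returns only headers, and treats any 2xx status as "exists". The function name image_exists is hypothetical; the sketch follows no redirects and simply reports network errors as "missing", which a real spider would probably want to distinguish.

```python
import http.client
from urllib.parse import urlsplit


def image_exists(url, timeout=10.0):
    """Return True if a HEAD request for url gets a 2xx reply, else False.

    Sketch only: redirects (3xx) are not followed, and connection or
    protocol errors are reported as False rather than raised.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn_cls = http.client.HTTPSConnection
    elif parts.scheme == "http":
        conn_cls = http.client.HTTPConnection
    else:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")

    # Rebuild the request target (path plus optional query string).
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        # HEAD asks for the same headers a GET would return, but no body.
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, malformed reply, ...
        return False
    finally:
        conn.close()
```

Checking a page's images then amounts to calling image_exists once per image URL and collecting the ones that return False.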

googley-ga rated this answer:
Thanks. Your answer was helpful in heading in the correct direction. I already got it done using the Internet Transfer Control.

Subject: Re: writing web spider
From: philip_lynx-ga on 14 Jul 2002 19:44 PDT

And all that for $5? Unless there is open source for that (which I doubt, as you specifically require VB), good luck!

Subject: Re: writing web spider
From: iaint-ga on 15 Jul 2002 03:20 PDT

My knowledge of Visual Basic is insufficient for me to give you the answer you requested, but I can give you some tips that may help you (or someone else) look in the right direction.

All you need to do is open a TCP/IP connection to your target web server and then use the HTTP "HEAD" request to determine whether or not your required file exists. The format of the HEAD request (and the rest of HTTP/1.1) is fully covered by RFC 2616, but in essence all you will need to send to the server is the following three lines, each terminated by a CR/LF pair, plus one extra CR/LF (an empty line) to end the request:

HEAD /path/to/target/file.jpg HTTP/1.1
Host: www.webservername.com
Connection: Close

You then need to capture the output from the server, which will likely consist of 5 to 10 lines of text. If the requested file is accessible, the server should return 'HTTP/1.1 200 OK' as the first line of its response; if not, you will most probably get 'HTTP/1.1 404 Not Found'. If the file exists but is unavailable for other reasons, other status codes can occur (for example 403 Forbidden); consult the HTTP specification for full details.

Most computer languages make it fairly easy to use TCP/IP sockets, often with library files or modules that make it almost as simple as writing to a local file. A quick Google search:
http://www.google.co.uk/search?q=visual+basic+tcp/ip+socket+open
has revealed, amongst many others, the site http://www.15seconds.com/issue/990408.htm, which should give you some tips and source code examples to help you continue your software development.

Regards
iaint-ga

HTTP 1.1 Specification: http://www.ietf.org/rfc/rfc2616.txt
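The socket exchange described in this comment can be sketched as follows. This is Python rather than VB, and the helper names build_head_request, parse_status_line and head_status are hypothetical. It assumes plain HTTP (no TLS) and reads until the server closes the connection, which the "Connection: close" header asks it to do. It also adds the port to the Host header when it isn't 80, as HTTP/1.1 expects.

```python
import socket


def build_head_request(host, path, port=80):
    """Build a minimal HTTP/1.1 HEAD request as bytes."""
    host_header = host if port == 80 else f"{host}:{port}"
    lines = [
        f"HEAD {path} HTTP/1.1",
        f"Host: {host_header}",
        "Connection: close",
    ]
    # Every line ends in CRLF; one more CRLF (an empty line) ends the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_status_line(line):
    """Return the numeric status code from e.g. 'HTTP/1.1 404 Not Found'."""
    parts = line.split(None, 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/") or not parts[1].isdigit():
        raise ValueError(f"malformed status line: {line!r}")
    return int(parts[1])


def head_status(host, path, port=80, timeout=10.0):
    """Send a HEAD request over a plain TCP socket and return the status code.

    Raises OSError on connection problems and ValueError on a garbled reply.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_head_request(host, path, port))
        # A HEAD reply has no body, and with 'Connection: close' the server
        # closes the socket after the headers, so read until EOF.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    response = b"".join(chunks).decode("iso-8859-1")
    return parse_status_line(response.split("\r\n", 1)[0])
```

For example, head_status("www.webservername.com", "/path/to/target/file.jpg") would return 200 or 404 in the two common cases described above.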

Subject: Re: writing web spider
From: saulg-ga on 11 Sep 2002 07:46 PDT

Hi googley-ga,

I've been writing VB code since 1994 (VB2) and have previously written a spider that downloaded some 25,000 pages automatically (it took some 18 hours!). I believe the program should first parse the links and then attempt to fetch each one, reporting errors when resources are not available. I wouldn't mind having a go if you're still interested.

saulg-ga
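The "parse the links first" step might look like the following Python sketch, built on the standard library's html.parser (the class and function names are hypothetical, and this is not saulg-ga's code). It collects the src attribute of every img tag, resolves relative paths against the page's own URL, and drops duplicates so that each image is checked only once.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collect the src of every <img> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing
        # <img ... /> tags are routed here too by default.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> without src, or with an empty src
                self.sources.append(urljoin(self.base_url, src.strip()))


def image_urls(html, base_url):
    """Return absolute image URLs found in html, first-seen order, no repeats."""
    parser = ImgSrcCollector(base_url)
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(parser.sources))
```

Each resulting URL can then be checked with a HEAD request, and the ones that fail reported as broken images.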