Google Answers: VisualBasic.Net Web Browser control


Q: VisualBasic.Net Web Browser control (Answered, 5 out of 5 stars, 0 Comments)

Question

Subject: VisualBasic.Net Web Browser control
Category: Computers > Programming
Asked by: snowman5000-ga
List Price: $50.00

Posted: 05 Dec 2002 15:48 PST
Expires: 04 Jan 2003 15:48 PST
Question ID: 120006

I'm working with web browser control in VisualBasic.Net. I want to
know how to automatically extract the source code from a webpage and
load it into a variable so my program can conduct analysis on it. What
code can I use to achieve this?

Answer

Subject: Re: VisualBasic.Net Web Browser control
Answered By: mathtalk-ga on 06 Dec 2002 09:42 PST
Rated: 5 out of 5 stars

Hi, snowman5000-ga:

Since you are already working in VB.Net with what I assume is the "Web browser" COM component (from shdocvw.dll), I will focus on the question of how to extract "source code" from a Web page in it. By this I assume you mean the HTML of a typical Web page.

As you probably know, while the .Net environment "wraps" COM components with "interop" interfaces that mirror the methods and properties of the underlying COM objects, there is no native .Net "Web browser control" per se. For a discussion of this see the thread at:

[Web Browser Control for DotNet?]
http://developersdex.com/vb/message.asp?p=1120&r=2339929

On the other hand there are some classes "native" to the .Net framework:

    System.Net.HttpWebRequest
    System.Net.HttpWebResponse

which would suffice for submitting URLs to the Web and returning the text of the HTTP responses. For example, this project does that sort of thing in VB.Net:

http://www.c-sharpcorner.com/vbnet/httpdwnloader.asp

But let's suppose you already have a simple Web browser application like the one described here:

[Web Browser in C# and VB.Net]
http://www.c-sharpcorner.com/Internet/WebBrowserInCSMDB.asp

Perhaps you are thinking of adding a button that "extracts" the HTML text of the current page? This could allow you to navigate around with links and then extract the source of selected pages interactively.

In any case we should look at the Document property of the Web browser control, which returns (the automation object of) the active document. When this active document is an HTML page, per your question, the object returned is of type HTMLDocument.

The Web browser control can "contain" other types of documents, such as Word, Acrobat, etc. So it might be useful to know that the Web browser control's Type property returns a string which identifies the type of document object that it contains.
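As an aside on the HttpWebRequest / HttpWebResponse route mentioned above, here is a minimal sketch of how those two classes might be used to pull a page's text into a String without any browser control at all. The module name, the FetchPageSource helper and the example URL are made up for illustration and are not taken from the linked project. Note that this returns the raw text the server sends, which can differ from what the browser's DOM reports after scripts have run.

```vbnet
Imports System.IO
Imports System.Net

Module PageSourceDemo

    ' Hypothetical helper: requests the given URL and returns the body of
    ' the HTTP response as a String.
    Function FetchPageSource(ByVal url As String) As String
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        Dim response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
        ' StreamReader assumes UTF-8 unless a byte order mark says otherwise;
        ' it does not look at the charset named in the response headers.
        Dim reader As New StreamReader(response.GetResponseStream())
        Try
            Return reader.ReadToEnd()
        Finally
            ' Runs whether or not ReadToEnd succeeded.
            reader.Close()
            response.Close()
        End Try
    End Function

    Sub Main()
        ' Placeholder URL, chosen only for the example.
        Dim source As String = FetchPageSource("http://www.example.com/")
        Console.WriteLine("Characters received: " & source.Length.ToString())
    End Sub

End Module
```

GetResponse throws a WebException for network failures and for HTTP error statuses, so real code would probably want a Try/Catch around the call to FetchPageSource.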
By default, if you simply drag and drop a Web browser control from the toolbox onto a form in your VB.Net project, it winds up being named AxWebBrowser1. Let's assume that to be your case here.

Note that the HTMLDocument is a completely different automation object from the Web browser control itself. The HTMLDocument interface is provided by mshtml.dll, and to use this in your project you will need to right-click on the References folder and select Add Reference. Go to the .Net tab of the Add Reference dialog box and double click the component named Microsoft.mshtml. Click OK.

The text of the HTML document can now be obtained this way:

    Dim HTMLBody As String
    Dim HTMLDoc As mshtml.HTMLDocument
    HTMLDoc = AxWebBrowser1.Document
    HTMLBody = HTMLDoc.body.outerHTML

To test this approach I created a new VB.Net "Windows application" project called DOMDemo, added the Web browser control, a label "URL", a textbox (to enter URLs), and a "Go" button. With the addition only of the following code as the handler for clicks on Button1:

    Cursor.Current = Cursors.WaitCursor
    AxWebBrowser1.Navigate(TextBox1.Text)
    Cursor.Current = Cursors.Default

Voila, a working Web browser!

I then added a second button ("Extract") to the form, and the following code as handler for clicks on Button2:

    Dim HTMLBody, HTMLTrunc As String
    Dim HTMLDoc As mshtml.HTMLDocument
    HTMLDoc = AxWebBrowser1.Document
    HTMLBody = HTMLDoc.body.outerHTML
    HTMLTrunc = Mid(HTMLBody, 1, 100)
    MsgBox(HTMLTrunc)

Here I'm truncating the HTML body down to a hundred characters just because there's a limit of about a thousand characters. I set a debug breakpoint here anyway so I could see whatever might be of interest.

Obviously there could be various things you might want to change about this, depending on the sort of analysis you plan on doing. In particular I'd point out the collections HTMLDocument.anchors and HTMLDocument.links that might expedite your analysis, if you were interested in checking links between pages.
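To illustrate the HTMLDocument.links collection just mentioned, here is a rough sketch of a handler that walks the collection and gathers each link's href. It assumes the same form as above, with its AxWebBrowser1 control and the Microsoft.mshtml reference; the third button (Button3) and the variable names are hypothetical additions and were not part of the test project.

```vbnet
' Hypothetical handler for a third button on the same form.
Private Sub Button3_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles Button3.Click
    Dim HTMLDoc As mshtml.HTMLDocument
    Dim Item As Object
    Dim Element As mshtml.IHTMLElement
    Dim Urls As New System.Text.StringBuilder()
    Dim Count As Integer = 0

    ' Nothing to inspect if no document has been loaded yet.
    If AxWebBrowser1.Document Is Nothing Then Exit Sub

    ' This cast throws if the control currently holds a non-HTML document.
    HTMLDoc = CType(AxWebBrowser1.Document, mshtml.HTMLDocument)

    ' Collect the href attribute of every element in the links collection.
    For Each Item In HTMLDoc.links
        Element = CType(Item, mshtml.IHTMLElement)
        Urls.Append(CStr(Element.getAttribute("href", 0)))
        Urls.Append(vbCrLf)
        Count += 1
    Next

    ' Urls.ToString() now holds one URL per line for further analysis;
    ' only the count is displayed here.
    MsgBox(Count.ToString() & " links found")
End Sub
```

Going through IHTMLElement.getAttribute rather than a late-bound Item.href keeps the code compilable with Option Strict On; with Option Strict Off the shorter late-bound form should work as well.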
If you replace outerHTML by outerText, then one sees plain text (without tags). If you replace outerHTML by innerHTML (or, likewise, innerText), one gets only the content between the outer tags.

More references that may be of interest:

[MSDN Library MSHTML Reference]
http://msdn.microsoft.com/library/default.asp?url=/workshop/browser/mshtml/reference/reference.asp

[Mastering IE: The Web Browser Control] (an introductory VB6 take, slightly disorganised)
http://www.vbwm.com/art_2001/IE05/

[Accessing the DHTML DOM from C#] (ok, it's got C# code, but very relevant anyway)
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dv_vstechart/html/vsgrfwalkthroughaccessingdhtmldomfromc.asp

[Trapping DHTML events from the WebBrowser control] (deals with duplicate interface names, not crucial above)
http://www.vb2themax.com/Item.asp?PageID=TipBank&ID=561

regards, mathtalk-ga

Search Strategy:

Keywords: "VB.Net" "Web browser control"
://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=%22VB.Net%22+%22Web+browser+control%22&btnG=Google+Search

Keywords: VB "mshtml.htmldocument"
://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&q=VB+%22mshtml.htmldocument%22&btnG=Google+Search

Clarification of Answer by mathtalk-ga on 06 Dec 2002 09:45 PST

Oops! When I said there's a limit of about a thousand characters, I did not make clear that I meant the limit of what the MsgBox function can display. There is no such limit in the DHTML DOM implementation, or in the Web browser itself.

regards, mathtalk-ga
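Since the limit belongs to MsgBox alone, one way to look at the whole string is to write it to a file instead of displaying it. What follows is only a sketch: the module name, the SaveText helper and the file path in the usage note are invented for the example.

```vbnet
Imports System.IO
Imports System.Text

Module HtmlDumpDemo

    ' Hypothetical helper: writes the whole string to a file, overwriting
    ' any existing file at that path.
    Sub SaveText(ByVal text As String, ByVal path As String)
        Dim writer As New StreamWriter(path, False, Encoding.UTF8)
        Try
            writer.Write(text)
        Finally
            ' Close flushes the buffer and releases the file handle.
            writer.Close()
        End Try
    End Sub

End Module
```

In the Button2 handler one could then call, for example, SaveText(HTMLBody, "C:\temp\page.html") in place of the MsgBox line, where the path is a placeholder and the folder must already exist.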

Clarification of Answer by mathtalk-ga on 06 Dec 2002 16:09 PST

Thanks, snowman5000, for the kind words (and tip!). After working on your interesting question, the feedback means a lot to me.

best wishes, mathtalk-ga

Clarification of Answer by mathtalk-ga on 09 Dec 2002 20:52 PST

Hi, snowman5000-ga:

While researching your question, I tended to ignore sites that require user registration, even if it is free to do so, because in some cases it might turn out to be "spam bait". However this site is operated by Wrox Press, and I've been registered there for a long time & have a lot of respect for them. I've never had any suspicion they might be sharing my email address with third parties, and IIRC their TOS promise not to do this.

So you might want to look here (this is a free article but you'll have to register at the site to read it):

[Programming Internet Explorer in C#]
http://www.csharptoday.com/content.asp?id=1980&csharp0161

from the abstract: "You can also consider this case study to be about COM to .NET interoperability."

My perception is that 9 times out of 10 the translation of C# lines to VB.Net lines is fairly obvious (at least after someone shows it to you!), so if you are pursuing this IE topic further, the above article may be helpful.

regards and thanks again, mathtalk-ga

snowman5000-ga rated this answer: 5 out of 5 stars

and gave an additional tip of: $5.00

This answer was exactly what I was looking for. This will be very
useful to me. Much Thanks.

Comments

There are no comments at this time.

