Hi futures_trade-ga,
I was out of town when you posted this question, and I'm afraid I
missed it on Google Answers. I only noticed it yesterday while
browsing through the older questions. My apologies for the delay.
The solution to this question builds on the answer to your previous
question. I have updated the sample application I built for that
question so that it can now read from Internet Explorer windows as
well. You can download the updated sample app from: [
http://rapidshare.de/files/34445128/GA-ScreenScraper.zip.html ]
The sample app contains two new buttons: one retrieves the text from
an IE window, and the other retrieves the underlying HTML code. When
you press either button, the sample app launches 'news.google.com' in
a new IE window, waits 3 seconds for the website to load, and then
displays the page's text or HTML in a textbox.
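The fixed 3-second wait is the simplest approach, but a slow page may not have finished loading by then. As an alternative (not what the sample app does), the wait could poll a condition until it holds or a timeout expires. The sketch below assumes nothing about IE itself; 'PageLoadWaiter' and 'WaitUntil' are hypothetical names introduced here for illustration:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper (not part of the sample app): polls a condition
// instead of sleeping for a fixed amount of time.
public static class PageLoadWaiter
{
    // Calls 'isReady' every 'pollInterval' until it returns true.
    // Returns true if the condition held before 'timeout' elapsed,
    // false if the timeout expired first.
    public static bool WaitUntil(Func<bool> isReady, TimeSpan timeout, TimeSpan pollInterval)
    {
        if (isReady == null)
            throw new ArgumentNullException("isReady");

        Stopwatch watch = Stopwatch.StartNew();
        while (true)
        {
            if (isReady())
                return true;           // condition satisfied
            if (watch.Elapsed >= timeout)
                return false;          // gave up waiting
            Thread.Sleep(pollInterval);
        }
    }
}
```

In the sample app, the condition could obtain the document with the GetIEDocumentFromWindowHandle method listed below and check that it is non-null and that its readyState property equals "complete". The timeout and polling interval are arbitrary choices for the caller.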
The functionality for reading text from IE is provided by three new
methods in the ScreenScraper class, listed below:
===================================================================
// Requires: using System; using System.Runtime.InteropServices; using mshtml;

// Gets the Internet Explorer IHTMLDocument2 object for the given
// 'Internet Explorer_Server' window handle
public IHTMLDocument2 GetIEDocumentFromWindowHandle(IntPtr hWnd)
{
    UIntPtr lResult;
    uint lMsg;
    IHTMLDocument2 htmlDocument = null;

    if (hWnd != IntPtr.Zero)
    {
        // Register the WM_HTML_GETOBJECT message so it can be used
        // to communicate with the Internet Explorer instance
        lMsg = Win32.RegisterWindowMessage("WM_HTML_GETOBJECT");

        // Send the registered message to the IE window and wait
        // (at most 1000 ms) for it to be processed
        Win32.SendMessageTimeout(hWnd, lMsg, UIntPtr.Zero, UIntPtr.Zero,
            Win32.SendMessageTimeoutFlags.SMTO_ABORTIFHUNG, 1000, out lResult);

        if (lResult != UIntPtr.Zero)
        {
            // Convert the value returned by the IE window into
            // an IHTMLDocument2 interface
            htmlDocument = Win32.ObjectFromLresult(lResult,
                typeof(IHTMLDocument).GUID, IntPtr.Zero) as IHTMLDocument2;
            if (htmlDocument == null)
            {
                throw new COMException(
                    "Unable to cast to an object of type IHTMLDocument2");
            }
        }
    }
    return htmlDocument;
}

// Returns the HTML inside the page body, or an empty string if no
// document could be obtained from the window
private string ScrapeIEHtmlContent(IntPtr handle)
{
    IHTMLDocument2 htmlDoc = GetIEDocumentFromWindowHandle(handle);
    if (htmlDoc == null || htmlDoc.body == null)
        return string.Empty;
    return htmlDoc.body.innerHTML;
}

// Returns the text of the page body, or an empty string if no
// document could be obtained from the window
private string ScrapeIETextContent(IntPtr handle)
{
    IHTMLDocument2 htmlDoc = GetIEDocumentFromWindowHandle(handle);
    if (htmlDoc == null || htmlDoc.body == null)
        return string.Empty;
    return htmlDoc.body.innerText;
}
==============================================================
The most important of the three new methods is
GetIEDocumentFromWindowHandle. Given the handle to the
'Internet Explorer_Server' window (the child window in which IE
renders the page), it retrieves an object implementing the
IHTMLDocument2 interface. The other two new methods,
'ScrapeIEHtmlContent' and 'ScrapeIETextContent', use this object to
read the HTML or the text of the page body.
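GetIEDocumentFromWindowHandle expects the handle of the 'Internet Explorer_Server' window, which is a descendant of IE's top-level window. How the sample app locates that handle is not shown here. One possible approach, sketched below under the assumption that you already have the top-level IE window handle, is to enumerate its descendant windows with the Win32 EnumChildWindows and GetClassName functions and stop at the first window of that class. 'IEServerWindowFinder' and its members are hypothetical names:

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical helper (not part of the sample app): finds the first
// descendant window whose class name is "Internet Explorer_Server".
public static class IEServerWindowFinder
{
    public const string IEServerClassName = "Internet Explorer_Server";

    private delegate bool EnumWindowsProc(IntPtr hWnd, IntPtr lParam);

    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool EnumChildWindows(IntPtr hWndParent,
        EnumWindowsProc lpEnumFunc, IntPtr lParam);

    [DllImport("user32.dll", CharSet = CharSet.Auto)]
    private static extern int GetClassName(IntPtr hWnd,
        StringBuilder lpClassName, int nMaxCount);

    // True if the class name identifies IE's HTML rendering window.
    public static bool IsIEServerClass(string className)
    {
        return string.Equals(className, IEServerClassName, StringComparison.Ordinal);
    }

    // Returns the first "Internet Explorer_Server" window below
    // ieFrameHandle (e.g. the top-level "IEFrame" window), or
    // IntPtr.Zero if none is found.
    public static IntPtr FindIEServerWindow(IntPtr ieFrameHandle)
    {
        IntPtr found = IntPtr.Zero;
        EnumWindowsProc callback = (hWnd, lParam) =>
        {
            StringBuilder className = new StringBuilder(256);
            GetClassName(hWnd, className, className.Capacity);
            if (IsIEServerClass(className.ToString()))
            {
                found = hWnd;
                return false; // stop enumerating
            }
            return true;      // keep enumerating
        };
        EnumChildWindows(ieFrameHandle, callback, IntPtr.Zero);
        GC.KeepAlive(callback); // keep the delegate alive until enumeration ends
        return found;
    }
}
```

Because EnumChildWindows also visits the children of child windows, this search does not depend on how deeply a particular IE version nests the rendering window.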
Note that since the code now uses the IHTMLDocument2 interface, you
will have to add a reference to the 'Microsoft.mshtml' library to the
project and add a 'using mshtml;' directive to the source file. This
library should already be available on your system.
The code requires three new Win32 API functions:
1. RegisterWindowMessage
[http://windowssdk.msdn.microsoft.com/en-us/library/ms644947.aspx]
2. SendMessageTimeout
[http://windowssdk.msdn.microsoft.com/en-us/library/ms644952.aspx]
3. ObjectFromLresult
[http://windowssdk.msdn.microsoft.com/en-us/library/ms697301.aspx]
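The Win32 class that wraps these functions is part of the sample app and is not listed above. For reference, P/Invoke declarations compatible with the calls made in GetIEDocumentFromWindowHandle might look like the following sketch; the sample app's own declarations may differ. RegisterWindowMessage and SendMessageTimeout live in user32.dll, and ObjectFromLresult lives in oleacc.dll:

```csharp
using System;
using System.Runtime.InteropServices;

// A sketch of P/Invoke declarations matching the calls made in
// GetIEDocumentFromWindowHandle (the sample app's Win32 class may differ).
public static class Win32
{
    // Flags for SendMessageTimeout (values from WinUser.h).
    [Flags]
    public enum SendMessageTimeoutFlags : uint
    {
        SMTO_NORMAL = 0x0000,
        SMTO_BLOCK = 0x0001,
        SMTO_ABORTIFHUNG = 0x0002,
        SMTO_NOTIMEOUTIFNOTHUNG = 0x0008
    }

    // Returns a system-wide unique message ID for the given string,
    // or 0 on failure.
    [DllImport("user32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    public static extern uint RegisterWindowMessage(string lpString);

    // Sends a message and waits up to uTimeout milliseconds for it to be
    // processed; the message result is written to lpdwResult.
    // Returns zero if the call failed or timed out.
    [DllImport("user32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    public static extern IntPtr SendMessageTimeout(
        IntPtr hWnd, uint Msg, UIntPtr wParam, UIntPtr lParam,
        SendMessageTimeoutFlags fuFlags, uint uTimeout, out UIntPtr lpdwResult);

    // Converts the LRESULT returned for WM_HTML_GETOBJECT into a COM object.
    // With PreserveSig = false, a failing HRESULT is thrown as an exception
    // and the native ppvObject out-parameter becomes the return value.
    [DllImport("oleacc.dll", PreserveSig = false)]
    [return: MarshalAs(UnmanagedType.Interface)]
    public static extern object ObjectFromLresult(
        UIntPtr lResult, [MarshalAs(UnmanagedType.LPStruct)] Guid refiid, IntPtr wParam);
}
```

Declaring ObjectFromLresult with PreserveSig = false lets the runtime return the interface pointer directly, which is why the calling code can simply cast the result with 'as IHTMLDocument2'.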
Related Articles:
=================
- Protect your IM (Instant Messenger) conversations by encrypting them
[http://www.codeproject.com/csharp/imencryptor.asp]
- Retrieving Conversations from Yahoo Messenger
[http://www.codeproject.com/cpp/yahoochattext.asp]
---------------------------------------------------------------------------
Hope this helps!
If you need any clarifications, just ask.
Regards,
Theta-ga
:)
======================================================================
Google Search Terms Used:
WM_HTML_GETOBJECT c#
WM_HTML_GETOBJECT ObjectFromLresult c#