We are writing a web spider in C# using the .NET Framework, version
1.1. It uses the HttpWebRequest class to fetch web pages. I noticed
that each time my program fetches a page, the memory of the
application increases. Having read a few news posts online, I can see
that others have had similar issues, and I've done everything I can to
find the problem, but I haven't managed to solve it. Can anyone help?
Things I have tried already include:
1) Ensuring that streams and WebResponses are closed in finally blocks,
as soon as possible.
2) Ensuring that my own objects have a Dispose method to do things like
clear big strings and references to other objects. I ensure these
methods are called as soon as the objects are no longer needed.
3) Confirming, using a profiler, that objects are getting destroyed as
expected.
4) Setting AllowWriteStreamBuffering to false on the WebRequest.
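In case it helps, the shape of the cleanup code from (1) and (4) is
roughly this. It's a simplified sketch, not my real code; the `Fetch`
name and the url parameter are stand-ins:

```csharp
// Sketch of the cleanup pattern: the response and reader are closed
// in a finally block even if reading throws part-way through.
using System;
using System.IO;
using System.Net;

public class FetchSketch
{
    public static string Fetch(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.AllowWriteStreamBuffering = false; // item (4) above

        WebResponse response = null;
        StreamReader reader = null;
        try
        {
            response = request.GetResponse();
            reader = new StreamReader(response.GetResponseStream());
            return reader.ReadToEnd();
        }
        finally
        {
            // item (1) above: release as soon as possible, even on failure
            if (reader != null) reader.Close();
            if (response != null) response.Close();
        }
    }
}
```

(As I understand it, AllowWriteStreamBuffering only affects requests
with a body, so it may be a no-op for plain GETs.)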
I've also tried some of the suggestions in the following thread, which
sounds like a similar problem:
http://groups.google.co.uk/groups?hl=en&lr=lang_en&c2coff=1&threadm=ef9gTVwLDHA.1720%40TK2MSFTNGP11.phx.gbl&rnum=1&prev=/groups%3Fq%3Dmemory%2520leak%2520threads%2520dotnet%26hl%3Den%26lr%3Dlang_en%26c2coff%3D1%26sa%3DN%26tab%3Dwg
These include:
- Using the MTA threading model.
- Calling GetTotalMemory(true)
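For the GetTotalMemory(true) suggestion, what I'm doing is roughly
this (a sketch; the class and method names are mine, not from the
thread above):

```csharp
// Sketch: force a full collection and report the managed heap size.
// If this number stays flat while the process working set keeps
// growing, the growth is outside the managed heap (native buffers,
// handles, etc.).
using System;

public class MemoryProbe
{
    public static long CollectAndMeasure()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect(); // second pass frees objects released by finalizers
        return GC.GetTotalMemory(true); // true = force a collection first
    }
}
```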
Because memory grows each time a page is downloaded, I have to assume that
the collected data is somehow being stored somewhere. The program is failing
after 12 hours as it uses up ALL the memory! That memory just isn't getting
released.
I'm not the most experienced with profilers, but mine
(http://www.scitech.se/memprofiler/Default.htm) seems to indicate that
a lot of memory is being consumed in "other data" etc., rather than in
my own managed area.
If I minimise and maximise the app (trimming the working set?), memory
usage drops to what I would expect. Virtual memory usage seems to stay
massive though. Shouldn't virtual memory eventually get flushed too?
My profiler isn't showing any big objects in memory that would justify
the total memory used by the program.
I'm using multiple threads (not from the pool, but manually
instantiated, since I need to be able to abort them).
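The threading setup looks roughly like this (again a sketch, not my
real code; the names are made up for illustration):

```csharp
// Sketch: manually created Thread objects rather than the ThreadPool,
// because pool threads cannot be aborted individually.
using System;
using System.Threading;

public class WorkerSketch
{
    public static void Run()
    {
        // C# 1.x requires the explicit ThreadStart delegate
        Thread worker = new Thread(new ThreadStart(DoWork));
        worker.IsBackground = true;
        worker.Start();

        // ... later, if a fetch hangs:
        if (worker.IsAlive)
        {
            worker.Abort();    // raises ThreadAbortException in the worker
        }
        worker.Join(1000);     // wait up to 1s for it to unwind
    }

    private static void DoWork()
    {
        try
        {
            // fetch pages here
        }
        catch (ThreadAbortException)
        {
            // clean up; the abort is automatically rethrown when
            // this catch block ends
        }
    }
}
```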
I read that there is a bug with .NET 1.1 when it comes to releasing
memory back to the OS. Is this true?
If it helps, code for the fetch method of a "StandardPageFetcher"
class is below (this class stores no data other than a URI).
----- CODE ------
public string FetchPage(bool sendCookiesWithRequest)
{
    if( _uri.IsFile )
    {
        using( StreamReader reader = new StreamReader( _uri.LocalPath ) )
        {
            return reader.ReadToEnd();
        }
    }

    HttpWebRequest webRequest = WebRequest.Create( _uri ) as HttpWebRequest;

    if( sendCookiesWithRequest )
    {
        // custom code to send a cookie; the problem still occurs if this is removed
        CookieContainer cookieContainer =
            IeCookieFileReaderII.GetCookieContainerForUrl( _uri );
        Log.Write( "INFO", "Searching for cookies for " +
            _uri.Authority + ". Found " + cookieContainer.Count.ToString() );
        webRequest.ContentType = "application/x-www-form-urlencoded";
        webRequest.CookieContainer = cookieContainer;
    }

    webRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; Google-TR-1)";
    webRequest.Accept = "*/*";

    using( WebResponse resp = webRequest.GetResponse() )
    using( Stream s = resp.GetResponseStream() )
    using( BufferedStream buff = new BufferedStream( s ) )
    using( StreamReader reader = new StreamReader( buff ) )
    {
        return reader.ReadToEnd();
    }
}
----- END CODE ------