I am using ColdFusion 5.0 to build a link-checking spider, and so far
it works very well. Of the roughly 40,000 links it has checked, it has
choked on only this one:
http://www.indexsearch.co.uk/websearch/page+space+web+NASA/
That page appears to produce endless output, so CFHTTP starts the
request but never finishes. Apparently the TimeOut attribute of CFHTTP
applies only to the time spent trying to reach a URL before giving up.
Once the transfer has begun, TimeOut no longer applies.
This makes a certain kind of sense, because there is no telling how
large a file you might want to retrieve with CFHTTP. But it causes a
problem with this particular URL, which returns infinite data. It's
only a minor problem, but it needs fixing to make my link checker
bulletproof so that I can run it unattended.
Any method that solves this problem for practical use will be
considered a valid answer to this question. Contacting the site owner
to have him fix his infinite output is not a solution, because the
same thing will happen again whenever another infinite URL is
encountered. The point is to make the link-checking application
bulletproof, not to get past this one URL; that case is already taken
care of. A satisfactory answer will be specific code that actually
solves the problem, not general advice about which technologies might
possibly do so.
<CFSET StartTime = GetTickCount()>
<!--- I'd like to set a max time of 4 seconds (GetTickCount() is in milliseconds) --->
<CFSET TimeEnd = StartTime + 4000>
<CFTRY>
    <CFHTTP URL="http://www.indexsearch.co.uk/websearch/page+space+web+NASA/"
        METHOD="GET"
        ResolveURL="0"
        TimeOut="2"
        Redirect="NO"
        ThrowOnError="YES">
        <!---
        The CFHTTP body is processed before the request is sent (it is
        meant for CFHTTPPARAM tags), so this check runs only once and
        cannot interrupt a download that never ends.
        --->
        <CFIF Variables.TimeEnd LESS THAN OR EQUAL TO GetTickCount()>
            <CFTHROW MESSAGE="#Variables.TimeEnd# WAS LESS THAN OR EQUAL TO #GetTickCount()#">
        </CFIF>
    </CFHTTP>
    <CFSET Ping = GetTickCount() - StartTime>
    <CFCATCH TYPE="Any">
        <!---
        Note: none of these 'advanced' types works when
        substituted for "Any" above:
        COM.Allaire.ColdFusion.HTTPCFHTTPRequestEntityTooLarge
        COM.Allaire.ColdFusion.HTTPRequestURITooLarge
        COM.Allaire.ColdFusion.Request.Timeout
        COM.Allaire.ColdFusion.HTTPConnectionTimeout
        COM.Allaire.ColdFusion.HTTPFileNotRenderable
        COM.Allaire.ColdFusion.HTTPGatewayTimeout
        COM.Allaire.ColdFusion.HTTPNotAcceptable
        --->
        An error occurred
    </CFCATCH>
</CFTRY>
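One possible workaround takes the download out of CFHTTP altogether and lets a small Java class enforce its own total deadline and byte cap. This is a sketch only and has not been tested against ColdFusion 5.0. The class name `CappedFetcher`, its methods, and the limits are hypothetical. It assumes a Java 5 or later runtime, since `setConnectTimeout` and `setReadTimeout` are not in older JDKs.

The idea it illustrates: a per-read timeout alone cannot stop a page that keeps streaming data. The read loop therefore also checks a wall-clock deadline and a maximum size between reads, and it stops as soon as either limit is reached.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of ColdFusion: fetches a URL but stops after a
// total wall-clock deadline or a maximum number of bytes, whichever comes first.
public class CappedFetcher {

    // What was read, and whether a limit stopped the read before end-of-stream was seen.
    public static final class Result {
        public final byte[] body;
        public final boolean truncated;

        Result(byte[] body, boolean truncated) {
            this.body = body;
            this.truncated = truncated;
        }
    }

    // Reads until end-of-stream, until maxBytes have been read, or until the
    // clock passes deadlineMillis. The limits are checked between reads, so a
    // single blocking read() is bounded only by the socket's read timeout.
    public static Result readCapped(InputStream in, long deadlineMillis, int maxBytes)
            throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (true) {
            if (System.currentTimeMillis() >= deadlineMillis) {
                return new Result(out.toByteArray(), true);   // out of time
            }
            int room = maxBytes - out.size();
            if (room <= 0) {
                return new Result(out.toByteArray(), true);   // size cap reached
            }
            int n = in.read(buf, 0, Math.min(buf.length, room));
            if (n < 0) {
                return new Result(out.toByteArray(), false);  // normal end of response
            }
            out.write(buf, 0, n);
        }
    }

    // Fetches an http:// URL with a total time budget and a size cap.
    public static Result fetch(String url, int timeoutMillis, int maxBytes) throws IOException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setInstanceFollowRedirects(false);  // mirrors Redirect="NO"
        conn.setConnectTimeout(timeoutMillis);   // bounds the initial connection
        conn.setReadTimeout(timeoutMillis);      // bounds each individual read, not the total
        try {
            // Throws an IOException for 4xx/5xx statuses, much like ThrowOnError="YES".
            InputStream in = conn.getInputStream();
            return readCapped(in, deadline, maxBytes);
        } finally {
            conn.disconnect();  // close the connection even if the server is still sending
        }
    }
}
```

With this design, the worst-case time per URL is a small multiple of the timeout (connect, headers, one final blocking read) rather than unbounded. From ColdFusion 5.0, a class like this might be loaded with `CFOBJECT TYPE="JAVA"` once it is on the JVM class path configured in the ColdFusion Administrator.

Whether CF 5.0's Java integration reads the returned `Result` fields cleanly is an open question. A thin wrapper that returns only a status string may be easier to consume from CFML.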