The following version of your "callback" sub will do the job
correctly. LWP::Parallel worked fine, but your code that tests
whether $response->base refers to the correct URL is not
quite correct when redirects are involved. According to
"perldoc HTTP::Response", $response->base might not contain
the original URL if this response object was created by a
redirect. Instead, follow the chain of response objects back
to the original response object, and test the URL associated
with that object; see my while() loop below.
Both of your examples work properly with this code. Enjoy.
* * *
sub callback
{
my($data, $response, $protocol) = @_;
#Comment this line to prevent show the url
print $response->base."\n";
#majortom-ga's change
my $oresponse = $response;
while (defined($oresponse->previous)) {
$oresponse = $oresponse->previous;
}
for ($i=0; $i<@urls; $i++)
{
# check $oresponse->base, not $response->base
if ( index( $oresponse->base, $urls[$i]) != -1 )
{
$results[$i].=$data;
last;
}
}
} |
Request for Answer Clarification by
marcfest-ga
on
07 Feb 2004 11:24 PST
Hi Major Tom -
When I run the script below which includes your changes, I still get no output.
Please advise.
#!/usr/bin/perl
# test.cgi
# all rights reserved by Marc Fest
# 2004
use LWP::Simple;
require LWP::Parallel::UserAgent;
#@urls = ("http://www.onlinehomebase.com/index.cgi?shortcut_4_q123e");
@urls = ("http://www.onlinehb.com/q123e");
(@html) = grab (@urls);
print $html[0];
sub grab
{
@results= ();
$ua = LWP::Parallel::UserAgent->new();
$ua->agent("MS Internet Explorer");
$ua->timeout ($timeout);
$ua->redirect (0); # prevents automatic following of redirects
$ua->max_hosts(6); # sets maximum number of locations accessed in parallel
$ua->max_req (2); # sets maximum number of parallel requests per host
foreach $url2 (@_)
{
$ua->register(HTTP::Request->new(GET => $url2), \&callback);
}
$ua->wait ();
return @results;
}
sub callback
{
my($data, $response, $protocol) = @_;
#Comment this line to prevent show the url
print $response->base."\n";
#majortom-ga's change
my $oresponse = $response;
while (defined($oresponse->previous)) {
$oresponse = $oresponse->previous;
}
for ($i=0; $i<@urls; $i++)
{
# check $oresponse->base, not $response->base
if ( index( $oresponse->base, $urls[$i]) != -1 )
{
$results[$i].=$data;
last;
}
}
}
|
Clarification of Answer by
majortom-ga
on
07 Feb 2004 11:51 PST
My apologies, there is one other minor change: you need to change
$ua->redirect(0); to $ua->redirect(1). Here is the complete working
version:
#!/usr/bin/perl
# test.cgi
# all rights reserved by Marc Fest
# 2004
use LWP::Simple;
require LWP::Parallel::UserAgent;
#@urls = ("http://www.onlinehomebase.com/index.cgi?shortcut_4_q123e");
@urls = ("http://www.onlinehb.com/q123e");
(@html) = grab (@urls);
print $html[0];
sub grab
{
@results= ();
$ua = LWP::Parallel::UserAgent->new();
$ua->agent("MS Internet Explorer");
$ua->timeout ($timeout);
$ua->redirect (1); # majortom-ga: ALLOW automatic following of redirects
$ua->max_hosts(6); # sets maximum number of locations accessed in parallel
$ua->max_req (2); # sets maximum number of parallel requests per host
foreach $url2 (@_)
{
$ua->register(HTTP::Request->new(GET => $url2), \&callback);
}
$ua->wait ();
return @results;
}
sub callback
{
my($data, $response, $protocol) = @_;
#Comment this line to prevent show the url
print $response->base."\n";
#majortom-ga's change
my $oresponse = $response;
while (defined($oresponse->previous)) {
$oresponse = $oresponse->previous;
}
for ($i=0; $i<@urls; $i++)
{
# check $oresponse->base, not $response->base
if ( index( $oresponse->base, $urls[$i]) != -1 )
{
$results[$i].=$data;
last;
}
}
}
|
Request for Answer Clarification by
marcfest-ga
on
07 Feb 2004 12:48 PST
Your most recent version produces the following error when I run it:
Can't call method "request" on an undefined value at
/usr/lib/perl5/site_perl/5.6.1/LWP/UserAgent.pm line 528.
|
Clarification of Answer by
majortom-ga
on
07 Feb 2004 13:31 PST
I was going to tell you to upgrade to the latest version of
LWP::UserAgent, but just to be on the safe side, I did that myself to
check for surprises. And there was a very big surprise: when I
upgraded, I got the same error you did!
Naturally I assumed there had to be a reasonable explanation for this
and a straightforward workaround. But after much research, all I found
was a few bug reports and no fixes.
Finally, I fixed it myself. At line 1270 of the latest
LWP/Parallel/UserAgent.pm, you'll find this call:
return $response unless $self->redirect_ok($referral);
This is incorrect. The $response parameter is missing, and the latest version
of redirect_ok actually cares about that (which it didn't previously) and
fails miserably without it. This prevents LWP::Parallel from
successfully handling redirects.
I fixed it simply as shown:
return $response unless $self->redirect_ok($referral, $response);
Hopefully you have access to modify
/usr/lib/perl5/site_perl/5.6.1/LWP/Parallel/UserAgent.pm as shown
above (the version number of installation path might vary slightly for
you), otherwise I'd be glad to show you how to convince this one
program only to fetch a modified copy of LWP/Parallel/UserAgent.pm
from an alternate classpath in your own file space. If you have root
access on the machine that will not be necessary.
I have verified that the program I submitted as my previous answer
clarification does work both with and without a redirect taking place.
I will submit this bug report and fix to the LWP::Parallel
maintainer(s) today so that you do not face issues when upgrading via
CPAN.
Sorry for the inconvenience; I'll bet you didn't expect a bugfix to a
popular Perl module to be a necessary part of the answer. I'll be sure
to add a further comment if I hear anything interesting back from the
LWP::Parallel maintainers after submitting the fix.
|