Google Answers Logo
View Question
 
Q: Perl: Parallel User Agent question ( Answered 5 out of 5 stars,   2 Comments )
Question  
Subject: Perl: Parallel User Agent question
Category: Computers > Programming
Asked by: marcfest-ga
List Price: $20.00
Posted: 06 Feb 2004 06:04 PST
Expires: 07 Mar 2004 06:04 PST
Question ID: 304089
Hello - 

http://www.onlinehb.com/q123e (url A) redirects to
http://www.onlinehomebase.com/index.cgi?shortcut_4_q123e (url B)

The Web page fetching script below (using Parallel User Agent) can't
seem to hande this redirect. It works with "url A", but not with "url
B" (see lines 9 and 12). I thought I'd be able to fix this by changing
"$ua->redirect (0)" to "$ua->redirect (1)" in the grab routine, but
doing so causes an error.

As an answer to this question, please post a revised version of the
code below that will work with both url A and url B. The script needs
to keep using parallel user agent and keep using the "grab" function.

Thank you for your help.

Marc



#!/usr/bin/perl
# test.cgi
# all rights reserved by Marc Fest
# 2004

use LWP::Simple;
require LWP::Parallel::UserAgent;

#line A
#@urls = ("http://www.onlinehomebase.com/index.cgi?shortcut_4_q123e");

#line B
@urls = ("http://www.onlinehb.com/q123e");

(@html) = grab (@urls);
print $html[0];



sub grab
{
   @results= ();

   $ua = LWP::Parallel::UserAgent->new();
   $ua->agent("MS Internet Explorer");
   $ua->timeout ($timeout);
   $ua->redirect (0); # prevents automatic following of redirects
   $ua->max_hosts(6); # sets maximum number of locations accessed in parallel
   $ua->max_req  (2); # sets maximum number of parallel requests per host

  foreach $url2 (@_)
  {
       $ua->register(HTTP::Request->new(GET => $url2), \&callback);
  }

  $ua->wait ();
  return @results;

}

sub callback
{
        my($data, $response, $protocol) = @_;

        #Comment this line to prevent show the url
        print $response->base."\n";
        for ($i=0; $i<@urls; $i++)
        {
                if ( index( $response->base, $urls[$i]) != -1 )
                {
                        $results[$i].=$data;
                        last;
                }
        }
}
Answer  
Subject: Re: Perl: Parallel User Agent question
Answered By: majortom-ga on 07 Feb 2004 09:38 PST
Rated:5 out of 5 stars
 
The following version of your "callback" sub will do the job
correctly. LWP::Parallel worked fine, but your code that tests
whether $response->base refers to the correct URL is not
quite correct when redirects are involved. According to 
"perldoc HTTP::Response", $response->base might not contain
the original URL if this response object was created by a
redirect. Instead, follow the chain of response objects back
to the original response object, and test the URL associated
with that object; see my while() loop below. 

Both of your examples work properly with this code. Enjoy.

* * * 

sub callback
{
        my($data, $response, $protocol) = @_;

        #Comment this line to prevent show the url
        print $response->base."\n";
        #majortom-ga's change
        my $oresponse = $response;
        while (defined($oresponse->previous)) {
                $oresponse = $oresponse->previous;
        }
        for ($i=0; $i<@urls; $i++)
        {
                # check $oresponse->base, not $response->base
                if ( index( $oresponse->base, $urls[$i]) != -1 )
                {
                        $results[$i].=$data;
                        last;
                }
        }
}

Request for Answer Clarification by marcfest-ga on 07 Feb 2004 11:24 PST
Hi Major Tom - 

When I run the script below which includes your changes, I still get no output. 

Please advise.

#!/usr/bin/perl
# test.cgi
# all rights reserved by Marc Fest
# 2004


use LWP::Simple;
require LWP::Parallel::UserAgent;

#@urls = ("http://www.onlinehomebase.com/index.cgi?shortcut_4_q123e");
@urls = ("http://www.onlinehb.com/q123e");

(@html) = grab (@urls);
print $html[0];



sub grab
{
   @results= ();

   $ua = LWP::Parallel::UserAgent->new();
   $ua->agent("MS Internet Explorer");
   $ua->timeout ($timeout);
   $ua->redirect (0); # prevents automatic following of redirects
   $ua->max_hosts(6); # sets maximum number of locations accessed in parallel
   $ua->max_req  (2); # sets maximum number of parallel requests per host

  foreach $url2 (@_)
  {
       $ua->register(HTTP::Request->new(GET => $url2), \&callback);
  }

  $ua->wait ();
  return @results;

}
sub callback
{
        my($data, $response, $protocol) = @_;

        #Comment this line to prevent show the url
        print $response->base."\n";
        #majortom-ga's change
        my $oresponse = $response;
        while (defined($oresponse->previous)) {
                $oresponse = $oresponse->previous;
        }
        for ($i=0; $i<@urls; $i++)
        {
                # check $oresponse->base, not $response->base
                if ( index( $oresponse->base, $urls[$i]) != -1 )
                {
                        $results[$i].=$data;
                        last;
                }
        }
}

Clarification of Answer by majortom-ga on 07 Feb 2004 11:51 PST
My apologies, there is one other minor change: you need to change
$ua->redirect(0); to $ua->redirect(1). Here is the complete working
version:


#!/usr/bin/perl
# test.cgi
# all rights reserved by Marc Fest
# 2004


use LWP::Simple;
require LWP::Parallel::UserAgent;

#@urls = ("http://www.onlinehomebase.com/index.cgi?shortcut_4_q123e");
@urls = ("http://www.onlinehb.com/q123e");

(@html) = grab (@urls);
print $html[0];



sub grab
{
   @results= ();

   $ua = LWP::Parallel::UserAgent->new();
   $ua->agent("MS Internet Explorer");
   $ua->timeout ($timeout);
   $ua->redirect (1); # majortom-ga: ALLOW automatic following of redirects
   $ua->max_hosts(6); # sets maximum number of locations accessed in parallel
   $ua->max_req  (2); # sets maximum number of parallel requests per host

  foreach $url2 (@_)
  {
       $ua->register(HTTP::Request->new(GET => $url2), \&callback);
  }

  $ua->wait ();
  return @results;

}
sub callback
{
        my($data, $response, $protocol) = @_;

        #Comment this line to prevent show the url
        print $response->base."\n";
        #majortom-ga's change
        my $oresponse = $response;
        while (defined($oresponse->previous)) {
                $oresponse = $oresponse->previous;
        }
        for ($i=0; $i<@urls; $i++)
        {
                # check $oresponse->base, not $response->base
                if ( index( $oresponse->base, $urls[$i]) != -1 )
                {
                        $results[$i].=$data;
                        last;
                }
        }
}

Request for Answer Clarification by marcfest-ga on 07 Feb 2004 12:48 PST
Your most recent version produces the following error when I run it:

Can't call method "request" on an undefined value at
/usr/lib/perl5/site_perl/5.6.1/LWP/UserAgent.pm line 528.

Clarification of Answer by majortom-ga on 07 Feb 2004 13:31 PST
I was going to tell you to upgrade to the latest version of
LWP::UserAgent, but just to be on the safe side, I did that myself to
check for surprises. And there was a very big surprise: when I
upgraded, I got the same error you did!

Naturally I assumed there had to be a reasonable explanation for this
and a straightforward workaround. But after much research, all I found
was a few bug reports and no fixes.

Finally, I fixed it myself. At line 1270 of the latest
LWP/Parallel/UserAgent.pm, you'll find this call:

        return $response unless $self->redirect_ok($referral);

This is incorrect. The $response parameter is missing, and the latest version
of redirect_ok actually cares about that (which it didn't previously) and
fails miserably without it. This prevents LWP::Parallel from
successfully handling redirects.

I fixed it simply as shown:

return $response unless $self->redirect_ok($referral, $response);

Hopefully you have access to modify 
/usr/lib/perl5/site_perl/5.6.1/LWP/Parallel/UserAgent.pm as shown
above (the version number of installation path might vary slightly for
you), otherwise I'd be glad to show you how to convince this one
program only to fetch a modified copy of LWP/Parallel/UserAgent.pm
from an alternate classpath in your own file space. If you have root
access on the machine that will not be necessary.

I have verified that the program I submitted as my previous answer
clarification does work both with and without a redirect taking place.

I will submit this bug report and fix to the LWP::Parallel
maintainer(s) today so that you do not face issues when upgrading via
CPAN.

Sorry for the inconvenience; I'll bet you didn't expect a bugfix to a
popular Perl module to be a necessary part of the answer. I'll be sure
to add a further comment if I hear anything interesting back from the
LWP::Parallel maintainers after submitting the fix.
marcfest-ga rated this answer:5 out of 5 stars and gave an additional tip of: $5.00
Walked the extra mile. Thank you!

Comments  
Subject: Re: Perl: Parallel User Agent question
From: majortom-ga on 08 Feb 2004 05:42 PST
 
FYI, I have had a response from the maintainer of LWP::Parallel; he
says that while LWP::Parallel "is a bit neglected these days," he
thinks he can probably get this fix checked in, since it's ready-made.
So hopefully the official CPAN will have this soon.
Subject: Re: Perl: Parallel User Agent question
From: marcfest-ga on 08 Feb 2004 07:45 PST
 
Wonderful!

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.
Search Google Answers for
Google Answers  


Google Home - Answers FAQ - Terms of Service - Privacy Policy