Q: Perl script for fetching URLs ( Answered 5 out of 5 stars,   0 Comments )
Question  
Subject: Perl script for fetching URLs
Category: Computers > Programming
Asked by: marcfest-ga
List Price: $15.00
Posted: 14 Jan 2004 07:45 PST
Expires: 13 Feb 2004 07:45 PST
Question ID: 296364
I need a perl function "grab" that uses the parallel user agent module
(that's important because I want it to work quickly) to retrieve all
URLs in @urls and put their contents into an array (where $array[0]
holds the content of $url[0], etc). If timeout is exceeded, content
will be "timeout".

I'd like to use it like shown in the example below, so please
formulate your function so that I can drop it into the script below
and it will work. Thank you.

Marc.

#!/usr/bin/perl

@urls=("http://www.yahoo.com", "http://www.cnn.com", "http://news.google.com/");

$timeout = 20; # each request times out after 20 seconds
@content = &grab(@urls);

print "everything grabbed";

exit;

sub grab {

# your grab function here

}

Request for Question Clarification by joseleon-ga on 14 Jan 2004 07:54 PST
Hello, marcfest:
  Just to be sure, do you want to retrieve the contents of the index
page of a URL? For example:

http://www.yahoo.com/index.html

http://www.cnn.com/index.html

And then place the text of each page into an array, where each key
holds the contents of one page. Is that right?

Regards.

Clarification of Question by marcfest-ga on 14 Jan 2004 10:23 PST
I want it to fetch the contents of the URLs specified. So if it's
"http://www.cnn.com", yes, it would be the index.html page and I want
the complete source text of that page, not just the text as it would
be rendered in a browser. If it's something like
"http://www.abc.com/example.html" it would be that page. Basically, I
want to retrieve what a browser like IE would retrieve. Maybe also put
in an option if you can to make it appear to the server as if the
requests are coming from an IE browser (since some pages use browser
detection).

I think you mean that each element of the array (not "key") would
contain the retrieved contents (with $content[1] being an element, and
"1" being the key in this example). That's what I want.
Thank you.
Answer  
Subject: Re: Perl script for fetching URLs
Answered By: joseleon-ga on 14 Jan 2004 14:49 PST
Rated:5 out of 5 stars
 
Hello, marcfest:

 Here is the piece of code that does the trick. It uses Parallel User
Agent for improved performance and identifies itself to the server as
if it were Internet Explorer. Be aware of word wrapping when copying
and pasting:
 
--
#!/usr/bin/perl

#Uncomment to get full debug info
#use LWP::Debug qw(+ -conns);
use LWP::Simple;
require LWP::Parallel::UserAgent;
require HTTP::Request;


@urls=("http://www.yahoo.com", "http://www.cnn.com/", "http://news.google.com/");

$timeout = 20; # each request times out after 20 seconds)
@content = grab(@urls);

#This prints the contents of http://www.cnn.com
print $content[1];


sub grab
{
   my @results;
   my $i=0;
   
   $ua = LWP::Parallel::UserAgent->new();
   $ua->agent("MS Internet Explorer"); 
   $ua->redirect (0); # prevents automatic following of redirects
   $ua->max_hosts(5); # sets maximum number of locations accessed in parallel
   $ua->max_req  (5); # sets maximum number of parallel requests per host

  foreach $url (@_)
  {

       $response = $ua->request(HTTP::Request->new(GET => $url));
       $results[$i]=$response->content;
       $i++;
  }
  
  $ua->wait ( $timeout );
 
  return @results;

}
--

Please, test it and don't hesitate to request for any clarification, I
hope you like it!

Regards.

Request for Answer Clarification by marcfest-ga on 16 Jan 2004 05:09 PST
One more thing, joseleon-ga:

In the documentation for this module
(http://aspn.activestate.com/ASPN/CodeDoc/ParallelUserAgent/LWP/Parallel/UserAgent.html)
it says

For parallel access, you will need to use the new methods that come
with LWP::Parallel::UserAgent, called $pua->register and $pua->wait.
See below for more information on each method.

I don't see "register" being used in your code. Are you sure the URLs
are being retrieved in parallel? Thank you.

Clarification of Answer by joseleon-ga on 16 Jan 2004 06:40 PST
Hello, marcfest:
  Sorry, I think I pasted a previous version of the code. Here is the
final one. I have also added a print to the callback so you can see
that the requests are being made in parallel. Let me know if you have
any other questions or problems:

--
#!/usr/bin/perl

#Uncomment to get full debug info
#use LWP::Debug qw(+ -conns);
use LWP::Simple;
require LWP::Parallel::UserAgent;
require HTTP::Request;


@urls=("http://www.yahoo.com", "http://www.cnn.com/", "http://news.google.com/");

$timeout = 20; # each request times out after 20 seconds)
@content = grab(@urls);

#This prints the contents of http://www.cnn.com
print $content[1];


sub grab
{
   @results = (); # global array that the callback fills in

   $ua = LWP::Parallel::UserAgent->new();
   $ua->agent("MS Internet Explorer"); 
   $ua->redirect (0); # prevents automatic following of redirects
   $ua->max_hosts(5); # sets maximum number of locations accessed in parallel
   $ua->max_req  (5); # sets maximum number of parallel requests per host

   
  foreach $url (@_)
  {

       $ua->register(HTTP::Request->new(GET => $url), \&callback);
  }
  
  $ua->wait ( $timeout );
 
  return @results;

}

sub callback 
{
	my($data, $response, $protocol) = @_; 

	#Comment this line to prevent show the url
	print $response->base."\n";
	for ($i=0; $i<@urls; $i++)  
	{
		if ( index( $response->base, $urls[$i]) != -1 ) 
		{
			$results[$i].=$data;
			last;
		}
	}
}
--

Regards.

Request for Answer Clarification by marcfest-ga on 16 Jan 2004 07:37 PST
This seems to work much faster. Just curious: what's happening in the
callback routine? It seems that each URL is being looped through
many times, up to a dozen or so.

Clarification of Answer by joseleon-ga on 16 Jan 2004 07:57 PST
Hello, marcfest:
  This is done so the results come back as an array in which each
position holds the contents of the corresponding URL. Because the
requests run in parallel, each piece of data must be appended at the
right position in the array, and inside the callback the URL is the
only information I have for finding that position.
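
To illustrate why the callback runs many times per URL: each response is handed over in chunks, and every chunk triggers one callback call. The sketch below simulates that with made-up chunks and no network access. The `%position` lookup table and the `deliver` function are assumptions for illustration only; they are not part of LWP::Parallel::UserAgent.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @urls = ("http://www.yahoo.com/", "http://www.cnn.com/");

# Map each URL to its position once, so a chunk can find its slot
# without looping over @urls on every call.
my %position;
$position{ $urls[$_] } = $_ for 0 .. $#urls;

my @results = map { "" } @urls;

# Hypothetical stand-in for the callback: one call per chunk of data.
sub deliver {
    my ($url, $chunk) = @_;
    return unless exists $position{$url};
    $results[ $position{$url} ] .= $chunk;
    return;
}

# Chunks from different hosts arrive interleaved, as in a parallel fetch.
deliver("http://www.cnn.com/",   "<html>cnn ");
deliver("http://www.yahoo.com/", "<html>yahoo ");
deliver("http://www.cnn.com/",   "page</html>");
deliver("http://www.yahoo.com/", "page</html>");

print "$results[0]\n";   # yahoo content, reassembled in order
print "$results[1]\n";   # cnn content, reassembled in order
```

Note that the lookup only works when the URL seen in the callback is spelled exactly like the one in @urls.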

The code would be much faster if, instead of requiring the results in
an array, you accept them in an associative array. Just change the
callback this way:

sub callback 
{
	my($data, $response, $protocol) = @_; 

	#Comment out this line to stop printing the URL
	print $response->base."\n";
        $results{$response->base}.=$data;
}

And then, you could access the results this way:

$results{'http://www.yahoo.com/'};
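
As a small self-contained sketch of the associative-array idea (no network access; `store_chunk` is a hypothetical stand-in for the callback): in Perl, curly braces select an element of a hash, while square brackets index an array, where a URL string used as an index is treated as the number 0.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %results;   # associative array (hash): keys are URL strings

# Hypothetical stand-in for the callback: append a chunk under its URL.
sub store_chunk {
    my ($url, $chunk) = @_;
    $results{$url} .= $chunk;   # curly braces select a hash element
    return;
}

store_chunk("http://www.yahoo.com/", "yahoo ");
store_chunk("http://www.cnn.com/",   "cnn ");
store_chunk("http://www.yahoo.com/", "page");

print $results{'http://www.yahoo.com/'}, "\n";   # yahoo page
print join(", ", sort keys %results), "\n";
```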

Regards.

Request for Answer Clarification by marcfest-ga on 16 Jan 2004 08:41 PST
Dear joseleon-ga  - my last request - 

Can you change the entire script so it works with using an associative
array and post it? I've tried %content = grab (@urls); print
$content[http://www.newmediamusings.com/]"; but that didn't work.

See my script below:

#!/usr/bin/perl

#Uncomment to get full debug info
#use LWP::Debug qw(+ -conns);
use LWP::Simple;
require LWP::Parallel::UserAgent;
require HTTP::Request;


@urls = (
"http://www.fuckedcompany.com",
"http://www.digitaldeliverance.com",
"http://www.andrewtobias.com",
"http://www.andrewsullivan.com",
"http://www.newmediamusings.com/"
);


$timeout = 20; # each request times out after 20 seconds)
%content = grab(@urls);

#This prints the contents of http://www.cnn.com
# print $content[1];
print "done $content[http://www.newmediamusings.com/]";

sub grab
{
   %results;

   $ua = LWP::Parallel::UserAgent->new();
   $ua->agent("MS Internet Explorer");
   $ua->redirect (0); # prevents automatic following of redirects
   $ua->max_hosts(5); # sets maximum number of locations accessed in parallel
   $ua->max_req  (5); # sets maximum number of parallel requests per host


  foreach $url (@_)
  {

       $ua->register(HTTP::Request->new(GET => $url), \&callback);
  }

  $ua->wait ( $timeout );

  return %results;

}

sub callback
{
        my($data, $response, $protocol) = @_;

        #Comment this line to prevent show the url
        print $response->base."\n";
        $results[$response->base].=$data;
}

Clarification of Answer by joseleon-ga on 16 Jan 2004 09:59 PST
Hello, marcfest:
 No problem, here it is. I have used a global variable to store the
contents instead of passing the associative array by reference. A
couple of notes:

This URL

http://www.andrewsullivan.com

is not working, at least for me, which is why it is commented out.

I have set the timeout to 120, because this website:

http://www.newmediamusings.com/

is really huge and takes a long time to download.

Also remember that when you dump the contents of a website, it must be
done this way:

print $results{'http://www.fuckedcompany.com/index.html'};

Even though the URL is http://www.fuckedcompany.com, the result is
stored under this key.

--
#!/usr/bin/perl

#Uncomment to get full debug info
#use LWP::Debug qw(+ -conns);
use LWP::Simple;
require LWP::Parallel::UserAgent;
require HTTP::Request;


@urls = (
"http://www.fuckedcompany.com",
"http://www.digitaldeliverance.com",
"http://www.andrewtobias.com",
#"http://www.andrewsullivan.com",  I cannot access this page!!
"http://www.newmediamusings.com/"
);

#Hash where content is going to be placed
%results = ();


$timeout = 120; # each request times out after 120 seconds
grab(@urls);

# This prints the contents of a page. Be aware that the keys are not always
# the same as the URLs: they may end in index.html or in a trailing /

#print $results{'http://www.fuckedcompany.com/index.html'};
#print $results{'http://www.digitaldeliverance.com'};
#print $results{'http://www.andrewtobias.com'};
print $results{'http://www.newmediamusings.com/'};


sub grab
{
   %results = (); # global hash that the callback fills in

   $ua = LWP::Parallel::UserAgent->new();
   $ua->agent("MS Internet Explorer"); 
   $ua->redirect (0); # prevents automatic following of redirects
   $ua->max_hosts(5); # sets maximum number of locations accessed in parallel
   $ua->max_req  (5); # sets maximum number of parallel requests per host

   
  foreach $url (@_)
  {

       $ua->register(HTTP::Request->new(GET => $url), \&callback);
  }
  
  $ua->wait ( $timeout );
 
  return %results;

}

sub callback 
{
	my($data, $response, $protocol) = @_; 
	#Comment this line to prevent show the url
	#print $response->base."\n";

	$results{$response->base}.=$data;		
	return;

}
--

Request as many changes as you want; I'm here to help you.

Regards.

Request for Answer Clarification by marcfest-ga on 16 Jan 2004 11:15 PST
Can you make it so I can look up the results without having to make a
guess whether I have to add "index.html" to a key or not? If this is
too complicated then please don't worry about it. Thank you.

Clarification of Answer by joseleon-ga on 16 Jan 2004 11:49 PST
Hello, marcfest:
  Sure, no problem. Here is the new callback function; you just need
to replace the old one. It parses the URL with a regular expression
and sets the key of the associative array to http://www.domain.com,
without the trailing slash.

--
sub callback 
{
	my($data, $response, $protocol) = @_; 
	#Comment this line to prevent show the url
	#print $response->base;
	
	$url=$response->base;
	$url =~ m|(\w+)://([^/:]+)(:\d+)?/(.*)|;

	$results{$1."://".$2}.=$data;		
	return;
}
--
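
To show what that regular expression captures, here is a small sketch that can be run without any network access. The `site_key` wrapper is a hypothetical helper added for illustration. It also checks whether the match succeeded, because after a failed match Perl leaves $1 and $2 holding the values from the last successful match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reduce a URL to "scheme://host", or return undef
# when the pattern does not match (for example, no slash after the host).
sub site_key {
    my ($url) = @_;
    if ($url =~ m|(\w+)://([^/:]+)(:\d+)?/(.*)|) {
        return $1 . "://" . $2;
    }
    return;
}

my @samples = (
    "http://www.yahoo.com/index.html",
    "http://www.yahoo.com:8080/a/b",     # the port is dropped from the key
    "http://www.andrewtobias.com",       # no slash after the host: no match
);

for my $url (@samples) {
    my $key = site_key($url);
    print "$url -> ", (defined $key ? $key : "(no match)"), "\n";
}
```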

Regards.

Request for Answer Clarification by marcfest-ga on 16 Jan 2004 16:59 PST
Joseleon - Sorry to bother you one last time. With Andrewsullivan.com
being down, I've noticed that the timeout function does not seem to
work at all. What I need is that when the timeout is exceeded the
script will assign the phrase "timeout" to the variable holding the
content. I've set the timeout to 5 seconds, but it does not seem to
make a difference. See my script below. Thanks for going the extra
mile.

#!/usr/bin/perl

#Uncomment to get full debug info
#use LWP::Debug qw(+ -conns);
use LWP::Simple;
require LWP::Parallel::UserAgent;
require HTTP::Request;


@urls = (
"http://www.fuckedcompany.com",
#"http://www.digitaldeliverance.com",
#"http://www.andrewtobias.com",
"http://www.andrewsullivan.com",
#"http://www.newmediamusings.com/",
#"http://www.omarmasry.net/index.html"
);

$timeout = 5; # each request times out after 5 seconds
@content = grab(@urls);

#This prints the contents of http://www.cnn.com
print $content[5];


sub grab
{
   @results;

   $ua = LWP::Parallel::UserAgent->new();
   $ua->agent("MS Internet Explorer");
   $ua->redirect (0); # prevents automatic following of redirects
   $ua->max_hosts(6); # sets maximum number of locations accessed in parallel
   $ua->max_req  (6); # sets maximum number of parallel requests per host

  foreach $url (@_)
  {

       $ua->register(HTTP::Request->new(GET => $url), \&callback);
  }

  $ua->wait ( $timeout );

  return @results;

}

sub callback
{
        my($data, $response, $protocol) = @_;

        #Comment this line to prevent show the url
        print $response->base."\n";
        for ($i=0; $i<@urls; $i++)
        {
                if ( index( $response->base, $urls[$i]) != -1 )
                {
                        $results[$i].=$data;
                        last;
                }
        }
}

Request for Answer Clarification by marcfest-ga on 17 Jan 2004 03:59 PST
FYI: when I currently include www.andrewsullivan.com in @urls the
script seems to hang far beyond the specified timeout, and it also
causes one other URL not to be fetched. Removing
www.andrewsullivan.com causes all URLs to be fetched quickly and
without problems. The timeout mechanism is not working. FYI: When
googling this issue I found that there seem to be problems with
Parallel User Agent's timeout. I'd rather go without Parallel User
Agent and have a working timeout mechanism instead, because the script
is useless to me if a single URL can undermine its operation the way
it currently does. I'll be happy to pay you another $20 if you
can resolve this for me. Thank you.

Clarification of Answer by joseleon-ga on 17 Jan 2004 08:52 PST
Hello, marcfest:
  I can adapt the script to operate without Parallel User Agent and
handle the timeout correctly. However, because this question is
already answered, I don't know whether you can still tip me.

If you can, I will use this answer; if not, you can open a new one.

Regards.

Request for Answer Clarification by marcfest-ga on 17 Jan 2004 09:30 PST
www.andrewsullivan.com seems to be back up. Simply use a bogus address like
"http://216.239.39.111" to ensure that the timeout behavior works as
indicated in the original specs for this project ('If timeout is
exceeded, content will be "timeout"').

Thanks a lot!

Request for Answer Clarification by marcfest-ga on 17 Jan 2004 09:37 PST
Can you get timeouts to work correctly with PUA?

Clarification of Answer by joseleon-ga on 17 Jan 2004 09:48 PST
Hello, marcfest:
  I'm going to run some tests and will get back to you.

Regards.

Clarification of Answer by joseleon-ga on 17 Jan 2004 11:10 PST
Hello, marcfest:
  Please test this version to check whether it handles the timeout
correctly. I have added a setting (remember_failures) to prevent the
user agent from connecting several times to a site that fails.

Now, if a site is down, the result will be "timeout" instead of "" as
it was before.

Also bear in mind that URLs that don't point to a specific page, e.g.
http://www.andrewtobias.com, must end with a / to be parsed
correctly.
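
The idea of pre-filling every slot with "timeout" and letting the first chunk of data replace it can be sketched on its own, without any network access. Both `with_trailing_slash` and `store_chunk` are hypothetical helpers for illustration; the first adds the ending / automatically so that the keys seeded here agree with the keys the callback builds.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: append "/" when a URL has a host but no path.
sub with_trailing_slash {
    my ($url) = @_;
    return $url =~ m|^\w+://[^/]+$| ? "$url/" : $url;
}

my @urls = map { with_trailing_slash($_) }
    ("http://www.andrewtobias.com", "http://216.239.39.111/");

# Seed every slot with the sentinel before any data arrives.
my %results = map { $_ => "timeout" } @urls;

# Hypothetical stand-in for the callback: the first chunk replaces the
# sentinel, later chunks are appended.
sub store_chunk {
    my ($url, $chunk) = @_;
    return unless exists $results{$url};
    $results{$url} = "" if $results{$url} eq "timeout";
    $results{$url} .= $chunk;
    return;
}

# Only the first site answers; the second never calls back.
store_chunk("http://www.andrewtobias.com/", "<html>");
store_chunk("http://www.andrewtobias.com/", "</html>");

print "$_ => $results{$_}\n" for sort keys %results;
```

A slot that never receives data keeps the value "timeout", which is the behavior asked for in the original question.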

--
#!/usr/bin/perl

#Uncomment to get full debug info
#use LWP::Debug qw(+ -conns);
use LWP::Simple;
require LWP::Parallel::UserAgent;
require HTTP::Request;


#URLs must end with a /
@urls = (
"http://www.fuckedcompany.com/",
"http://www.digitaldeliverance.com/",
"http://www.andrewtobias.com/",
"http://www.andrewsullivan.com/",
"http://216.239.39.111/",
"http://www.newmediamusings.com/"
);

#Hash where content is going to be placed
%results = ();


$timeout = 1; # each request times out after 1 second
grab(@urls);

# This prints the contents of a page. Be aware that the keys are not always
# the same as the URLs: they may end in index.html or in a trailing /

#print $results{'http://www.fuckedcompany.com/'};
#print $results{'http://www.digitaldeliverance.com/'};
#print $results{'http://www.andrewtobias.com/'};
print $results{'http://www.fuckedcompany.com/'};
print $results{'http://216.239.39.111/'};


sub grab
{
   %results = (); # global hash that the callback fills in

   $ua = LWP::Parallel::UserAgent->new();
   $ua->agent("MS Internet Explorer"); 
   $ua->redirect (0); # prevents automatic following of redirects
   $ua->max_hosts(5); # sets maximum number of locations accessed in parallel
   $ua->max_req  (5); # sets maximum number of parallel requests per host
   $ua->remember_failures(1);

   
  foreach $url (@_)
  {
  	$dom=$url;
  	$dom =~ m|(\w+)://([^/:]+)(:\d+)?/(.*)|;
  	$results{$1."://".$2."/"}="timeout";

       $ua->register(HTTP::Request->new(GET => $url), \&callback);
  }
  
  $ua->wait ( $timeout );
 
  return %results;

}

sub callback 
{
	my($data, $response, $protocol) = @_; 
	#Comment this line to prevent show the url
	#print $response->base."\n";
	
	$url=$response->base;
	$url =~ m|(\w+)://([^/:]+)(:\d+)?/(.*)|;
	
	if ($results{$1."://".$2."/"} eq "timeout")
	{
		$results{$1."://".$2."/"}="";		
	}
	$results{$1."://".$2."/"}.=$data;		
	
	return;
}
--

Regards.

Request for Answer Clarification by marcfest-ga on 17 Jan 2004 12:05 PST
The script still does not time out after the prescribed timeout
interval when a bogus URL is included. Instead it seems to hang. I've
appended the script as I ran it, plus the debugging output below.

#!/usr/bin/perl

#Uncomment to get full debug info
use LWP::Debug qw(+ -conns);
use LWP::Simple;
require LWP::Parallel::UserAgent;
require HTTP::Request;


#URLs must end with a /
@urls = (
"http://www.yahoo.com/",
"http://216.239.39.111/",
#"http://www.newmediamusings.com/"
);

#Array where content is going to be placed
@results;


$timeout = 1; # each request times out after 1 seconds
grab(@urls);

# This prints the contents of the page, be aware are not the same as the URLs
# but with index.html, or even with an ending /

#print $results{'http://www.fuckedcompany.com/'};
#print $results{'http://www.digitaldeliverance.com/'};
#print $results{'http://www.andrewtobias.com/'};
print $results{'http://www.fuckedcompany.com/'};
print $results{'http://216.239.39.111/'};

sub grab
{
   @results;

   $ua = LWP::Parallel::UserAgent->new();
   $ua->agent("MS Internet Explorer");
   $ua->redirect (0); # prevents automatic following of redirects
   $ua->max_hosts(5); # sets maximum number of locations accessed in parallel
   $ua->max_req  (5); # sets maximum number of parallel requests per host
   $ua->remember_failures(1);


  foreach $url (@_)
  {
        $dom=$url;
        $dom =~ m|(\w+)://([^/:]+)(:\d+)?/(.*)|;
        $results{$1."://".$2."/"}="timeout";

       $ua->register(HTTP::Request->new(GET => $url), \&callback);
  }

  $ua->wait ( $timeout );

  return %results;

}

sub callback
{
        my($data, $response, $protocol) = @_;
        #Comment this line to prevent show the url
        print $response->base."\n";

        $url=$response->base;
        $url =~ m|(\w+)://([^/:]+)(:\d+)?/(.*)|;

        if ($results{$1."://".$2."/"} eq "timeout")
        {
                $results{$1."://".$2."/"}="";
        }
        $results{$1."://".$2."/"}.=$data;

        return;
}

DEBUGGING OUTPUT

LWP::UserAgent::new: ()
LWP::Parallel::UserAgent::redirect: (0)
LWP::Parallel::UserAgent::max_hosts: (5)
LWP::Parallel::UserAgent::max_req: (5)
LWP::Parallel::UserAgent::remember_failures: (1)
LWP::Parallel::UserAgent::register: (http://www.yahoo.com/,
CODE(0x81655b8), [undef], [undef])
LWP::Parallel::UserAgent::register: (http://216.239.39.111/,
CODE(0x81655b8), [undef], [undef])
LWP::Parallel::UserAgent::wait: (1)
LWP::Parallel::UserAgent::wait:
        Current Server: 0 [  ]
        Pending Server: 2 [ 216.239.39.111:80, 1, www.yahoo.com:80, 1 ]
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::_check_bandwith:
(LWP::Parallel::UserAgent::Entry=HASH(0x83b8d80)
[http://216.239.39.111/] )
LWP::Parallel::UserAgent::on_connect: (http://216.239.39.111/)
LWP::Parallel::UserAgent::_connect:
(LWP::Parallel::UserAgent::Entry=HASH(0x83b8d80)
[http://216.239.39.111/] )
LWP::Parallel::UserAgent::init_request: ->
(HTTP::Request=HASH(0x83b8a2c)) [GET http://216.239.39.111/]
LWP::Parallel::UserAgent::init_request: GET http://216.239.39.111/
LWP::UserAgent::_need_proxy: Not proxied
LWP::Parallel::UserAgent::init_request: <- (undef, [undef],
LWP::Parallel::Protocol::http=HASH(0x83b8d68), 180, 1)

SCRIPT HANGING AFTER LINE ABOVE
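
A note on the trace above: the 180 near the end of the last init_request line matches the documented default timeout of LWP::UserAgent (180 seconds). That suggests, as an inference not confirmed in this thread, that the value passed to wait is not the one governing the connection attempt. Independently of the module, a wall-clock limit can be sketched with Perl's built-in alarm. The `with_deadline` helper is hypothetical, the slow fetch is simulated with sleep, and whether alarm interrupts a blocking connect inside the module has not been verified here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run $code, but give up after $seconds of wall-clock
# time and return the string "timeout" instead.
sub with_deadline {
    my ($seconds, $code) = @_;
    my $value;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "deadline\n" };
        alarm $seconds;
        $value = $code->();
        alarm 0;          # cancel the pending alarm on success
        1;
    };
    alarm 0;              # also cancel if $code died for another reason
    return $value if $ok;
    return "timeout" if $@ eq "deadline\n";
    die $@;               # re-throw unrelated errors
}

my $fast = with_deadline(5, sub { "page content" });
my $slow = with_deadline(1, sub { sleep 10; "never returned" });

print "$fast\n";   # page content
print "$slow\n";   # timeout
```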

Request for Answer Clarification by marcfest-ga on 17 Jan 2004 12:07 PST
Actually, the script terminated after hanging for approx. 3 minutes,
producing the error output below:

LWP::UserAgent::new: ()
LWP::Parallel::UserAgent::redirect: (0)
LWP::Parallel::UserAgent::max_hosts: (5)
LWP::Parallel::UserAgent::max_req: (5)
LWP::Parallel::UserAgent::remember_failures: (1)
LWP::Parallel::UserAgent::register: (http://www.yahoo.com/,
CODE(0x81655b8), [undef], [undef])
LWP::Parallel::UserAgent::register: (http://216.239.39.111/,
CODE(0x81655b8), [undef], [undef])
LWP::Parallel::UserAgent::wait: (1)
LWP::Parallel::UserAgent::wait:
        Current Server: 0 [  ]
        Pending Server: 2 [ 216.239.39.111:80, 1, www.yahoo.com:80, 1 ]
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::_check_bandwith:
(LWP::Parallel::UserAgent::Entry=HASH(0x83b8d80)
[http://216.239.39.111/] )
LWP::Parallel::UserAgent::on_connect: (http://216.239.39.111/)
LWP::Parallel::UserAgent::_connect:
(LWP::Parallel::UserAgent::Entry=HASH(0x83b8d80)
[http://216.239.39.111/] )
LWP::Parallel::UserAgent::init_request: ->
(HTTP::Request=HASH(0x83b8a2c)) [GET http://216.239.39.111/]
LWP::Parallel::UserAgent::init_request: GET http://216.239.39.111/
LWP::UserAgent::_need_proxy: Not proxied
LWP::Parallel::UserAgent::init_request: <- (undef, [undef],
LWP::Parallel::Protocol::http=HASH(0x83b8d68), 180, 1)
LWP::Parallel::UserAgent::on_failure: (http://216.239.39.111/)
LWP::Parallel::UserAgent::_check_bandwith: Failed connection for
'216.239.39.111:80'
LWP::Parallel::UserAgent::_make_connections_unordered: Queue for
216.239.39.111:80 contains 0 pending connections
LWP::Parallel::UserAgent::_check_bandwith:
(LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
[http://www.yahoo.com/] )
LWP::Parallel::UserAgent::on_connect: (http://www.yahoo.com/)
LWP::Parallel::UserAgent::_connect:
(LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
[http://www.yahoo.com/] )
LWP::Parallel::UserAgent::init_request: ->
(HTTP::Request=HASH(0x83b4738)) [GET http://www.yahoo.com/]
LWP::Parallel::UserAgent::init_request: GET http://www.yahoo.com/
LWP::UserAgent::_need_proxy: Not proxied
LWP::Parallel::UserAgent::init_request: <- (undef, [undef],
LWP::Parallel::Protocol::http=HASH(0x83c2154), 180, 1)
LWP::Parallel::Protocol::http::_connect: Socket is IO::Socket::INET=GLOB(0x8409b48)
LWP::Parallel::UserAgent::_make_connections_unordered: Queue for
www.yahoo.com:80 contains 0 pending connections
LWP::Parallel::UserAgent::_make_connections_unordered: Deleting queue
for 216.239.39.111:80
LWP::Parallel::UserAgent::_make_connections_unordered: Deleting queue
for www.yahoo.com:80
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_write: Writing to Sockets
LWP::Parallel::Protocol::http::write_request: write_request
(HTTP::Request=HASH(0x83b4738), IO::Socket::INET=GLOB(0x8409b48), /,
CODE(0x81655b8), 1, [undef])
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::http::read_chunk: Identified HTTP Protocol:
HTTP/1.1 200 OK
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
1095 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 1095 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '1095' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
2896 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 2896 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '2896' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
1448 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 1448 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '1448' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
2896 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 2896 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '2896' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
2896 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 2896 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '2896' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
1448 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 1448 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '1448' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
1448 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 1448 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '1448' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
2896 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 2896 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '2896' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
2896 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 2896 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '2896' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
2896 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 2896 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '2896' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
2896 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 2896 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '2896' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)

LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
4344 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 4344 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '4344' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
2392 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 2392 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '2392' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::UserAgent::_perform_read: '0' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::on_return: (http://www.yahoo.com/, 200, OK)
LWP::Parallel::UserAgent::_perform_read: received '1' from on_return
LWP::Parallel::UserAgent::_remove_current_connection:
(LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
[http://www.yahoo.com/] )
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::_make_connections_unordered: ()
timeout[root@jedi changedetect]#

Clarification of Answer by joseleon-ga on 18 Jan 2004 03:49 PST
Hello, marcfest:
  I have run the script you sent me several times and it works OK, but
it can occasionally hang for a while. Remember that we are using PUA
(LWP::Parallel::UserAgent); on its CPAN page, the author says in the
BUGS section:

"Probably lots! This was meant only as an interim release until this
functionality is incorporated into LWPng, the next generation libwww
module (though it has been this way for over 2 years now!)"

People also report problems with it around the net, so you cannot
trust it 100%. The script works and does what you wanted, but it
relies on a Perl module that can fail, so if you are interested in a
solution without PUA, just tell me and I will work on it.

In any case, check that you are using the latest version of PUA:

http://search.cpan.org/~marclang/ParallelUserAgent-2.56/

Then remove the debug lines, that is, the LWP::Debug line and the rest
of the additional prints, and try several times.
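
If it helps, here is one way to see which version of PUA is installed, so you can compare it with the release linked above. This is only a minimal sketch: installed_version is a hypothetical helper name of mine (it is not part of LWP or PUA), and it uses nothing beyond core Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Report the installed version of a module, or undef if it cannot be loaded.
# installed_version() is a hypothetical helper, not part of LWP or PUA.
sub installed_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return undef unless eval { require $file; 1 };
    return $module->VERSION;                   # reads the module's $VERSION
}

my $module  = 'LWP::Parallel::UserAgent';
my $version = installed_version($module);
print defined $version
    ? "$module $version is installed\n"
    : "$module is not installed (or failed to load)\n";
```

If the number printed is lower than the one in the link, upgrading is probably worth trying before anything else.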

Regards.
marcfest-ga rated this answer: 5 out of 5 stars and gave an additional tip of: $5.00
Answer delivers 100%. Thank you.
