Hello, marcfest:
Here is the code that does the trick. It uses LWP::Parallel::UserAgent
for improved performance and identifies itself as Internet Explorer.
Be aware of word wrapping when copying and pasting:
--
#!/usr/bin/perl
#Uncomment to get full debug info
#use LWP::Debug qw(+ -conns);
use LWP::Simple;
require LWP::Parallel::UserAgent;
require HTTP::Request;
@urls=("http://www.yahoo.com", "http://www.cnn.com/", "http://news.google.com/");
$timeout = 20; # each request times out after 20 seconds
@content = grab(@urls);
#This prints the contents of http://www.cnn.com
print $content[1];
sub grab
{
my @results;
my $i=0;
$ua = LWP::Parallel::UserAgent->new();
$ua->agent("MS Internet Explorer");
$ua->redirect (0); # prevents automatic following of redirects
$ua->max_hosts(5); # sets maximum number of locations accessed in parallel
$ua->max_req (5); # sets maximum number of parallel requests per host
foreach $url (@_)
{
$response = $ua->request(HTTP::Request->new(GET => $url));
$results[$i]=$response->content;
$i++;
}
$ua->wait ( $timeout );
return @results;
}
--
Please test it, and don't hesitate to ask for any clarification. I
hope you like it!
Regards.
|
Request for Answer Clarification by
marcfest-ga
on
16 Jan 2004 05:09 PST
One more thing, joseleon-ga:
In the documentation for this module
(http://aspn.activestate.com/ASPN/CodeDoc/ParallelUserAgent/LWP/Parallel/UserAgent.html)
it says
For parallel access, you will need to use the new methods that come
with LWP::Parallel::UserAgent, called $pua->register and $pua->wait.
See below for more information on each method.
I don't see "register" being used in your code. Are you sure the URLs
are being retrieved in parallel? Thank you.
|
Clarification of Answer by
joseleon-ga
on
16 Jan 2004 06:40 PST
Hello, marcfest:
Sorry, I think I pasted a previous version of the code. Here is the
final one. I have also added a print to the callback to show that the
requests are being made in parallel. Let me know if you have any
other questions or problems:
--
#!/usr/bin/perl
#Uncomment to get full debug info
#use LWP::Debug qw(+ -conns);
use LWP::Simple;
require LWP::Parallel::UserAgent;
require HTTP::Request;
@urls=("http://www.yahoo.com", "http://www.cnn.com/", "http://news.google.com/");
$timeout = 20; # each request times out after 20 seconds
@content = grab(@urls);
#This prints the contents of http://www.cnn.com
print $content[1];
sub grab
{
@results = (); # global array, filled in by callback()
$ua = LWP::Parallel::UserAgent->new();
$ua->agent("MS Internet Explorer");
$ua->redirect (0); # prevents automatic following of redirects
$ua->max_hosts(5); # sets maximum number of locations accessed in parallel
$ua->max_req (5); # sets maximum number of parallel requests per host
foreach $url (@_)
{
$ua->register(HTTP::Request->new(GET => $url), \&callback);
}
$ua->wait ( $timeout );
return @results;
}
sub callback
{
my($data, $response, $protocol) = @_;
#Comment out this line to stop printing the URL
print $response->base."\n";
for ($i=0; $i<@urls; $i++)
{
if ( index( $response->base, $urls[$i]) != -1 )
{
$results[$i].=$data;
last;
}
}
}
--
Regards.
|
Request for Answer Clarification by
marcfest-ga
on
16 Jan 2004 07:37 PST
This seems to work much faster. Just curious: what's happening in the
callback routine? It seems that each URL is being looped through
many times, up to a dozen or so.
|
Clarification of Answer by
joseleon-ga
on
16 Jan 2004 07:57 PST
Hello, marcfest:
This is done so that the results come back as an array in which each
position holds the content of the corresponding URL. The callback is
called once for every chunk of data received, which is why each URL
shows up several times. Because the downloads run in parallel, each
chunk must be appended at the right position in the array, and in the
callback the URL is the only thing I have to find that position.
The code would be faster if, instead of requiring the results in an
array, you accepted them in an associative array. Just change the
callback this way:
sub callback
{
my($data, $response, $protocol) = @_;
#Comment out this line to stop printing the URL
print $response->base."\n";
$results{$response->base}.=$data;
}
And then you could access the results this way:
$results{'http://www.yahoo.com/'};
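If the results have to stay in array order, the linear scan over @urls on every chunk can be avoided by building a URL-to-index lookup once. The following is only a sketch with no network access: on_chunk, %index_of and the chunk strings are made-up stand-ins for the real callback and data, and it assumes the URL reported for a response matches the registered URL exactly.

```perl
use strict;
use warnings;

my @urls = ('http://www.yahoo.com/', 'http://www.cnn.com/');

# Map each URL to its position once, so the callback needs no loop.
my %index_of;
@index_of{@urls} = (0 .. $#urls);

my @results;

# Stand-in for the real callback: it receives one chunk of one response.
sub on_chunk {
    my ($data, $base) = @_;
    my $i = $index_of{$base};
    return unless defined $i;    # unknown URL, ignore the chunk
    $results[$i] .= $data;       # append, since a page arrives in pieces
    return;
}

# Simulated interleaved arrival of chunks from two parallel downloads.
on_chunk('<html>yahoo ', 'http://www.yahoo.com/');
on_chunk('<html>cnn ',   'http://www.cnn.com/');
on_chunk('part 2',       'http://www.yahoo.com/');

print "$_: $results[$_]\n" for 0 .. $#results;
```

The lookup is exact, so a URL that comes back with an extra /index.html or a trailing slash would be ignored by this sketch rather than matched by substring as in the original loop.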
Regards.
|
Request for Answer Clarification by
marcfest-ga
on
16 Jan 2004 08:41 PST
Dear joseleon-ga - my last request -
Can you change the entire script so it works with an associative
array and post it? I've tried %content = grab(@urls); print
"done $content[http://www.newmediamusings.com/]"; but that didn't work.
See my script below:
#!/usr/bin/perl
#Uncomment to get full debug info
#use LWP::Debug qw(+ -conns);
use LWP::Simple;
require LWP::Parallel::UserAgent;
require HTTP::Request;
@urls = (
"http://www.fuckedcompany.com",
"http://www.digitaldeliverance.com",
"http://www.andrewtobias.com",
"http://www.andrewsullivan.com",
"http://www.newmediamusings.com/"
);
$timeout = 20; # each request times out after 20 seconds)
%content = grab(@urls);
#This prints the contents of http://www.cnn.com
# print $content[1];
print "done $content[http://www.newmediamusings.com/]";
sub grab
{
%results;
$ua = LWP::Parallel::UserAgent->new();
$ua->agent("MS Internet Explorer");
$ua->redirect (0); # prevents automatic following of redirects
$ua->max_hosts(5); # sets maximum number of locations accessed in parallel
$ua->max_req (5); # sets maximum number of parallel requests per host
foreach $url (@_)
{
$ua->register(HTTP::Request->new(GET => $url), \&callback);
}
$ua->wait ( $timeout );
return %results;
}
sub callback
{
my($data, $response, $protocol) = @_;
#Comment this line to prevent show the url
print $response->base."\n";
$results[$response->base].=$data;
}
|
Clarification of Answer by
joseleon-ga
on
16 Jan 2004 09:59 PST
Hello, marcfest:
No problem, here it is. I have used a global variable to store the
contents instead of passing the associative array by reference. A couple of notes:
This URL
http://www.andrewsullivan.com
is not working, at least for me, so it is commented out.
I have set the timeout to 120 because this website:
http://www.newmediamusings.com/
is really huge and takes a long time to download.
Also remember that when you dump the contents of a website, it must be done this way:
print $results{'http://www.fuckedcompany.com/index.html'};
Even though the URL is http://www.fuckedcompany.com, the result is stored under this key.
--
#!/usr/bin/perl
#Uncomment to get full debug info
#use LWP::Debug qw(+ -conns);
use LWP::Simple;
require LWP::Parallel::UserAgent;
require HTTP::Request;
@urls = (
"http://www.fuckedcompany.com",
"http://www.digitaldeliverance.com",
"http://www.andrewtobias.com",
#"http://www.andrewsullivan.com", I cannot access this page!!
"http://www.newmediamusings.com/"
);
#Hash where content is going to be placed
%results = ();
$timeout = 120; # each request times out after 120 seconds
grab(@urls);
# This prints the contents of a page. Be aware that the keys are not always the
# same as the URLs: they may include index.html, or just a trailing /
#print $results{'http://www.fuckedcompany.com/index.html'};
#print $results{'http://www.digitaldeliverance.com'};
#print $results{'http://www.andrewtobias.com'};
print $results{'http://www.newmediamusings.com/'};
sub grab
{
%results = (); # reset the global hash filled in by callback()
$ua = LWP::Parallel::UserAgent->new();
$ua->agent("MS Internet Explorer");
$ua->redirect (0); # prevents automatic following of redirects
$ua->max_hosts(5); # sets maximum number of locations accessed in parallel
$ua->max_req (5); # sets maximum number of parallel requests per host
foreach $url (@_)
{
$ua->register(HTTP::Request->new(GET => $url), \&callback);
}
$ua->wait ( $timeout );
return %results;
}
sub callback
{
my($data, $response, $protocol) = @_;
#Uncomment this line to print the URL of each chunk
#print $response->base."\n";
$results{$response->base}.=$data;
return;
}
--
Request as many changes as you want; I'm here to help.
Regards.
|
Request for Answer Clarification by
marcfest-ga
on
16 Jan 2004 11:15 PST
Can you make it so I can look up the results without having to make a
guess whether I have to add "index.html" to a key or not? If this is
too complicated then please don't worry about it. Thank you.
|
Clarification of Answer by
joseleon-ga
on
16 Jan 2004 11:49 PST
Hello, marcfest:
Sure, no problem. Here is the callback function; you just need to
replace the existing one. It parses the URL using a regular expression
and sets the key of the associative array to http://www.domain.com,
without a trailing slash.
--
sub callback
{
my($data, $response, $protocol) = @_;
#Uncomment this line to print the URL of each chunk
#print $response->base;
$url=$response->base;
$url =~ m|(\w+)://([^/:]+)(:\d+)?/(.*)|;
$results{$1."://".$2}.=$data;
return;
}
--
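One caveat with the pattern above is that it requires a slash after the host, so a URL such as http://www.domain.com with no trailing slash does not match and $1 and $2 keep whatever an earlier match left in them. A possible variant is sketched below; site_key is a hypothetical helper name, not part of LWP, and the pattern makes the path optional and checks that the match succeeded.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a URL to "scheme://host" so lookups do not
# depend on a trailing slash, a port, or a path such as /index.html.
sub site_key {
    my ($url) = @_;
    # The port and the path are both optional, so "http://host" matches too.
    if ($url =~ m{^(\w+)://([^/:]+)(?::\d+)?(?:/.*)?$}) {
        return lc($1) . '://' . lc($2);
    }
    return;    # not a URL this sketch recognises
}

for my $url ('http://www.yahoo.com',
             'http://www.yahoo.com/index.html',
             'http://www.cnn.com:80/') {
    my $key = site_key($url);
    print "$url -> ", (defined $key ? $key : '(no match)'), "\n";
}
```

In the callback this could be used as $results{ site_key($response->base) } .= $data, after checking that the key is defined. URLs with a query string directly after the host are not handled by this simple pattern.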
Regards.
|
Request for Answer Clarification by
marcfest-ga
on
16 Jan 2004 16:59 PST
Joseleon - Sorry to bother you one last time. With andrewsullivan.com
being down, I've noticed that the timeout does not seem to
work at all. What I need is that when the timeout is exceeded, the
script assigns the phrase "timeout" to the variable holding the
content. I've set the timeout to 5 seconds, but it does not seem to
make a difference. See my script below. Thanks for going the extra
mile.
#!/usr/bin/perl
#Uncomment to get full debug info
#use LWP::Debug qw(+ -conns);
use LWP::Simple;
require LWP::Parallel::UserAgent;
require HTTP::Request;
@urls = (
"http://www.fuckedcompany.com",
#"http://www.digitaldeliverance.com",
#"http://www.andrewtobias.com",
"http://www.andrewsullivan.com",
#"http://www.newmediamusings.com/",
#"http://www.omarmasry.net/index.html"
);
$timeout = 5; # each request times out after 5 seconds
@content = grab(@urls);
#This prints the contents of http://www.cnn.com
print $content[5];
sub grab
{
@results;
$ua = LWP::Parallel::UserAgent->new();
$ua->agent("MS Internet Explorer");
$ua->redirect (0); # prevents automatic following of redirects
$ua->max_hosts(6); # sets maximum number of locations accessed in parallel
$ua->max_req (6); # sets maximum number of parallel requests per host
foreach $url (@_)
{
$ua->register(HTTP::Request->new(GET => $url), \&callback);
}
$ua->wait ( $timeout );
return @results;
}
sub callback
{
my($data, $response, $protocol) = @_;
#Comment this line to prevent show the url
print $response->base."\n";
for ($i=0; $i<@urls; $i++)
{
if ( index( $response->base, $urls[$i]) != -1 )
{
$results[$i].=$data;
last;
}
}
}
|
Request for Answer Clarification by
marcfest-ga
on
17 Jan 2004 03:59 PST
FYI: when I include www.andrewsullivan.com in @urls, the script
seems to hang far beyond the specified timeout, and it also causes
one of the other URLs not to be fetched. Removing
www.andrewsullivan.com causes all URLs to be fetched quickly and
without problems. The timeout mechanism is not working. When
googling this issue I found that there seem to be problems with
Parallel User Agent's timeout. I'd rather go without Parallel User
Agent and have a working timeout mechanism, because the script
is useless to me if a single URL can undermine its operation the way
it currently does. I'll be happy to pay you another $20 if you
can resolve this for me. Thank you.
|
Clarification of Answer by
joseleon-ga
on
17 Jan 2004 08:52 PST
Hello, marcfest:
I can adapt the script to operate without Parallel User Agent and
handle timeouts correctly. However, because this question is
already answered, I don't know whether you can still tip me again.
If you can, I will use this question; if not, you can open a new one.
Regards.
|
Request for Answer Clarification by
marcfest-ga
on
17 Jan 2004 09:30 PST
www.andrewsullivan.com seems to be back up. Simply use a bogus address like
"http://216.239.39.111" to ensure that the timeout behavior works as
indicated in the original specs for this project ('If timeout is
exceeded, content will be "timeout"').
Thanks a lot!
|
Request for Answer Clarification by
marcfest-ga
on
17 Jan 2004 09:37 PST
Can you get timeouts to work correctly with PUA?
|
Clarification of Answer by
joseleon-ga
on
17 Jan 2004 09:48 PST
Hello, marcfest:
I'm going to run some tests and will get back to you.
Regards.
|
Clarification of Answer by
joseleon-ga
on
17 Jan 2004 11:10 PST
Hello, marcfest:
Test this version to check whether it handles timeouts correctly. I have
added a setting (remember_failures) to prevent the UserAgent from
connecting several times to a site that fails.
Now, if a site is down, it will return "timeout" instead of the ""
it returned before.
Also bear in mind that URLs that don't point to a specific page,
e.g. http://www.andrewtobias.com, must end with a / to be
parsed correctly.
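The placeholder idea can be tried without any network access. The sketch below is only an illustration: on_chunk and %seen are made-up names, and the chunks are hard-coded. It marks every URL as "timeout" up front and clears the marker when the first chunk for that URL arrives. It tracks arrival in a separate hash rather than comparing the stored content with "timeout", so a page whose content happened to be exactly that word would not be wiped by mistake.

```perl
use strict;
use warnings;

my @urls = ('http://www.yahoo.com/', 'http://216.239.39.111/');

# Every URL starts out marked as timed out.
my %results = map { $_ => 'timeout' } @urls;
my %seen;    # URLs for which at least one chunk has arrived

# Stand-in for the real callback: one call per chunk of data.
sub on_chunk {
    my ($data, $key) = @_;
    # First chunk for this URL: drop the placeholder.
    $results{$key} = '' unless $seen{$key}++;
    $results{$key} .= $data;
    return;
}

# Only the first URL ever delivers data in this simulation.
on_chunk('first ', 'http://www.yahoo.com/');
on_chunk('second', 'http://www.yahoo.com/');

print "$_ => $results{$_}\n" for sort keys %results;
```

A URL that never delivers a chunk keeps the value "timeout", which is the behavior asked for in the original specs.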
--
#!/usr/bin/perl
#Uncomment to get full debug info
#use LWP::Debug qw(+ -conns);
use LWP::Simple;
require LWP::Parallel::UserAgent;
require HTTP::Request;
#URLs must end with a /
@urls = (
"http://www.fuckedcompany.com/",
"http://www.digitaldeliverance.com/",
"http://www.andrewtobias.com/",
"http://www.andrewsullivan.com/",
"http://216.239.39.111/",
"http://www.newmediamusings.com/"
);
#Hash where content is going to be placed
%results = ();
$timeout = 1; # each request times out after 1 second
grab(@urls);
# This prints the contents of a page. Keys have the form
# http://host/ (note the trailing /)
#print $results{'http://www.fuckedcompany.com/'};
#print $results{'http://www.digitaldeliverance.com/'};
#print $results{'http://www.andrewtobias.com/'};
print $results{'http://www.fuckedcompany.com/'};
print $results{'http://216.239.39.111/'};
sub grab
{
%results = (); # reset the global hash filled in by callback()
$ua = LWP::Parallel::UserAgent->new();
$ua->agent("MS Internet Explorer");
$ua->redirect (0); # prevents automatic following of redirects
$ua->max_hosts(5); # sets maximum number of locations accessed in parallel
$ua->max_req (5); # sets maximum number of parallel requests per host
$ua->remember_failures(1);
foreach $url (@_)
{
$dom=$url;
$dom =~ m|(\w+)://([^/:]+)(:\d+)?/(.*)|;
$results{$1."://".$2."/"}="timeout";
$ua->register(HTTP::Request->new(GET => $url), \&callback);
}
$ua->wait ( $timeout );
return %results;
}
sub callback
{
my($data, $response, $protocol) = @_;
#Uncomment this line to print the URL of each chunk
#print $response->base."\n";
$url=$response->base;
$url =~ m|(\w+)://([^/:]+)(:\d+)?/(.*)|;
if ($results{$1."://".$2."/"} eq "timeout")
{
$results{$1."://".$2."/"}="";
}
$results{$1."://".$2."/"}.=$data;
return;
}
--
Regards.
|
Request for Answer Clarification by
marcfest-ga
on
17 Jan 2004 12:05 PST
The script still does not time out after the prescribed timeout
interval when a bogus URL is included. Instead it seems to hang. I've
appended the script as I ran it, plus the debugging output, below.
#!/usr/bin/perl
#Uncomment to get full debug info
use LWP::Debug qw(+ -conns);
use LWP::Simple;
require LWP::Parallel::UserAgent;
require HTTP::Request;
#URLs must end with a /
@urls = (
"http://www.yahoo.com/",
"http://216.239.39.111/",
#"http://www.newmediamusings.com/"
);
#Array where content is going to be placed
@results;
$timeout = 1; # each request times out after 1 seconds
grab(@urls);
# This prints the contents of the page, be aware are not the same as the URLs
# but with index.html, or even with an ending /
#print $results{'http://www.fuckedcompany.com/'};
#print $results{'http://www.digitaldeliverance.com/'};
#print $results{'http://www.andrewtobias.com/'};
print $results{'http://www.fuckedcompany.com/'};
print $results{'http://216.239.39.111/'};
sub grab
{
@results;
$ua = LWP::Parallel::UserAgent->new();
$ua->agent("MS Internet Explorer");
$ua->redirect (0); # prevents automatic following of redirects
$ua->max_hosts(5); # sets maximum number of locations accessed in parallel
$ua->max_req (5); # sets maximum number of parallel requests per host
$ua->remember_failures(1);
foreach $url (@_)
{
$dom=$url;
$dom =~ m|(\w+)://([^/:]+)(:\d+)?/(.*)|;
$results{$1."://".$2."/"}="timeout";
$ua->register(HTTP::Request->new(GET => $url), \&callback);
}
$ua->wait ( $timeout );
return %results;
}
sub callback
{
my($data, $response, $protocol) = @_;
#Comment this line to prevent show the url
print $response->base."\n";
$url=$response->base;
$url =~ m|(\w+)://([^/:]+)(:\d+)?/(.*)|;
if ($results{$1."://".$2."/"} eq "timeout")
{
$results{$1."://".$2."/"}="";
}
$results{$1."://".$2."/"}.=$data;
return;
}
DEBUGGING OUTPUT
LWP::UserAgent::new: ()
LWP::Parallel::UserAgent::redirect: (0)
LWP::Parallel::UserAgent::max_hosts: (5)
LWP::Parallel::UserAgent::max_req: (5)
LWP::Parallel::UserAgent::remember_failures: (1)
LWP::Parallel::UserAgent::register: (http://www.yahoo.com/,
CODE(0x81655b8), [undef], [undef])
LWP::Parallel::UserAgent::register: (http://216.239.39.111/,
CODE(0x81655b8), [undef], [undef])
LWP::Parallel::UserAgent::wait: (1)
LWP::Parallel::UserAgent::wait:
Current Server: 0 [ ]
Pending Server: 2 [ 216.239.39.111:80, 1, www.yahoo.com:80, 1 ]
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::_check_bandwith:
(LWP::Parallel::UserAgent::Entry=HASH(0x83b8d80)
[http://216.239.39.111/] )
LWP::Parallel::UserAgent::on_connect: (http://216.239.39.111/)
LWP::Parallel::UserAgent::_connect:
(LWP::Parallel::UserAgent::Entry=HASH(0x83b8d80)
[http://216.239.39.111/] )
LWP::Parallel::UserAgent::init_request: ->
(HTTP::Request=HASH(0x83b8a2c)) [GET http://216.239.39.111/]
LWP::Parallel::UserAgent::init_request: GET http://216.239.39.111/
LWP::UserAgent::_need_proxy: Not proxied
LWP::Parallel::UserAgent::init_request: <- (undef, [undef],
LWP::Parallel::Protocol::http=HASH(0x83b8d68), 180, 1)
SCRIPT HANGING AFTER LINE ABOVE
|
Request for Answer Clarification by
marcfest-ga
on
17 Jan 2004 12:07 PST
Actually, the script terminated after hanging for approx. 3 minutes,
producing the error output below:
LWP::UserAgent::new: ()
LWP::Parallel::UserAgent::redirect: (0)
LWP::Parallel::UserAgent::max_hosts: (5)
LWP::Parallel::UserAgent::max_req: (5)
LWP::Parallel::UserAgent::remember_failures: (1)
LWP::Parallel::UserAgent::register: (http://www.yahoo.com/,
CODE(0x81655b8), [undef], [undef])
LWP::Parallel::UserAgent::register: (http://216.239.39.111/,
CODE(0x81655b8), [undef], [undef])
LWP::Parallel::UserAgent::wait: (1)
LWP::Parallel::UserAgent::wait:
Current Server: 0 [ ]
Pending Server: 2 [ 216.239.39.111:80, 1, www.yahoo.com:80, 1 ]
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::_check_bandwith:
(LWP::Parallel::UserAgent::Entry=HASH(0x83b8d80)
[http://216.239.39.111/] )
LWP::Parallel::UserAgent::on_connect: (http://216.239.39.111/)
LWP::Parallel::UserAgent::_connect:
(LWP::Parallel::UserAgent::Entry=HASH(0x83b8d80)
[http://216.239.39.111/] )
LWP::Parallel::UserAgent::init_request: ->
(HTTP::Request=HASH(0x83b8a2c)) [GET http://216.239.39.111/]
LWP::Parallel::UserAgent::init_request: GET http://216.239.39.111/
LWP::UserAgent::_need_proxy: Not proxied
LWP::Parallel::UserAgent::init_request: <- (undef, [undef],
LWP::Parallel::Protocol::http=HASH(0x83b8d68), 180, 1)
LWP::Parallel::UserAgent::on_failure: (http://216.239.39.111/)
LWP::Parallel::UserAgent::_check_bandwith: Failed connection for
'216.239.39.111:80'
LWP::Parallel::UserAgent::_make_connections_unordered: Queue for
216.239.39.111:80 contains 0 pending connections
LWP::Parallel::UserAgent::_check_bandwith:
(LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
[http://www.yahoo.com/] )
LWP::Parallel::UserAgent::on_connect: (http://www.yahoo.com/)
LWP::Parallel::UserAgent::_connect:
(LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
[http://www.yahoo.com/] )
LWP::Parallel::UserAgent::init_request: ->
(HTTP::Request=HASH(0x83b4738)) [GET http://www.yahoo.com/]
LWP::Parallel::UserAgent::init_request: GET http://www.yahoo.com/
LWP::UserAgent::_need_proxy: Not proxied
LWP::Parallel::UserAgent::init_request: <- (undef, [undef],
LWP::Parallel::Protocol::http=HASH(0x83c2154), 180, 1)
LWP::Parallel::Protocol::http::_connect: Socket is IO::Socket::INET=GLOB(0x8409b48)
LWP::Parallel::UserAgent::_make_connections_unordered: Queue for
www.yahoo.com:80 contains 0 pending connections
LWP::Parallel::UserAgent::_make_connections_unordered: Deleting queue
for 216.239.39.111:80
LWP::Parallel::UserAgent::_make_connections_unordered: Deleting queue
for www.yahoo.com:80
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_write: Writing to Sockets
LWP::Parallel::Protocol::http::write_request: write_request
(HTTP::Request=HASH(0x83b4738), IO::Socket::INET=GLOB(0x8409b48), /,
CODE(0x81655b8), 1, [undef])
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::http::read_chunk: Identified HTTP Protocol:
HTTP/1.1 200 OK
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
1095 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 1095 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '1095' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
2896 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 2896 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '2896' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
1448 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 1448 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '1448' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
2896 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 2896 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '2896' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
2896 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 2896 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '2896' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
1448 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 1448 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '1448' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
1448 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 1448 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '1448' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
2896 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 2896 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '2896' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
2896 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 2896 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '2896' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
2896 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 2896 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '2896' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
2896 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 2896 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '2896' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
4344 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 4344 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '4344' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::Protocol::receive: ( [self], CODE(0x81655b8), 200 OK,
2392 bytes, LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::receive: [CODE] read 2392 bytes
http://www.yahoo.com/
LWP::Parallel::Protocol::receive: return-code from Callback was '[undef]'
LWP::Parallel::UserAgent::_perform_read: '2392' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::wait: Selecting Sockets, timeout is 1 seconds
LWP::Parallel::UserAgent::_perform_read: Reading from Sockets
LWP::Parallel::Protocol::http::read_chunk: read_chunk
(HTTP::Response=HASH(0x83d70b4), IO::Socket::INET=GLOB(0x8409b48),
HTTP::Request=HASH(0x83b4738), CODE(0x81655b8), 8192, 1,
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4))
LWP::Parallel::Protocol::http::read_chunk: reading response (0 buffered)
LWP::Parallel::UserAgent::_perform_read: '0' = read_chunk from
LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
(http://www.yahoo.com/)
LWP::Parallel::UserAgent::on_return: (http://www.yahoo.com/, 200, OK)
LWP::Parallel::UserAgent::_perform_read: received '1' from on_return
LWP::Parallel::UserAgent::_remove_current_connection:
(LWP::Parallel::UserAgent::Entry=HASH(0x83b8ad4)
[http://www.yahoo.com/] )
LWP::Parallel::UserAgent::_make_connections_unordered: ()
LWP::Parallel::UserAgent::_make_connections_unordered: ()
timeout[root@jedi changedetect]#
|
Clarification of Answer by
joseleon-ga
on
18 Jan 2004 03:49 PST
Hello, marcfest:
I have run the script you sent me several times and it works OK, but
it can happen that the script hangs for a while. Remember that we are
using PUA, whose author says in the BUGS section of its CPAN page:
"Probably lots! This was meant only as an interim release until this
functionality is incorporated into LWPng, the next generation libwww
module (though it has been this way for over 2 years now!)"
People also report problems with it around the net, so you cannot
trust it 100%. The script works and does what you wanted, but it relies
on a Perl class that can fail. If you are interested in a solution
without PUA, just tell me and I will work on it.
In any case, check that you are using the latest version of PUA:
http://search.cpan.org/~marclang/ParallelUserAgent-2.56/
Then remove the debug lines, that is, the LWP::Debug line and the
additional prints, and try several times.
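For reference, one common way to enforce a hard time limit without PUA is Perl's alarm with eval. This is only a sketch under assumptions: with_timeout is a hypothetical helper, the "slow" case is simulated with sleep rather than a real download, and alarm is not reliable on every platform (Windows in particular). In a real script the code reference would wrap a blocking fetch of one URL.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, but give up after $seconds and
# return the string "timeout" instead of its result.
sub with_timeout {
    my ($seconds, $code) = @_;
    my $value;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "alarm\n" };
        alarm $seconds;
        $value = $code->();
        alarm 0;             # finished in time, cancel the alarm
        1;
    };
    alarm 0;                 # make sure no alarm is left pending
    return $value if $ok;
    return 'timeout' if $@ eq "alarm\n";
    die $@;                  # some other error: pass it on
}

my $fast = with_timeout(2, sub { return 'page content' });
my $slow = with_timeout(1, sub { sleep 5; return 'never reached' });
print "fast: $fast\nslow: $slow\n";
```

This version fetches one URL at a time, so it trades the speed of parallel requests for a limit that is enforced by the operating system's timer.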
Regards.
|