Q: perl function improvement (Answered, 5 out of 5 stars, 0 Comments)
Question  
Subject: perl function improvement
Category: Computers > Programming
Asked by: marcfest-ga
List Price: $20.00
Posted: 18 Jan 2004 10:41 PST
Expires: 17 Feb 2004 10:41 PST
Question ID: 297701
Please look at the Perl script below. It uses a function called "grab"
to fetch URLs off the Web.

I need the function's timeout feature fixed. Right now, when including
a non-existent server like "http://216.239.39.111/" (a bogus URL meant
to simulate a server that's down; it should trigger the timeout) in
@urls, the script will hang for up to 3 minutes instead of timing out
after 3 seconds. This may be due to a bug in the PUA
(LWP::Parallel::UserAgent) Perl module.

Please make it so the "grab" function times out appropriately, i.e. in
this case after 3 seconds. You may have to stop using PUA to
accomplish this. Make it so that when a request times out, the value
"timeout" is assigned to the variable that would otherwise hold the
fetched HTML. Please try to keep grab working as fast as possible.

Thank you.

Marc.



#!/usr/bin/perl

#Uncomment to get full debug info
#use LWP::Debug qw(+ -conns);
use LWP::Simple;
require LWP::Parallel::UserAgent;
require HTTP::Request;


@urls = (
"http://www.yahoo.com/",
"http://216.239.39.111/",
);

$timeout = 3; # each request times out after 3 seconds
@content = grab(@urls);

#This prints the contents of http://www.yahoo.com/
print $content[0];

exit;

sub grab
{
   @results;

   $ua = LWP::Parallel::UserAgent->new();
   $ua->agent("MS Internet Explorer");
   $ua->redirect (0); # prevents automatic following of redirects
   $ua->max_hosts(6); # sets maximum number of locations accessed in parallel
   $ua->max_req  (6); # sets maximum number of parallel requests per host

  foreach $url (@_)
  {

       $ua->register(HTTP::Request->new(GET => $url), \&callback);
  }

  $ua->wait ( $timeout );

  return @results;

}

sub callback
{
        my($data, $response, $protocol) = @_;

        #Comment out this line to stop printing the URL
        print $response->base."\n";
        for ($i=0; $i<@urls; $i++)
        {
                if ( index( $response->base, $urls[$i]) != -1 )
                {
                        $results[$i].=$data;
                        last;
                }
        }
}

Request for Question Clarification by haversian-ga on 20 Jan 2004 16:48 PST
Hello marcfest-ga

I know how to fix your grab function, but in neither the original nor
the fixed version does the callback seem to do much.  In particular,
its print function doesn't print anything.

Do you want just a fix for grab, or do you want me to see what I can
do with callback?  If you'd like me to work on callback, what is it
supposed to do?  What parts of the response do you want stored in
@results?

-Haversian

Clarification of Question by marcfest-ga on 20 Jan 2004 19:48 PST
Hi Haversian - 

Please change the script as you see fit. What matters is that it works
fast and that the timeout works. Please test it with
http://216.239.39.111/, which should cause the timeout to be triggered
since it's a bogus URL.

Thanks.

Request for Question Clarification by haversian-ga on 20 Jan 2004 20:51 PST
Thanks for the quick reply.

I wouldn't feel comfortable taking your $20 if I'm not getting you what you want.

Are you happy with the callback function?  That is, is the script
giving you the output you want, just taking too long to do it?  If so,
I'll post my changes to the grab function and get you on your way.

-Haversian

Clarification of Question by marcfest-ga on 21 Jan 2004 03:36 PST
The only thing that I'm unhappy about is that the script's timeout
function does not work. This problem surfaces when using a bogus URL
like "http://216.239.39.111/" to simulate a server that's unavailable.
Instead of timing out after 3 seconds, the current script will try to
get this URL for up to 3 minutes. So what I need fixed is this faulty
timeout behavior.
Answer  
Subject: Re: perl function improvement
Answered By: haversian-ga on 21 Jan 2004 05:41 PST
Rated:5 out of 5 stars
 
Good morning marcfest,

You've made a small error in using $ua->wait in your script.  Since
the call to wait occurs after your foreach loop, it does not impact
the execution of any code within that loop.  Instead, you have to use
$ua->timeout() to set the timeout value while you're setting up the
other parameters governing the behavior of LWP::Parallel in your
script:

sub grab
{
   @results;

   $ua = LWP::Parallel::UserAgent->new();
   $ua->agent("MS Internet Explorer");
   $ua->timeout ($timeout); #  <--- ADD THIS
   $ua->redirect (0); # prevents automatic following of redirects
   $ua->max_hosts(6); # sets maximum number of locations accessed in parallel
   $ua->max_req  (6); # sets maximum number of parallel requests per host

  foreach $url (@_)
  {
       $res = $ua->register(HTTP::Request->new('GET', $url), \$callback);
  }

  $ua->wait ();  #  <--- simply wait until all registered URLs are dealt with
  return @results;
}



LINK:

http://search.cpan.org/~marclang/ParallelUserAgent-2.56/lib/LWP/Parallel.pm
  The CPAN page on LWP::Parallel was invaluable in answering this
question for you.  It has several examples that you may find useful in
continuing to work with this script.


-Haversian

Request for Answer Clarification by marcfest-ga on 21 Jan 2004 07:31 PST
Running the script below, which contains your suggested changes, only
produces "xx" as output. Something's not working. Please advise.

#!/usr/bin/perl

#Uncomment to get full debug info
#use LWP::Debug qw(+ -conns);
use LWP::Simple;
require LWP::Parallel::UserAgent;
require HTTP::Request;


@urls = (
"http://www.yahoo.com/",
"http://216.239.39.111/",
);

$timeout = 10; # each request times out after 10 seconds
@content = grab(@urls);

print "xx $content[0]";

exit;


sub grab
{
   @results;

   $ua = LWP::Parallel::UserAgent->new();
   $ua->agent("MS Internet Explorer");
   $ua->timeout ($timeout); #  <--- ADD THIS
   $ua->redirect (0); # prevents automatic following of redirects
   $ua->max_hosts(6); # sets maximum number of locations accessed in parallel
   $ua->max_req  (6); # sets maximum number of parallel requests per host

  foreach $url (@_)
  {
       $res = $ua->register(HTTP::Request->new('GET', $url), \$callback);
  }

  $ua->wait ();  #  <--- simply wait until all registered URLs are dealt with
  return @results;
}


sub callback
{
        my($data, $response, $protocol) = @_;

        #Comment out this line to stop printing the URL
        print $response->base."\n";
        for ($i=0; $i<@urls; $i++)
        {
                if ( index( $response->base, $urls[$i]) != -1 )
                {
                        $results[$i].=$data;
                        last;
                }
        }
}

Clarification of Answer by haversian-ga on 21 Jan 2004 09:40 PST
'Afternoon,

That's what I was referring to when I kept asking about your callback
function.  It doesn't seem to be properly loading values into
@results.  Does it work (albeit slowly) when using your original code?
It didn't for me.

I have some code I placed in the grab function for testing purposes
while answering your question.  It's at home, but I could post it here
if you're interested.  It prints the response status (200 OK, 404 Not
Found, etc.) for each URL, and could probably be extended to record
more of the response and place it into the results variable, but as I
understood it you wanted the callback function to handle that instead.

Let me know what the status is; I'll get back to you this evening.

-Haversian

Request for Answer Clarification by marcfest-ga on 21 Jan 2004 10:44 PST
The callback part wasn't written by me. I don't know how it works. 

What I would like as an "answer" to this question is a posting of the
complete, reworked script functioning according to the specs: i.e. it
grabs the sites quickly and times out appropriately. I'd also
appreciate if you could test it before posting it; make sure to use
"http://216.239.39.111/" to simulate a time out and make sure that it
grabs the other sites OK.

Thanks a bunch.

Clarification of Answer by haversian-ga on 21 Jan 2004 13:48 PST
> The callback part wasn't written by me. I don't know how it works.

Yes, but *does* it work?  On your system, with your original script,
do you get entries in the content variable?

Since I wasn't sure what callback's purpose was, I simply ignored it
and included my own test code right in the grab subroutine, which
works fine.

Simply replace:

  $ua->wait ();

in the code I posted as an answer with:

  $entries = $ua->wait ();
  foreach (keys %$entries) {
    my $res = $entries->{$_}->response;
    print "RESULTS: " . $res->message . "\n";
  }

Also remove the reference to callback like so:

       $res = $ua->register(HTTP::Request->new('GET', $url));#, \$callback);

You'll see the HTTP status messages for the URLs tested (I tested
your original two and added several more URLs, both good and bad, as well).


If you tell me what you mean by "grabs the other sites OK.", I can try
to rewrite callback or incorporate the code into the grab subroutine
for you.

-Haversian

Request for Answer Clarification by marcfest-ga on 21 Jan 2004 14:55 PST
Haversian - 

Please post a complete script for me, so that all I have to do is cut
and paste it and run it, without having to replace stuff and change
lines. Try to make it as safe and convenient for me as you can. I'll
appreciate it. By "grabbing" a URL I mean what the script is supposed
to do, i.e. fetch a URL off the Web. Thanks.

Request for Answer Clarification by marcfest-ga on 21 Jan 2004 15:02 PST
Also, please assume that I am a total idiot when it comes to Perl
code. The original script was written by someone else, so I cannot
answer any of your code-related questions. The original script ran OK
as long as it did not encounter any URLs whose pages were unavailable
(those URLs would cause the timeout mechanism to not work, as
explained in my original specs, which is why I posted this question).

Again, what I hope to receive from an expert here is a complete
script (not bits and pieces) that is tested and will solve the timeout
issue. If you would rather get rid of this case and have me ask for a
refund, that's no problem. Please let me know. Otherwise, please post
the complete code of a script that works according to my specs. Thank
you. Sorry about being so difficult.

Clarification of Answer by haversian-ga on 22 Jan 2004 22:12 PST
Marcfest,

The difficulty of communication isn't a problem - some questions just
take more back-and-forth than others.  I had assumed you wrote the
script, which turns out to be a bad assumption - that clears some
things up.

As to the specifications, the script as you presented it to me does
not work on my system.  Since it worked on yours, I will assume this
is a misconfiguration issue on my part, and will work to fix it
tomorrow.  In either case, the script only prints out $content[0],
that is, the content of the first URL given to it.  Would you like it
changed to
print out the content of all the webpages, one by one?  Would you like
some sort of divider between each URL?

I'll sleep on it, and tackle the problem anew tomorrow.  I'm confident
the hard part is behind us and at long last I should be able to get
you the response you've been looking for.  My apologies for the delay.

-Haversian

Request for Answer Clarification by marcfest-ga on 23 Jan 2004 02:08 PST
Printing out just $content[0] is OK. This line is for testing purposes only.

Clarification of Answer by haversian-ga on 24 Jan 2004 09:09 PST
Ok, I've got things working (I had problems with my HTML::Parser module).

When I changed things so they worked (for my test code), I ended up
breaking them so they didn't work (for your code).  I've backed out of
some of those changes, and have only a few changes left in the code.

As you requested, here is the completed, tested, script:

#!/usr/bin/perl

#Uncomment to get full debug info
#use LWP::Debug qw(+ -conns);
use LWP::Simple;
require LWP::Parallel::UserAgent;
require HTTP::Request;


@urls = (
"http://www.yahoo.com/",
"http://216.239.39.111/",
);

$timeout = 3; # each request times out after 3 seconds
@content = grab(@urls);

#This prints the contents of one of the URLs, probably the first
print $content[0];

exit;

sub grab
{
   @results;

   $ua = LWP::Parallel::UserAgent->new();
   $ua->agent("MS Internet Explorer");
   $ua->timeout ($timeout);
   $ua->redirect (0); # prevents automatic following of redirects
   $ua->max_hosts(6); # sets maximum number of locations accessed in parallel
   $ua->max_req  (6); # sets maximum number of parallel requests per host

  foreach $url (@_)
  {
       $ua->register(HTTP::Request->new(GET => $url), \&callback);
  }

  $ua->wait ();
  return @results;

}


sub callback
{
        my($data, $response, $protocol) = @_;

        #Comment out this line to stop printing the URL
        print $response->base."\n";
        for ($i=0; $i<@urls; $i++)
        {
                if ( index( $response->base, $urls[$i]) != -1 )
                {
                        $results[$i].=$data;
                        last;
                }
        }
}
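
One thing the script above still does not do is what the original
question asked for literally: putting the string "timeout" into the
content slot of any URL that fails to respond in time. Below is a
minimal, untested sketch of one way to add that, using only calls
already shown in this thread ($ua->timeout, the hash of entries
returned by $ua->wait, and standard HTTP::Response methods such as
is_success and content). The subroutine name grab_with_timeout_marker
is illustrative and not part of the original script.

#!/usr/bin/perl

use LWP::Simple;
require LWP::Parallel::UserAgent;
require HTTP::Request;

@urls = (
"http://www.yahoo.com/",
"http://216.239.39.111/",
);

$timeout = 3; # each request times out after 3 seconds

@content = grab_with_timeout_marker(@urls);

print "$content[0]\n";   # content of the first URL (or "timeout")
print "$content[1]\n";   # the bogus URL should come back as "timeout"

exit;

sub grab_with_timeout_marker
{
   my @list    = @_;
   my @results = ("timeout") x @list;   # every slot defaults to "timeout"

   my $ua = LWP::Parallel::UserAgent->new();
   $ua->agent("MS Internet Explorer");
   $ua->timeout ($timeout);  # per-request timeout, as in the answer above
   $ua->redirect (0);        # prevents automatic following of redirects
   $ua->max_hosts(6);        # maximum number of locations accessed in parallel
   $ua->max_req  (6);        # maximum number of parallel requests per host

   # Register without a callback; responses are read after wait() instead.
   foreach my $url (@list)
   {
       $ua->register(HTTP::Request->new(GET => $url));
   }

   my $entries = $ua->wait();   # hash of entry objects, as shown earlier
   foreach my $key (keys %$entries)
   {
       my $res = $entries->{$key}->response;
       next unless $res && $res->is_success;
       # Match the response back to its URL the same way callback() did.
       for (my $i = 0; $i < @list; $i++)
       {
           if ( index( $res->base, $list[$i]) != -1 )
           {
               $results[$i] = $res->content;
               last;
           }
       }
   }
   return @results;
}

The design point is simply that every slot starts out as "timeout" and
is only overwritten when a successful response can be matched back to
its URL, so a dead server such as http://216.239.39.111/ leaves the
marker behind without any special-case code.
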
marcfest-ga rated this answer: 5 out of 5 stars
Put in a lot of effort to get it right and got it right. Walked the extra mile!

Comments  
There are no comments at this time.
