Q: WGET a large web forum. (No Answer, 3 Comments)
Question  
Subject: WGET a large web forum.
Category: Computers > Internet
Asked by: dcaban-ga
List Price: $10.00
Posted: 05 Dec 2005 11:25 PST
Expires: 04 Jan 2006 11:25 PST
Question ID: 601756
I am trying to recursively download a large web forum (over 100,000
posts) with wget.

The command I am using is mostly successful: it logs in and starts
recursively downloading the site. So far I have retrieved upwards of
26,000 posts.

After 2-5 hours, however, it starts returning 400 Bad Request errors
and cycles through the URL list, requesting pages without downloading
them.

Here is the code I am using:
wget --post-data 'forceredirect=1&do=login&vb_login_md5password=HEREISTHEHASH&vb_login_username=username&url=/forums/index.php&vb_login_password=&cookieuser=1' \
  -r --save-cookies cookies.txt --keep-session-cookies -o log.log -N -E \
  -b -k -R '*login.php?do=logout*' http://www.SITE.com/forums/login.php




I need a solution or workaround.
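
Before trying workarounds, it may help to see exactly when the 400 responses began. The following is a minimal sketch, assuming log.log is the file written by the -o option above and that it contains the literal status text "400 Bad Request"; the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: summarize when the 400 errors began in wget's log.
# Assumption: log.log is the file written by -o and contains the
# literal status text "400 Bad Request" for each failed request.

# Print how many log lines report a 400 Bad Request response.
count_bad_requests() {
  grep -c '400 Bad Request' "$1"
}

# Print the log line number of the first 400 response (empty if none).
first_bad_request_line() {
  grep -n '400 Bad Request' "$1" | head -n 1 | cut -d: -f1
}

# Only report if the log file is actually present.
if [ -f log.log ]; then
  echo "400 responses: $(count_bad_requests log.log)"
  echo "first seen at log line: $(first_bad_request_line log.log)"
fi
```

Comparing the first failing line with the requests just before it can show whether the errors started after a particular URL or simply after a certain amount of time.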
Answer  
There is no answer at this time.

Comments  
Subject: Re: WGET a large web forum.
From: bozo99-ga on 05 Dec 2005 19:28 PST
 
Does the site still work? Have you consumed all of their bandwidth
allowance, or have they blocked you?
Subject: Re: WGET a large web forum.
From: larkas-ga on 05 Dec 2005 21:35 PST
 
You should have at least been considerate enough to add the --wait or
--random-wait command-line options.

The site has probably blocked you or gone down because of your requests.
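
As a rough illustration of the throttling options mentioned above, here is a sketch of a slower crawl that reuses the saved login cookies. It assumes cookies.txt already exists from the earlier login run; the 2-second delay, the index.php start URL, and the polite_crawl helper name are illustrative choices, not values taken from the site. By default it only prints the command; set DRY_RUN=0 to actually run it.

```shell
#!/bin/sh
# Sketch: a throttled recursive crawl reusing previously saved cookies.
# Assumptions: cookies.txt was written by an earlier --save-cookies run;
# the 2-second base delay is an arbitrary example value.

polite_crawl() {
  url=$1
  # --wait=2 pauses between retrievals; --random-wait varies that pause
  # so requests do not arrive at a fixed interval.
  set -- wget --load-cookies cookies.txt \
    --wait=2 --random-wait \
    -r -N -E -k -R '*login.php?do=logout*' -o log.log "$url"
  if [ "${DRY_RUN:-1}" = 1 ]; then
    # Dry run: show the command instead of contacting the site.
    echo "$*"
  else
    "$@"
  fi
}

polite_crawl 'http://www.SITE.com/forums/index.php'
```

A crawl this size will take considerably longer with a delay, but it is less likely to look like abuse to the server.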
Subject: Re: WGET a large web forum.
From: ljbuesch-ga on 07 Dec 2005 15:46 PST
 
If the website is not down, you can go through a proxy. In your
script, you could even have it switch to a random proxy after every X
requests or so.
