Q: Scrape Domain Names from Google Results (No Answer, 0 Comments)
Question  
Subject: Scrape Domain Names from Google Results
Category: Computers
Asked by: author20-ga
List Price: $50.00
Posted: 28 Feb 2004 09:54 PST
Expires: 08 Mar 2004 09:46 PST
Question ID: 311738
When you search Google, you get results that look like this
(from actual Google results for the word "example"):
===
Example Web Page
You have reached this web page by typing "example.com", "example.net",
or "example.org" into your web browser. These domain names ... 
example.org/ - 1k - Cached - Similar pages

====

I want you to write a script for a search robot that does the following:

1. Search for a term.
2. Strip out the domain name of every result hit, and only the domain
name. For example, from the hit above it should extract example.org.
If a search returns a thousand domains, I want all 1,000 domains
extracted.
3. Save only the unique domain names to a file on my PC.
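The three steps above can be sketched in Python. This is a minimal sketch, assuming the result URLs have already been collected (the search step itself is not shown); the function names and the output file name `domains.txt` are hypothetical, and treating `www.example.com` as the same domain as `example.com` is an assumption you may not want.

```python
from urllib.parse import urlsplit


def extract_domain(url: str) -> str:
    """Return the lower-cased host name of a result URL, e.g. 'example.org'."""
    # Bare entries such as "example.org/" (the green URL line of a result)
    # have no scheme, so add one before parsing.
    if "://" not in url:
        url = "http://" + url
    host = urlsplit(url).hostname or ""  # .hostname is already lower-cased
    # Assumption: a leading "www." is treated as the same domain.
    if host.startswith("www."):
        host = host[4:]
    return host


def unique_domains(urls):
    """Yield each domain once, in the order it first appears."""
    seen = set()  # the "memory" of domains already extracted in this run
    for url in urls:
        domain = extract_domain(url.strip())
        if domain and domain not in seen:
            seen.add(domain)
            yield domain


def save_domains(domains, path):
    """Write one domain per line to a plain-text file."""
    with open(path, "w", encoding="utf-8") as fh:
        for domain in domains:
            fh.write(domain + "\n")


if __name__ == "__main__":
    # Hypothetical sample input: result URLs collected for the term "example".
    results = [
        "example.org/",
        "http://www.xml.com/xml/pub/1999/01/namespaces.html",
        "http://www.xml.com/pub/a/2001/03/28/deviant.html",
        "http://nile.wpi.edu/NS/",
    ]
    domains = list(unique_domains(results))
    save_domains(domains, "domains.txt")
    print(domains)  # ['example.org', 'xml.com', 'nile.wpi.edu']
```

Using a set for the membership check keeps deduplication fast even for thousands of results, while the generator preserves the order in which domains first appeared.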

Obviously, you would want to use the Google API if it provides this, and
I assume it does, since it knows each result's domain name. The robot
also needs to run on a Windows client PC, not a server, so it must use
a client-supported technology such as VB, C++, VBScript, or JavaScript,
but not Perl. I would, however, be open to PHP.

I then want to import that file into Excel, MySQL, or MS Access to
research further, for instance, to do a WHOIS search for contact info.
No, I won't call or e-mail them; I just want to conduct market
research on certain things.
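Excel can open a comma-separated (CSV) file directly, Access can import one as a text file, and MySQL can load one with `LOAD DATA INFILE`. A minimal Python sketch that turns a list of domains into CSV text with a `domain` header row and saves it; the function names and the `domains.csv` file name are hypothetical:

```python
import csv
import io


def domains_to_csv(domains) -> str:
    """Return CSV text: a 'domain' header row, then one domain per row."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # default "excel" dialect ends rows with "\r\n"
    writer.writerow(["domain"])
    for domain in domains:
        writer.writerow([domain])
    return buf.getvalue()


def save_csv(domains, path="domains.csv"):
    """Save the CSV text to disk for import into Excel, Access or MySQL."""
    # newline="" stops Python from translating the "\r\n" row endings again.
    with open(path, "w", newline="", encoding="utf-8") as fh:
        fh.write(domains_to_csv(domains))
```

The header row gives the imported column a name; when loading into MySQL, that first line can be skipped with the `IGNORE 1 LINES` clause of `LOAD DATA`.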

I also want to be able to eliminate duplicate domain names, so the
robot must remember the names it has already scraped and
saved.
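One way to give the robot that memory across runs is a small database table whose primary key is the domain name, so a name already stored is skipped on later runs. A minimal Python sketch using the standard `sqlite3` module, which is an assumption standing in for MySQL or Access; the file name `domains.db` and the helper names are hypothetical:

```python
import sqlite3


def open_store(path="domains.db"):
    """Open (or create) the database that remembers every domain ever saved."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS domains (name TEXT PRIMARY KEY)")
    return conn


def remember_new(conn, domains):
    """Store domains not seen in any earlier run; return only the new ones."""
    new = []
    for domain in domains:
        cur = conn.execute(
            "INSERT OR IGNORE INTO domains (name) VALUES (?)", (domain,)
        )
        # rowcount is 1 when the row was inserted and 0 when
        # INSERT OR IGNORE skipped an already-stored name.
        if cur.rowcount == 1:
            new.append(domain)
    conn.commit()
    return new
```

For example, `remember_new(open_store(), ["example.org", "xml.com"])` returns only those names that were not saved by a previous run, so each run can append just the new domains to the output file. The primary key also rejects repeats within a single batch.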

Request for Question Clarification by pafalafa-ga on 28 Feb 2004 16:50 PST
Hello there author20-ga,

Do you mean an extraction from the results of an "example" search that
looks like this (extracted from the first 100 results):

http://Hedge-Fund-Investing.com&sa=l&ai=AXz7FsNTQAp7k8sojUA-kwLuCFm6tFEu5S3YAmvN7EwgAQ6EAgT9ABMgAYaBw2S4_DAAAAAA&num=2
http://example.org/
http://javascript.internet.com/
http://webquest.sdsu.edu/matrix.html
http://vzone.virgin.net/sizzling.jalfrezi/iniframe.htm
http://web.media.mit.edu/~lieber/PBE/
http://web.media.mit.edu/~lieber/PBE/PBE-Examples.html
http://homepages.enterprise.net/djenkins/ecghome.html
http://nile.wpi.edu/NS/
http://www.xml.com/xml/pub/1999/01/namespaces.html
http://www.xml.com/pub/a/2001/03/28/deviant.html
http://www.amazon.com/exec/obidos/tg/detail/-/0789722429?v=glance
http://www.amazon.com/exec/obidos/tg/detail/-/0321146530?v=glance
http://www.ora.com/catalog/javanut/examples/
http://java.sun.com/security/signExample/
http://java.sun.com/products/jsp/html/jspbasics.fm.html
http://www.muquit.com/muquit/software/Count/Count2_5-ex.html
http://archive.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/example-1.html
http://archive.ncsa.uiuc.edu/General/Internet/WWW/LongerCoded.html
http://come.to/example
http://www.usb-by-example.com/
http://www.objectsbydesign.com/projects/xslt/xslt_by_example.html
http://www.testing.com/cgi-bin/blog
http://www.primeexample.com/
http://pages.ebay.com/help/buyerguide/bidding-prxy.html
http://www.htmlbyexample.com/
http://www.phpbuilder.com/columns/dario19990616.php3
http://www.docbook.org/tdg/en/html/example.html
http://www.clarkehome58.freeserve.co.uk/
http://webdev.berber.co.il/
http://www.perl.com/pub/a/2003/10/09/refactoring.html
http://www.modssl.org/example/
http://www-106.ibm.com/developerworks/library/l-awk1.html
http://www-106.ibm.com/developerworks/library/l-bash.html
http://www.springfield.k12.il.us/movie/
http://www.fluffycat.com/sql/
http://static.userland.com/gems/backend/rssMarkPilgrimExample.xml
http://216.239.39.104/search?q=cache:Bszk4Lj4fewJ:static.userland.com/gems/backend/rssMarkPilgrimExample.xml+example&hl=en&ie=UTF-8
http://www.dfanning.com/documents/programs.html
http://mudhole.spodnet.uk.com/~imp/authoring/
http://www.isd77.k12.mn.us/resources/cf/ExmSciProj.html
http://www.cs.uiowa.edu/~jones/step/example.html
http://world.std.com/~franl/crypto/rsa-example.html
http://www.cut-the-knot.org/ctk/August2001.shtml
http://www.geog.ouc.bc.ca/physgeog/contents/4e.html
http://www.intertwingly.net/stories/2002/12/20/sbe.html
http://www.intertwingly.net/wiki/pie/EchoExample
http://www.mtolive.com/phpbook
http://www.1914-1918.net/mgc.htm
http://www.w3schools.com/css/showit.asp?filename=ex1
http://www.cswl.com/whiteppr/tutorials/jini.html
http://www.callihan.com/cssbook/
http://examples.macromedia.com/petmarket/flashstore.html
http://www.ratcliffeblog.com/archives/000057.html
http://www.pixy.cz/blogg/clanky/cssnopreloadrollovers/example.html
http://devdaily.com/unix/edu/examples/index.shtml
http://www.xulplanet.com/tutorials/xultu/window.html
http://www.acceleratedcpp.com/
http://www.thetech.org/exhibits/online/genome/index3.html
http://www.jbc.org/cgi/content/full/274/3/1736/F2
http://www.jbc.org/cgi/content/full/274/15/10154/F3
http://www.glenbrook.k12.il.us/gbssci/phys/projects/q1/tparub.html
http://www.sockaddr.com/ExampleSourceCode.html
http://public.kitware.com/VTK/example-code.php
http://radio.weblogs.com/0114726/2003/07/22.html
http://radio.weblogs.com/0001015/rss.xml
http://216.239.39.104/search?q=cache:SBirhuAS1_kJ:radio.weblogs.com/0001015/rss.xml+example&hl=en&ie=UTF-8
http://www.evl.uic.edu/pape/CAVE/pf/ByExample/
http://www.marchal.com/go/xbe/
http://www.anybrowser.org/campaign/abletters.html
http://research.compaq.com/SRC/trestle_by_example/html/tutorial.html
http://www.boutell.com/wusage/example/
http://www.atomicarchive.com/Example/ExampleStart.shtml
http://www.webreference.com/programming/gently/
http://math.berkeley.edu/~reb/modularforms/
http://www.ecst.csuchico.edu/~chafey/prog/sockets/sinfo1.html
http://www.doc-o-matic.com/sourcecodeexample.html
http://standards.nctm.org/document/eexamples/
http://education.nmsu.edu/webquest/examples.html
http://waltonfeed.com/peoples/navajo/language.html
http://www.w3.org/TR/2000/NOTE-WCAG10-HTML-TECHS-20001106/
http://www.pwcglobal.com/images/pwcerc/pdfs/devbusplexmpl.pdf
http://216.239.39.104/search?q=cache:OOl1G_wmkVoJ:www.pwcglobal.com/images/pwcerc/pdfs/devbusplexmpl.pdf+example&hl=en&ie=UTF-8
http://www.braxtech.com/blt/examplereport.html
http://www.i18nguy.com/unicode/unicode-example-intro.html
http://jakarta.apache.org/tomcat/tomcat-4.1-doc/printer/jndi-datasource-examples-howto.html
http://realgar.mcli.dist.maricopa.edu/alan/jshow/zion/
http://www.resume-template-store.com/
http://www.alistapart.com/d/slidingdoors2/v1/ex9.html
http://rootprompt.org/article.php3?article=832
http://www.all-yours.net/postcard/example.htm
http://www1.umn.edu/ohr/ecep/resume/exfuncti.htm
http://www.snidervillage.com/lead.htm
http://freedom.gmsociety.org/abuse.txt
http://www.netlib.org/scalapack/examples/
http://www.roble.com/docs/secure_solaris.html
http://www.securityfocus.com/columnists/201
http://unu.novajo.ca/simple/archives/000024.html
http://www.cdmag.com/articles/026/178/code.html
http://www.cwru.edu/help/introHTML/examples/TCh3ex3.html
http://www.htmldog.com/articles/suckerfish/example/
http://bbrp.llnl.gov/sequence/definitions.html
http://www.thecortex.net/clover/eg/checkstyle/report/
http://www.artschools.com/antonelli/
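Most entries above reduce to a domain by taking the host part of the URL, but two kinds need special handling: the sponsored link at the top, where tracking parameters follow the host after `&` with no `/`, and the Google cache links on `216.239.39.104`, whose real target sits inside the `q=cache:...` parameter. A minimal Python sketch, assuming those are the only unusual formats in the list (the helper name `result_host` is hypothetical):

```python
from urllib.parse import parse_qs, urlsplit


def result_host(url: str) -> str:
    """Return the lower-cased host a scraped result link really points to."""
    parts = urlsplit(url.strip())
    # Sponsored links glue tracking parameters onto the host with '&'
    # (e.g. "Hedge-Fund-Investing.com&sa=l&ai=..."), so cut at the first '&'.
    host = (parts.hostname or "").split("&", 1)[0]
    query = parse_qs(parts.query).get("q", [""])[0]
    if parts.path == "/search" and query.startswith("cache:"):
        # Cache links look like "cache:<id>:<target url> <search terms>";
        # parse_qs has already decoded the '+' before the terms into a space.
        target = query[len("cache:"):].split(":", 1)[-1].split()
        if target:
            return result_host("http://" + target[0])
    return host
```

Applied to the list, `sorted({result_host(u) for u in urls})` gives each host once. Hosts such as `www-106.ibm.com` or `web.media.mit.edu` are returned as-is; collapsing them to a registrable domain like `ibm.com` would need the Public Suffix List (because of suffixes such as `co.uk`), which this sketch does not attempt.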


==========

If that's what you're after, there is commercial software that will do
the trick.  It extracts URLs, eliminates duplicates, and allows very
flexible exporting to text files, databases, or HTML of URLs only, or
URL + page title + other tidbits of information.

Extracting the above list took mere seconds.

The software can be downloaded and given a trial run at no cost.  The
trial period lasts for 7 days.  After that, to continue using the
software, you need to purchase it, at a cost of about $35.

Do you absolutely need to have a de novo script written for your task,
or will off-the-shelf software work for you?

If the latter, I'll be glad to post the details of the software as an
answer to your question.

Let me know what you think.

Clarification of Question by author20-ga on 28 Feb 2004 18:14 PST
Hi,

I tried to price a script development project, but found a few open
source packages that do the trick.

But they are nowhere near as good as the one you are describing. My
budget for this was timed for the delivery of a software package in 2
to 3 weeks.

I would be glad to pay you $50 in 3 payments over 5 weeks for the
delivery of the software program now.

Or I could give you one payment of $30 now.  I have a perfect payment
record on eBay, and pay on Elance promptly (even if some of the
programmers deliver rotten code).

If you opted for 3 payments over 5 weeks, I could give $20 tonight,
$10 in 2 weeks and $10 in five weeks.  I sold all my stuff to complete
a software project and start my company. I'm eating out of garbage
cans behind McDonald's to launch a major web site, and I need to
target a market by domain.

I only need to target certain markets, so it is a perfect application
for the commercial tool.  I will pay; just lay it on me now.

Request for Question Clarification by pafalafa-ga on 28 Feb 2004 19:15 PST
Hello again author20-ga,

Thanks for getting back to me so quickly.  It sounds like the software
I mentioned really hit the bullseye, which is great.  But I don't
quite understand what you're asking of me now.

If I answer your question now by posting the information, you will be
charged the current price of the question -- $50.  Of course, you
don't actually *pay* it until your credit card bill comes due, which
is presumably some weeks hence.

Do you want me to answer the question as it is currently set up,
with a $50 price tag?

You seem to have something else in mind, but I'm not sure what. 
There's no way for customers and researchers to communicate with one
another -- or to make/receive a payment -- other than here on the
pages of Google Answers.

Let me know how you would like to proceed on this, and I'll try my
best to accommodate your needs.

Thanks.

pafalafa-ga

Clarification of Question by author20-ga on 02 Mar 2004 10:36 PST
Hi,

My total budget is $50, and if I paid you for this as the answer, I
would have to spend another $40 to buy the product.

I am trying to arrange an answer that fits my budget. If I kill this
question and start another for a "Commercial Solution" rather than a
script, I could pay you $20 for this answer by Friday (my revenue is a
bit tight).

Request for Question Clarification by pafalafa-ga on 02 Mar 2004 11:08 PST
We may be in luck.  I've found some freeware that might work for you,
but I need to know your operating system (e.g. Windows 98) first.

Let me know what you're using, and we'll take it from there.

pafalafa-ga

Clarification of Question by author20-ga on 02 Mar 2004 19:27 PST
Hi,

I am on Windows XP, but would be willing to use another OS if it has a better app. 

I'm after data, and I'll use whatever I need to hunt it and capture it.

Request for Question Clarification by pafalafa-ga on 03 Mar 2004 16:18 PST
Hello again.

Well, it turns out there are some freeware packages that extract URLs,
but I'm not sure they'll do everything you need.  The packages
don't come with detailed documentation, and it takes some playing
around to really figure them out.  I'm not sure, for instance, whether
they remove duplicates, like the commercial software does.
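If a freeware tool exports a plain list of URLs, one per line, a quick way to check whether it removed duplicate domains is to count host names. A minimal Python sketch; the function name and the one-URL-per-line input format are assumptions:

```python
from collections import Counter
from urllib.parse import urlsplit


def duplicate_domains(lines):
    """Return {host: count} for hosts that occur more than once in a URL list."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the exported file
        if "://" not in line:
            line = "http://" + line  # bare entries like "example.org/"
        host = urlsplit(line).hostname  # lower-cased, or None if missing
        if host:
            counts[host] += 1
    return {host: n for host, n in counts.items() if n > 1}
```

Usage: `with open("urls.txt", encoding="utf-8") as fh: print(duplicate_domains(fh))`, where `urls.txt` is a hypothetical export file. An empty dictionary means no domain repeats, so the tool already removed duplicates.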

Tell you what.  Please re-price your question at the amount you think
you can best afford, and I'll post an answer with what I've learned. 
When all is said and done, you may decide you can live with the
freeware, or you may need the commercial software, which you can use
for free for a while anyway, before paying for it to get the full
license.

Clarification of Question by author20-ga on 08 Mar 2004 09:46 PST
OK -- will do.  I think the freeware will be fine. I will
close this and repost a new question.
Answer  
There is no answer at this time.

Comments  
There are no comments at this time.
