Hello joycen
Sometimes only a partial answer is possible
within the practical constraints of a problem.
Thousands of Amazon affiliates link to the Amazon
site. So, instead of giving you the fish, I will just
give you instructions on how to fish:
1) enter the following into the Google search engine:
http://amazon.com
2) select option 4: Find web pages that link to amazon.com
3) get a spider to crawl through the thousands of resulting links
(links with the obidos string in them).
4) modify the spider (usually a Perl script) to extract the information
you want (title, format, ...) from the target page
(the page at Amazon) and format it the way you want.
5) do not spam the affiliates or anyone else.
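The filtering in step 3) can be sketched in a few lines of Python (any scripting language would do); the example URLs below are invented for illustration:

```python
# A minimal sketch of step 3)'s filtering: keep only the links that
# contain the "obidos" string. The URLs here are made up.

def affiliate_links(urls):
    """Return only the URLs containing the 'obidos' string."""
    return [url for url in urls if "obidos" in url]

urls = [
    "http://example.com/books.html",
    "http://amazon.com/exec/obidos/ASIN/0596000278/ref=someaffiliate",
]
print(affiliate_links(urls))  # only the obidos link remains
```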
The search term for the above is: reverse URL lookup (or reverse search).
Search engines other than Google can do this as well, but to get the
output you want, you would need a customised spider.
Here is an article on the topic:
Reverse Search Inside Out - Part One: Why and How to Search ...
... process is to look at links inbound to a particular site or URL...
http://websearch.about.com/library/weekly/aa061101a.htm
The search terms for the other part are: spider crawler
You will find many spiders, ready to crawl all sites
from a given list. If you do not feel like modifying the spider
yourself, I would suggest posting that task as a separate question,
not a clarification request.
I have noticed there are a few researchers willing and able
to hack a Perl script like this in less than a day,
for some $40 to $75, which in my humble opinion is quite a bargain.
I hope this is useful.
Hedgie
Clarification of Answer by hedgie-ga on 24 Jun 2002 04:15 PDT
OK. We have a good start.
To do step 3) I suggest you first read a bit about spiders, e.g.
here:
Writing a Web Crawler in the Java Programming Language
http://developer.java.sun.com/developer/technicalArticles/ThirdParty/WebCrawler/
A spider is a short program, in a language such as Perl or Tcl, which
does this:
a) takes a URL from a list
b) contacts the server and asks for a page
c) unlike a browser, it does not display the page; it extracts the
links
d) adds them to the list
e) goes back to a)
When the program is doing this, it is said (poetically) that the
spider is crawling the web.
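The a)-e) loop above can be sketched like this, in Python rather than Perl or Tcl. The regular expression for link extraction is deliberately crude, and the max_pages limit is something I have added as a safety valve, not part of the description above:

```python
# A bare-bones spider implementing the a)-e) loop.
import re
import urllib.request

def extract_links(html):
    """c) pull the http links out of a page instead of displaying it."""
    return re.findall(r'href="(http[^"]+)"', html)

def fetch(url):
    """b) contact the server and ask for the page."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", "replace")

def crawl(start_urls, max_pages=10):
    to_visit = list(start_urls)      # the list of URLs for step a)
    seen = set()
    while to_visit and len(seen) < max_pages:
        url = to_visit.pop(0)        # a) take a URL from the list
        if url in seen:
            continue
        seen.add(url)
        try:
            page = fetch(url)
        except OSError:              # unreachable server: skip it
            continue
        to_visit.extend(extract_links(page))  # d) add new links to the list
    return seen                      # e) the loop goes back to a) each time
```

A real spider would also respect robots.txt and pause between requests, which I have left out to keep the loop readable.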
You need a programmer to modify the program (called a script in this
case) so that there are a few more steps there, namely:
d1) extract the links with the obidos string and store them
d2) get the Amazon page for each and extract data (title, ...)
d3) store these data in the desired format
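One way the d1)-d3) additions might look, again in Python; the &lt;title&gt; regex is a stand-in for whatever extraction the real script would do, and the tab-separated output file is just one possible format:

```python
import re

def obidos_links(links):
    """d1) keep only the links with the obidos string in them."""
    return [link for link in links if "obidos" in link]

def extract_title(page_html):
    """d2) pull the title out of a fetched Amazon page (simplified)."""
    match = re.search(r"<title>(.*?)</title>", page_html, re.S | re.I)
    return match.group(1).strip() if match else None

def store(records, filename="titles.tsv"):
    """d3) write (url, title) pairs in a simple tab-separated format."""
    with open(filename, "w") as out:
        for url, title in records:
            out.write(f"{url}\t{title}\n")
```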
That covers steps 3) and 4). You need a programmer who knows Perl (or
another scripting language) to add these steps. It is a simple process
(once you know the language).
Step 5) is just a :-) note, meaning: once you are done, you will have
lists of thousands of Amazon associates. I hope you are not going to
send them unsolicited e-mail (spam), since I would not want to be
helping anyone do that.
I am leaving on a trip for a few days; if you need more clarification,
please be patient.
A Perl programmer may comment and offer to do the modification for
you, as a different question, or you may try elance or another such
place to hire a programmer. I am not a Perl hacker.