Is there a way to randomly (or pseudo-randomly) generate web sites/web
pages using Google? Random IP numbers don't work due to dynamic IP.
The purpose is to examine business use of the web to create a typology
of web business sites. |
Request for Question Clarification by
pafalafa-ga
on
09 Sep 2004 11:13 PDT
featherst1-ga,
If I'm understanding your question correctly, then the short answer is
"Yes", there are certainly existing tools for pulling up websites more
or less at random (or at least, pseudo-randomly).
Do you want to know about existing websites you can visit to conduct a
random search? Or are you looking for the specifics to build such a
tool yourself?
Let me know, and I'll see if I can help you on this one.
pafalafa-ga
|
Clarification of Question by
featherst1-ga
on
09 Sep 2004 17:07 PDT
Thanks for the quick response. I would definitely like a list of the
web sites that allow random selection. For this research project,
random selection is important. I understand there are "degrees" of
random selection. Clearly the more attention to the random generation
of web sites the better. As I noted, if the routine creates a pseudo
random selection based on the computer's clock, that should be fine
for my purpose.
I would also appreciate any info regarding existing tools (i.e.
something I might incorporate into existing php code). I'm not sure
what "specifics to build the tool myself" means. I'm probably not
inclined to do that, but any information will be greatly appreciated.
Thanks for your assistance. BTW, if you need more info and if it's
appropriate to discuss it, I'd be happy to initiate a call.
|
Request for Question Clarification by
pafalafa-ga
on
10 Sep 2004 11:54 PDT
featherst1-ga,
Thanks for getting back to me. The more I think about your request,
the more I find myself wondering if the "random" sites I'm aware of
are random enough for your needs.
The best random site I know of is Mangle:
http://mangle.ca/random.php
Click on the cat (aka Mangle) to go to a random site.
The algorithm for picking sites isn't explained, but from appearances,
it seems that Mangle takes a random selection of words from a
dictionary, conducts a Google search, and then selects a page from the
results to display. It works very smoothly, and brings up a wide
variety of sites.
Some other sites that offer sort-of random searching can be seen in
the Yahoo directory:
http://dir.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/Searching_the_Web/Search_Engines_and_Directories/Random_Links/
but several of these appear to be so-called random selections from the
website owner's personal collection of idiosyncratic URLs, so you're
not getting a very sizable cross-section of the web.
Let me know what you think of Mangle, and whether or not it meets your needs.
Thanks.
pafalafa-ga
|
Clarification of Question by
featherst1-ga
on
10 Sep 2004 13:38 PDT
I've tried Mangle. Here are the problems I've encountered (all deal
with randomness of selections). One German site, which I believe
lists 10,000 English words, has come up more than twenty times in 250
selections. Also, few foreign-language sites are selected, again (I
assume) because of the use of an online English dictionary. Thus
Chinese, Japanese, etc. sites appear especially under-represented.
Finally, the site implies (but doesn't state) there is at least some
effort to minimize representation of sites with inappropriate content,
i.e., porn sites.
I'm wondering if the Google database is enumerated. If it is, then
it might be possible to generate a random number set of, say, 1500 or
2000 numbers, select the records with those numbers from the database,
and then determine the domain names. I'm just thinking the use of online
dictionaries for site selection does not provide the level of random
selectivity I'd like to see. Thanks once again for your assistance.
It's greatly appreciated.
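To make the idea concrete, here is a minimal Python sketch of the
sampling step. It assumes, purely for illustration, that records are
numbered 0 to N-1 and can be looked up by number; nothing in this thread
establishes that Google's index is exposed that way. The function names
and the `total_records` figure are hypothetical.

```python
import random
from typing import List, Optional
from urllib.parse import urlsplit


def sample_record_numbers(total_records: int, sample_size: int,
                          seed: Optional[int] = None) -> List[int]:
    """Draw sample_size distinct record numbers from 0..total_records-1.

    With seed=None the generator is seeded from the operating system
    (or the clock as a fallback); pass a fixed seed to reproduce a sample.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(total_records), sample_size))


def domain_of(url: str) -> str:
    """Reduce a page URL to its host name, e.g. for counting by domain."""
    return urlsplit(url).hostname or ""
```

For example, `sample_record_numbers(4_000_000_000, 1500, seed=2004)` would
give 1500 distinct record numbers (the total here is a made-up figure),
and `domain_of` would then be applied to each URL retrieved.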
|