There are lots of ways to do this. Output from a website comes as the
response to a request sent to that website. You can post particular
requests to target sites, read the responses that come back, and
manipulate that output however you choose.
The best way I can suggest (and explain) involves PHP. Most scripting
languages (ASP, PHP, Perl) can include pages and code elements stored
as other files or scripts. The additional quality that PHP has is the
ability to include and read remote URLs as though they were local
files, then parse their output.
For example:

$URL = "www.google.ca/search?q=DeWolfe&ie=UTF-8&oe=UTF-8&hl=en&btnG=Google+Search&meta=";
$aupair = "http://".$URL;
$linez = "";                       // start with an empty buffer
$file = fopen ($aupair, "r");      // open the URL as if it were a file
if ($file)
{
    while (!feof ($file))          // keep going until end-of-file
    {
        $line = fgets ($file, 1024);    // read up to 1024 bytes
        $linez = $linez."\n".$line;     // append to the buffer
    }
    fclose ($file);
}
return $linez;    // hand the page contents back to the including script
This chunk of PHP code builds a URL and stores it in "$aupair". The
next line opens that URL with fopen() (file open). Until the end of
the file is reached (i.e. no more lines are coming in), it keeps
reading the URL into memory, appending each line to the "$linez"
variable.
Once in memory, the page can be parsed and searched for any number of
keywords and values. Using regular expressions, you can "cook down"
the data you get into useful information. You can also use functions
like PHP's strip_tags() to strip out the HTML and leave only the text
results.
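As a rough sketch of that "cooking down" (the pattern here is made up
for illustration; a real pattern depends on the page you're reading),
you could pull the page title out of the raw HTML and then strip the
rest of the markup:

// $linez holds the raw HTML fetched above
// Pull out the contents of the <title> tag (hypothetical pattern)
if (preg_match ("/<title>(.*?)<\/title>/i", $linez, $matches))
{
    $title = $matches[1];
}
// Strip all HTML tags, leaving only the text of the page
$text_only = strip_tags ($linez);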
When asking for a URL you can pass ANY request via PHP. Tack on a
query string and you can pass values to that URL. The URL may then do
something and return a particular result. You can read that result and
do with it what you choose.
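A minimal sketch of tacking on a query string (the host and parameter
names are invented for the example; swap in whatever the target URL
expects):

// Hypothetical search parameters
$query = "resume writing";
$page  = 2;

// urlencode() makes the values safe to put in a query string
$URL  = "http://www.example.com/search?";
$URL .= "q=".urlencode ($query)."&page=".urlencode ($page);

$file = fopen ($URL, "r");    // same fopen() technique as above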
If the URL you're reading is full of links, you may want to follow
each of those links and read the content on those pages, in essence
creating a web spider like Google or WebCrawler, as in the sketch
below.
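Here's a bare-bones sketch of the first step of such a spider,
assuming the page's links use ordinary href attributes (a real spider
would also need to handle relative URLs, duplicates, and politeness
toward the remote site):

// Collect every href="..." value from the fetched page
preg_match_all ("/href=\"([^\"]+)\"/i", $linez, $matches);
foreach ($matches[1] as $link)
{
    // Each $link could now be fetched with fopen() in turn
    echo $link."\n";
}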
If you were using an automated system like this to post material to
multiple sites, you may want to see what those URLs have to say about
the material they received (Did they not understand the query string?
Is the URL you were looking for now missing?).
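One crude way to check, assuming the remote site puts the word
"Success" somewhere in its response (your target sites may signal the
result differently):

// $linez holds the response from the remote site
if (strpos ($linez, "Success") !== false)
{
    echo "The remote site accepted the posting.";
}
else
{
    echo "Something went wrong; check the response to see why.";
}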
All of the web scripting languages offer a means of drawing in content
from remote sites: PHP, Perl, ASP, Tcl, JavaScript, etc. Some of them
(like PHP) can do it more easily than others.
A good book that goes into this subject in great detail is:
http://www.amazon.com/exec/obidos/tg/detail/-/B00005R09X/qid=1043560822/sr=1-3/ref=sr_1_3/002-1890857-1634408?v=glance&s=books
I have a book coming out in March that covers this subject more
lightly, but shows how to do it in multiple languages:
http://vig.prenhall.com/catalog/academic/product/1,4096,0130461830,00.html
In answer to your patent question, I don't believe anyone has patented
a means of retrieving data. There are multiple approaches, and each
can be implemented in a slightly different way.
Request for Answer Clarification by martinjay-ga on 25 Jan 2003 22:21 PST
I am impressed with how well you know this stuff, but you are
way too techie for me. I'm looking for a non-technical answer to how
this works, simple, not a ton of detail. Also, I need the examples
if you know them. I'm looking for someone who really knows
this area from a business end. Example:
I type in my Name, Address, Job History, etc. into a site,
then somehow this site posts that on to other job sites.
This is what I am after, and I apologize for the confusion.
Clarification of Answer by dewolfe001-ga on 26 Jan 2003 10:19 PST
Type in your Name, Address, Job History, etc. into a site. Site A
takes the form information and sends out web page requests and/or
automated form postings to a number of other sites (call them Sites B,
C and D). The programmer for Site A has gone to Sites B, C and D,
looked at what data those sites need, and then built his site to
connect to them and provide them with what they need. Sites B, C and D
will then respond with their web pages, and Site A will read what
those sites said (e.g. "Success", "Fail", etc.).
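For the technically inclined, here is a sketch of how Site A might
send one such automated posting using PHP's cURL functions (the target
URL and field names are invented for the example):

// Hypothetical form fields that Site B expects
$fields = array (
    "name"    => "Jane Doe",
    "address" => "123 Main St.",
    "history" => "Ten years as a widget inspector"
);

// Build an urlencoded POST body from the fields
$pairs = array ();
foreach ($fields as $key => $value)
{
    $pairs[] = urlencode ($key)."=".urlencode ($value);
}
$body = implode ("&", $pairs);

// POST it to Site B and capture the response
$ch = curl_init ("http://www.example-jobsite.com/post.php");
curl_setopt ($ch, CURLOPT_POST, 1);
curl_setopt ($ch, CURLOPT_POSTFIELDS, $body);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec ($ch);
curl_close ($ch);
// $response now holds Site B's answer ("Success", "Fail", etc.)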
Site A usually doesn't have explicit permission to put material on
Sites B, C and D. From a legal standpoint, they are extending your
permission to their actions and acting as your agent. A lot of these
potential "third party" sites have a problem with this. A few sites
(like the members' section of Lycos) add a randomly created GIF of
characters that you have to retype into a form field. Because of how
these GIFs are made, they're virtually unreadable by machine, so an
automated process can't take part, making these automated postings
nearly impossible.
There are a lot of sites out there that spread your information on
your behalf. There are also a number that scoop information from other
sites to add content to their own.
I hope this answers your question.