Google Answers
Q: AI-Bots-Crawlers-Spiders-Trawlers-Ferrets-Fetchers (Answered, 3 Comments)
Question  
Subject: AI-Bots-Crawlers-Spiders-Trawlers-Ferrets-Fetchers
Category: Computers
Asked by: khill_cooter-ga
List Price: $200.00
Posted: 26 Jun 2002 12:50 PDT
Expires: 26 Jul 2002 12:50 PDT
Question ID: 33696
I have a multi-part question that will need some rather specific
answers.

Question (1):  Please define in a non-technical manner (layman’s
terms) the following terms (hereinafter “fields”) as they relate to
the Internet and research:

AI (Artificial Intelligence):

Bots:

Spiders:

Crawlers:

Trawlers:

Ferrets:

Fetchers:  


Question (2):  What are the differences and similarities between the
above fields? (once again, non-technical, layman’s terms)

  
Question (3): Who are the top ten U.S. based individual experts in
each “field”? (Academic, Think Tank or Government based)

Question (4):  Who are the top ten companies in each “field”?   

Question (5): What are the top ten commercial product offerings which
combine in a turnkey business application all of the above data
mining/information retrieval fields with effective: (A) Filters, (B)
Knowledge Management categorization middleware, and (C) Non-technical
employee user-friendliness.

Question (6)  

(A)	Please provide me with (as comprehensive as possible) contact
information for the people listed below:

(B)	As they relate to the fields defined in Question (1) above, please
notate next to each person’s name listed below what their specific
areas of expertise are:
Vasant Honavar
Pattie Maes
Keith S. Decker
Milind Tambe
Erik Selberg
Oren Zamir
Oren Etzioni
Richard Segal
Marvin Minsky
Charles Petrie
Sabit Kraus
Craig Knoblock
Steve Minton
Marcus P. Zillman

If you have any questions or need more specific examples, please let
me know.
Answer  
Subject: Re: AI-Bots-Crawlers-Spiders-Trawlers-Ferrets-Fetchers
Answered By: aditya2k-ga on 26 Jun 2002 16:39 PDT
 
Hi khill_cooter,


   Good day. Wow! That is quite an impressive list of questions, one that
required some work, but the price matched it. As the saying goes,
"Where there's a will, there's a way."

   Turning to the answer: I'm sure I can be of assistance, since I am
associated with the above topics, and I shall define the terms in my
own words rather than just post a link.

AI
--
Artificial Intelligence : It is the branch of computer science that
deals with making computers behave like humans; in other words, making
computers think. Ordinarily, the results a computer produces depend on
what the user sends to it (input). The term was coined by John McCarthy
in the proposal for the 1956 Dartmouth summer research conference on
the subject. AI includes game playing (e.g., human vs. computer chess)
and expert systems (e.g., helping doctors diagnose a problem). A good
site on AI :

Artificial Intelligence Repository
http://library.thinkquest.org/18242/index.shtml (old Layout)
http://www.generation5.org/ (new Layout)


Bots
----
A bot, short for robot, is a computer program that runs automatically.
Once a human configures the bot and starts it, no further human
intervention is required, except for troubleshooting. A spider
(defined below) is a type of bot. Bots often incorporate a bit of AI.


Spider
------
A spider is a program that automatically fetches web pages. It starts
from pages submitted by users or reached through links on other pages,
and follows the links on every page it visits. This continues until all
the pages reachable through those links have been visited. This process
is known as crawling. Another term for these programs is webcrawler.
Search engines like Google use a large number of spiders working in
parallel (i.e., many spiders crawling different pages at the same time).
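To make the crawling process a little more concrete, here is a minimal
sketch in Python. It is an illustration only: the toy in-memory "web"
(TOY_WEB) and the function name crawl are hypothetical, and a real
spider would download each page over the network and parse out its
links instead of reading them from a dictionary. The spider keeps a
queue of pages waiting to be visited and a set of pages it has already
seen, so that no page is fetched twice.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# A real spider would download each page and parse its links instead.
TOY_WEB = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html", "d.html"],
    "c.html": ["a.html"],
    "d.html": [],
}

def crawl(seed, web, max_pages=100):
    """Breadth-first crawl from a seed URL; return pages in the order visited."""
    visited = []
    seen = {seed}          # URLs already queued, so no page is visited twice
    queue = deque([seed])  # frontier of URLs waiting to be fetched
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)            # "fetch" the page
        for link in web.get(url, []):  # follow every link found on it
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

print(crawl("a.html", TOY_WEB))
```

Starting from a.html, the sketch prints ['a.html', 'b.html', 'c.html',
'd.html'], visiting pages in breadth-first order; max_pages caps how many
pages a single run will fetch.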


Crawler
-------
A crawler is another name for a spider. I've confirmed this through a
number of internet sources.


Trawler
-------
A trawler is a program that sifts through large volumes of data (e.g.,
other search engines, newsgroups, or FTP archives) looking for
something of interest. Examples include http://www.search.com, which
searches other search engines (also known as meta-searching), and
http://groups.google.com, which searches through newsgroup postings.
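As a rough illustration of trawling, the following Python sketch sifts
through an archive of postings that has already been collected and
keeps those that mention any keyword of interest. The archive contents
and the trawl function are hypothetical; real services such as Google
Groups search far larger, pre-indexed collections.

```python
# Hypothetical archive of newsgroup-style postings: (subject, body).
ARCHIVE = [
    ("Web crawler question", "How do spiders decide which links to follow?"),
    ("Recipe swap", "Best way to bake bread at home"),
    ("Meta-search tools", "Comparing tools that query several search engines"),
]

def trawl(records, keywords):
    """Return the subjects of records whose text mentions any keyword."""
    wanted = [k.lower() for k in keywords]
    hits = []
    for subject, body in records:
        text = f"{subject} {body}".lower()  # case-insensitive matching
        if any(k in text for k in wanted):
            hits.append(subject)
    return hits

print(trawl(ARCHIVE, ["spider", "search engine"]))
# ['Web crawler question', 'Meta-search tools']
```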


Ferret
------
A ferret is nothing but a trawler. This is not a technical term; it is
a term devised by FerretSoft for its products. One example is
WebFerret, an offline search utility that lets you formulate your
query offline; then, when you connect, it searches the web until it has
collected the number of references you have specified. WebFerret
queries large web search engines to find sites matching your keywords.
It queries all configured search engines simultaneously and discards
duplicate results. URLs that are found can be visited immediately, even
while WebFerret continues to run. New or updated search engines are
added to WebFerret automatically as they become available.
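The WebFerret behaviour described above (query every configured engine
at once, discard duplicates, stop after the requested number of
results) can be sketched in Python as follows. This is not FerretSoft's
code: engine_a, engine_b and meta_search are hypothetical stand-ins, and
the sketch simply merges results in engine order rather than using any
particular ranking.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for search engines: each takes a query and returns
# a ranked list of result URLs. A real tool would send HTTP requests instead.
def engine_a(query):
    return ["http://example.com/1", "http://example.com/2"]

def engine_b(query):
    return ["http://example.com/2", "http://example.org/3"]

def meta_search(query, engines, limit):
    """Query every engine at once, merge results, drop duplicates, stop at limit."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        # pool.map runs the engines concurrently but returns results in input order
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            if url not in seen:  # discard duplicates reported by several engines
                seen.add(url)
                merged.append(url)
                if len(merged) == limit:
                    return merged
    return merged

print(meta_search("ai bots", [engine_a, engine_b], limit=10))
# ['http://example.com/1', 'http://example.com/2', 'http://example.org/3']
```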


Fetcher
-------
A fetcher is an informal term for a program that fetches anything
from the internet. It could fetch web pages for offline viewing, fetch
e-mail from your e-mail server (e.g., MS Outlook Express), or fetch
files from other users (p2p, or peer-to-peer); examples of the latter
are Kazaa and Napster.
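A fetcher that saves web pages for offline viewing might look roughly
like this Python sketch, which uses the standard urllib.request module.
The function name fetch_for_offline and the numbered file names are
assumptions for illustration; the demo uses a data: URL so that it runs
without a network connection, whereas a real fetcher would be given
ordinary http:// addresses.

```python
import os
import tempfile
import urllib.request

def fetch_for_offline(urls, folder):
    """Download each URL and save it to a numbered file in `folder`."""
    saved = []
    for i, url in enumerate(urls):
        with urllib.request.urlopen(url, timeout=10) as response:
            data = response.read()
        path = os.path.join(folder, f"page{i}.html")  # hypothetical naming scheme
        with open(path, "wb") as f:
            f.write(data)
        saved.append(path)
    return saved

# A data: URL keeps the demo offline; a real fetcher would use http:// URLs.
folder = tempfile.mkdtemp()
paths = fetch_for_offline(["data:text/html,<p>hello</p>"], folder)
with open(paths[0], "rb") as f:
    print(f.read())
```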


Answer (2)

Artificial Intelligence is a technology, not a piece of software like
the others. Bots are programs that can perform any automated function,
not necessarily related to searching. A spider and a crawler are one
and the same thing: it visits pages and indexes them (stores details
like the title of the page, keywords, and a cached copy). A trawler can
be considered a kind of spider, except that it doesn't visit and index
web pages; instead, it crawls through data that has already been
indexed in some form. A ferret is a commercial name used by one company
for a trawler. A fetcher fetches data from the internet, based purely
on what the user asks for.
In a nutshell, you can group the above as follows:
*AI
*Bots
 -Spiders or Crawlers
 -Trawlers or Ferrets
*Fetchers
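The indexing mentioned above (a spider storing details about each page)
and the idea of a trawler searching data that has already been indexed
can both be illustrated with a tiny inverted index, which maps each
word to the pages containing it. The following Python sketch is a
simplification under stated assumptions: the PAGES data, build_index and
search are hypothetical, only titles are indexed, and real search
engines also store cached copies, rank results, and handle punctuation.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lower-cased word to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return URLs containing every word of the query (AND search), sorted."""
    words = query.lower().split()
    if not words:
        return []
    # .get avoids adding empty entries to the defaultdict for unknown words
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)

# Hypothetical pages a spider has already visited (URL -> title text).
PAGES = {
    "http://example.com/bots": "Intro to bots and spiders",
    "http://example.com/ai": "Intro to artificial intelligence",
}
index = build_index(PAGES)
print(search(index, "intro spiders"))  # ['http://example.com/bots']
```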


Answers (3,4 & 5)

There is no definitive system that ranks the top 10 people or
companies in the fields mentioned above. However, the following links
give information about people who have made a significant contribution
in this regard.

AI People
http://directory.google.com/Top/Computers/Artificial_Intelligence/People/
This is a directory of eminent people in the field of AI, which
includes all the fields mentioned above.

People in AI
http://www.aiguru.com/people.htm

People in AI
http://www.csd.uch.gr/~halkidis/people.html

Pioneers in AI
http://216.239.33.100/search?q=cache:v1ckc9AkwuoC:www.aistudy.co.kr/pioneers.htm+pioneers+%2Bin+%22Artificial+Intelligence%22&hl=en&ie=UTF-8
Note: The original page has been removed. However, Google's cache
still has it.


The following search engines use the best spidering technology :

Altavista
http://www.altavista.com

Google
http://www.google.com

Metacrawler
http://www.metacrawler.com

MSN Search
http://search.msn.com


The top companies for turnkey business solutions in data mining are:

IBM alphaWorks
http://www.alphaworks.ibm.com/
Place where one can find the latest technologies from IBM Research.

Visual Numerics
http://www.vni.com/index.html
The leading provider of visualization, mathematics, analysis and
network software solutions including PV-WAVE, JWAVE, IMSL and JNL.

Spotfire
http://www.ivee.com/
Spotfire's modular suite of products delivers immediate value to
research scientists engaged in discovery. At the departmental level,
Spotfire products can help you extract greater value from investments
you've already made in data generation.

Visual Insights® ADVIZOR
http://www.visualinsights.com/
Interactive data visualization software enabling faster business
decisions: linked ActiveX components and complete applications for
business intelligence and customer behavior analysis.

Insightful Corporation
http://www.insightful.com
Insightful Corporation is a leading supplier of software (S-Plus) and
services for statistical data mining, business analytics and
information retrieval enabling clients to gain intelligence from data.

Acxiom Corporation
http://www.acxiom.com/
Provides a comprehensive range of information services and products
that allow businesses to make informed marketing, merchandising, and
risk management decisions.

Partek Inc. - Pattern Recognition Software
http://www.partek.com
Statistical and visual data analysis software. Widely used in life
sciences and engineering for gene expression (microarray) data
analysis, high throughput screening, and drug design including SAR and
ADME prediction.

Some public-domain software is available as well :

WinMine Toolkit Home Page -
http://research.microsoft.com/~dmax/winmine/tooldoc.htm
By David Chickering at Microsoft Research. The WinMine Toolkit is a
set of tools for Windows 2000/NT/XP that allow you to build
statistical models from data. The majority of the tools are
command-line executables that can be run in scripts.

CART - Salford Systems
http://www.salford-systems.com/products-cart.html
A decision tree tool that automatically sifts large, complex
databases, searching for and isolating significant patterns and
relationships. Offers free limited capability demo for download,
product features, applications, user feedback, and associated books.

Machine Learning Library in C++
http://www.sgi.com/tech/mlc/
MLC++ is a standard C++ library for supervised machine learning, with
back-end and front-end tools for data mining tasks like Decision
Trees, and Clustering. Information on legal issues, mailing lists,
history, standards, platform support, and download instructions.

XmdvTool Home Page
http://davis.wpi.edu/~xmdv/
A public-domain software package for the interactive visual
exploration of multivariate data sets. It is available on all UNIX
platforms which support X11R4 or higher. The current version of the
software (3.1) supports scatterplots, star glyphs, parallel
coordinates, and dimensional stacking.

AutoClass C
http://ic-www.arc.nasa.gov/ic/projects/bayes-group/autoclass/autoclass-c-program.html
An unsupervised Bayesian classification system that seeks a maximum
posterior probability classification.

StatLib - XlispStat Archive
http://lib.stat.cmu.edu/xlispstat/
Environment for statistical computing and dynamic graphics based on
Lisp. Contains contributed code and submission instructions.

Solutions for bots, spiders, crawlers, trawlers, ferrets, fetchers :

Google Search Solutions
http://www.google.com/services/
Hosted search options and a search hardware product.

ht://Dig
http://www.htdig.org/
A complete indexing and searching system for a small domain or
intranet. Source code provided under the GPL.

Glimpse
http://glimpse.cs.arizona.edu/
UNIX Search Engine for searching entire file systems.

Thunderstone
http://www.thunderstone.com/
Provides SQL-based relational full-text retrieval, dynamic publishing,
object management, and web-indexing software.

Inktomi Search Engine
http://www.inktomi.com/products/search/index.html
Service provides searching in hosted clusters for specific domains and
web sites.

WebGlimpse
http://webglimpse.net/
Finds information in a related web of pages. Collects and indexes
pages based on traversal of links or subdirectories. Create a
context-sensitive search by category by linking to relevant pages.

Fast
http://web.fast.no
Search and personalization software optimised for multimedia.

Mondosoft
http://www.mondosoft.com
Provides installed and hosted site search software for web sites and
intranets.

Fluid Dynamics Search Engine
http://www.xav.com/scripts/search/
Written in Perl. Online-manageable with a web browser.

Master.com
http://www.master.com
Provides free private-label hosted search and application services for
web sites.

CareerCast
http://www.careercast.com
Develops web-based job and resume search software for career,
newspaper and employer sites.

ASPSeek
http://www.aspseek.org
Indexes as many as a few million URLs and searches for words and
phrases. Uses wildcards and Boolean operators. SWSoft released ASPSeek
under the GNU GPL.


Answer (6)

The contact details of the individuals are listed below. Their areas of
expertise can be determined from their contact details (e.g.,
Artificial Intelligence for Dr. Vasant Honavar).

Dr. Vasant Honavar
Professor
Artificial Intelligence Research Laboratory
Department of Computer Science
210 Atanasoff Hall
Iowa State University
Ames, Iowa 50011-1040
voice: (515) 294-1098
fax: (515) 294-0258
email: honavar@cs.iastate.edu
web : http://www.cs.iastate.edu/~honavar/homepage.html

Dr. Pattie Maes
MIT Media Laboratory
Room E15-305B
20 Ames Street
Cambridge, MA 02139
U.S.A.
+1-617-253-7442 [Voice]
+1-617-253-6215 [Fax]
+1-617-258-6264 [Alt. Fax]
pattie@media.mit.edu
http://pattie.www.media.mit.edu/people/pattie/
Areas of expertise : Artificial Intelligence, Human Computer
Interaction, Computer Supported Collaborative Work, Information
Filtering and Electronic Commerce

Dr. Keith S. Decker
Associate Professor
Dept. of Computer and Information Sciences
University of Delaware
77 E. Delaware Ave. (the AI/NLP GreenHouse)
Newark, DE 19716-2586
(302) 831-1959 (office)
(302) 831-4091 (fax)
decker@cis.udel.edu
http://www.cis.udel.edu/~decker/
Areas : Distributed AI and Multi-Agent Systems

Dr. Milind Tambe
Associate Professor
University of Southern California
Computer Science Dept
Henry Salvatori Computer Center 232
Los Angeles, CA 90089-0781
Tel: 213-740-6447
Fax: 213-740-7285
email: tambe@usc.edu
web : http://www.isi.edu/soar/tambe/
Areas : Multi-agents, distributed AI, TEAMWORK in multi-agent systems,
Adjustable autonomy, multi-agent collaboration, agent modeling, plan
recognition, intelligent agents in synthetic environments, constraint
satisfaction, rule-based systems, production match

Dr. Erik Warren Selberg
Home: 4815 36th Ave. NE
Seattle, WA 98115
(206) 517-3039
(206) 915-1472 (cell + voicemail)
erik@selberg.com
http://www.selberg.org/homes/speed/home.html
Areas : Search Services. Implemented MetaCrawler, one of the first
World Wide Web meta search services.

Dr. Oren Zamir
Unable to get contact information. However, Dr. Selberg or Dr. Etzioni
may be able to help you reach him, as Dr. Zamir also worked on the
MetaCrawler project.

Dr. Oren Etzioni
Associate Professor
University of Washington
Computer Science & Engineering
Box 352350
Seattle, WA 98195-2350
Office: 209 Sieg Hall
Phone: (206) 685-3035
Fax: (206) 543-2969
Email: etzioni@cs.washington.edu
http://www.cs.washington.edu/homes/etzioni/
Areas : Artificial Intelligence and Information Retrieval, for making
the Web easier to navigate

Dr. Richard Segal
IBM Thomas J. Watson Research Center
PO Box 704, Room H2-K20
Yorktown Heights, NY 10598
rsegal@watson.ibm.com
http://www.research.ibm.com/people/r/rsegal/
Areas : Bots

Dr. Marvin Minsky
MIT Media Lab and MIT AI Lab
Toshiba Professor of Media Arts and Sciences
Professor of E.E. and C.S., M.I.T
minsky@media.mit.edu
http://web.media.mit.edu/~minsky/
Areas : AI, cognitive psychology, mathematics, computational
linguistics, robotics, and optics

Dr. Charles Petrie
Executive Director
Stanford Networking Research Center (SNRC)
<petrie@stanford.edu>
http://nrc.stanford.edu/~petrie/home.html
Areas : Distributed process coordination

Sabit Kraus
No information available. The name may be misspelled.

Dr. Craig Knoblock
Information Sciences Institute
University of Southern California
4676 Admiralty Way
Marina del Rey, CA 90292
Email: knoblock @ isi.edu
Voice: (310) 448-8786
Fax: (310) 822-0751
http://www.isi.edu/~knoblock/index.html
Areas : Artificial Intelligence and Information Integration

Dr. Steven Minton
Information Sciences Institute
University of Southern California
4676 Admiralty Way
Marina del Rey, CA 90292
Email: Minton@isi.edu
Voice: (310) 822-1511 x275
Fax: (310) 822-0751
http://www.isi.edu/sims/minton/homepage.html
Areas : Artificial Intelligence, especially machine learning,
planning, scheduling, constraint-based reasoning and program
synthesis.

Marcus Zillman
CEO, Bottechnology.com
Executive Producer / Host Internet-101.com
Author of the Internet MiniGuides
(941) 434-5113
Email Address: zillman@internetminiguides.com
Area : Bot technology


I've probably covered every aspect of your questions. Please don't
hesitate to ask for a clarification if you have one. Honestly, I must
admit that I've enjoyed answering this question, since it lies within
my domain of interest.

Have a good day

Cheers,
aditya2k.

Request for Answer Clarification by khill_cooter-ga on 27 Jun 2002 08:39 PDT
Aditya2k,

Thanks for the extremely great answers thus far.  Your level of
conciseness is exactly what I am looking for on all except (3) and
(4).  Let me better define what I am looking for to achieve my end
results.

Answer (3): You did a good job of providing the names of “AI” people,
but I am also looking for experts in “spiders” and “trawlers”. Based on
your answer to question (2), the field “bots” would not need to be
researched under this question, but my partner feels it should. Are
there experts in just “bots”, or do they specialize in the different
fields? The reasoning behind all the questions is that my partner and I
are looking at starting a business that would directly involve these
areas, and we will potentially be flying to meet with some of these
experts. For the sake of my time and money, it is critical that I get
your expert research opinion on the top people in these fields. Could
you provide me with the same type of contact information as you did in
answer (6)?

Answer (4):  You have shown me some companies with “Spidering”
technologies, but once again I am also looking for the use of “AI” and
“Trawlers”.    Again, I have done research on my own, but I am looking
for your expert research opinion.

Also, many thanks to insideinfo-ga for your input on my questions.  I
wish to give you a good rating as well.

Many Thanks

khill_cooter

Clarification of Answer by aditya2k-ga on 27 Jun 2002 09:57 PDT
Hi khill_cooter,


   Thanks for your words of praise. Since you are planning to fly out
to meet the experts, I'm going to provide the contact information of
some of the 'geniuses', if I may use the word, in this field. Bots
cover a wide range of topics, so it is not possible to specialize in
bots as a whole; people specialize in the areas where bots are of
assistance. As mentioned in my answer, a spider is a bot. Trawlers,
ferrets, and fetchers are either extensions or modifications of
spiders, so an expert in spiders is definitely an expert in these other
technologies as well. The people listed above are eminent enough;
however, I'll provide some more.

   Also, thanks to insideinfo-ga for his input.

Dr. Jakob Nielsen
"The Guru of Web Page Usability" (New York Times)
Nielsen Norman Group
48921 Warm Springs Blvd.
Fremont, CA 94539-7767
USA
Email: nielsen@nngroup.com
Office: Luice Hwang, hwang@nngroup.com, tel. (408) 720-8808
http://www.useit.com/jakob/index.html

Sergey Brin
Escondido Village #22D
Stanford, CA 94305
Phone : +1-415-497-0753
Fax +1-415-725-7411 or +1-415-725-2588
Sergey Brin is a co-founder of Google

Rajeev Motwani
Department of Computer Science
Room 474
Gates Computer Science Building 4B
Stanford University
Stanford, CA 94305-9045
Phones: 650-723-6045 (office), 650-725-4671 (fax)
rajeev@CS.Stanford.EDU
http://theory.stanford.edu/~rajeev/

Doug Young
Chief Technology Officer, AltaVista Internet
AltaVista Company
http://www.altavista.com

Keith Golden
Autonomy and Robotics Area
Computational Sciences Division
NASA Ames Research Center
vox: 650-604-3585
fax: 650-604-3594
kgolden@ptolemy.arc.nasa.gov
http://ic.arc.nasa.gov/people/kgolden/

Dr. Neal Lesh
MERL Cambridge Research
Research Scientist
Phone:  (617) 621-7583
E-mail:  lesh@merl.com
http://www.merl.com/people/lesh/


Companies that specialize in AI:

Online Speech Engines:
KurzweilAI.net
http://www.kurzweilai.net
Creator of the Ramona speech engine (chatbot), but actually has an
eclectic interest in a large number of different AI technologies. Note
that for Kurzweil, AI means "Accelerating Intelligence". This site is
definitely worth checking out!



AI Consulting Companies:

SHAI
http://www.shai.com
AI consulting, especially to government agencies

Rule Based Engines:
Blaze Advisor Rules Engine
http://www.blazesoft.com/

ILOG JRules
http://www.ilog.com/
Java based

ART*Enterprise/Mindbox
http://www.mindbox.com/
An expanding, domain-independent, rules-based technology. Applications
include: Financial Services, Contract Management, Expert Systems, etc.

Jess
http://herzberg.ca.sandia.gov/jess
Free for non-commercial applications
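Rule-based engines such as those listed above work by matching facts
against if-then rules. The Python sketch below shows the basic idea
with naive forward chaining: it keeps firing rules until no new facts
appear. The rules and fact names are hypothetical (loosely modelled on
the medical expert-system example from Answer (1)), and products like
Jess use far more efficient matching algorithms, such as Rete, rather
than this simple loop.

```python
# Each rule: (set of facts that must all be present, fact to conclude).
# These rules are hypothetical, loosely in the spirit of an expert system.
RULES = [
    ({"has_fever", "has_cough"}, "possible_flu"),
    ({"possible_flu", "short_of_breath"}, "see_doctor"),
]

def forward_chain(facts, rules):
    """Repeatedly fire rules whose conditions hold until no new facts appear."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # fire the rule if all its conditions are known facts
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

print(sorted(forward_chain({"has_fever", "has_cough", "short_of_breath"}, RULES)))
# ['has_cough', 'has_fever', 'possible_flu', 'see_doctor', 'short_of_breath']
```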




Web Based Agents:

Extempo Systems
http://www.extempo.com/
Natural language communication on the Internet

Brightware
http://www.brightware.com/
Automated response and advice to email; now acquired by Firepond
http://www.firepond.com

Kana
http://www.kana.com/solutions/prodline/classify/index.asp
Their Kana Classify product provides automated response and advice to
email

Large Scale Knowledge Bases :
Cyc
http://www.cyc.com


Directory of AI Companies
http://www.iit.nrc.ca/ai_companies.html


Some of the Trawler companies :

Google Groups
http://groups.google.com

Moreover News Technologies
http://www.moreover.com

Search.com
http://www.search.com

Metacrawler
http://www.metacrawler.com

I hope I have clarified your request. If anything further is to be
clarified, please don't hesitate to ask.

Cheers,
aditya2k
Comments  
Subject: Re: AI-Bots-Crawlers-Spiders-Trawlers-Ferrets-Fetchers
From: insideinfo-ga on 27 Jun 2002 05:16 PDT
 
I have found some info on Oren Zamir:

On the home page of Oren Etzioni, another of your notable experts:

http://www.cs.washington.edu/homes/etzioni/

Etzioni, an associate professor, worked with Oren Zamir when Zamir was
a PhD student, and wrote of him:

Dr. Oren Zamir (1999, Openratings). Oren's dissertation, Clustering
Web Documents: A Phrase-Based Method for Grouping Search Engine
Results, investigated the use of a novel and fast clustering algorithm
to group the results of Web search engines into easily-browsed
clusters. The most distinctive aspect of the algorithm was its
treatment of documents as strings of words, represented by a suffix
tree, in contrast with the standard vector-based representation.
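Zamir's method used a suffix tree to find phrases shared by search
results. As a much-simplified illustration of the underlying idea
(grouping documents by phrases they have in common, rather than by
individual words), here is a Python sketch that groups snippets by
shared two-word phrases. It does not build a suffix tree or merge
overlapping clusters as the real algorithm does; the snippets and the
phrase_groups function are hypothetical.

```python
from collections import defaultdict

def phrase_groups(docs, n=2, min_docs=2):
    """Group documents by shared n-word phrases (a simplified stand-in for
    the base clusters of suffix-tree clustering; no suffix tree is built)."""
    groups = defaultdict(set)
    for doc_id, text in docs.items():
        words = text.lower().split()
        for i in range(len(words) - n + 1):
            phrase = " ".join(words[i:i + n])
            groups[phrase].add(doc_id)
    # Keep only phrases shared by enough documents to form a cluster.
    return {p: ids for p, ids in groups.items() if len(ids) >= min_docs}

# Hypothetical search-result snippets.
RESULTS = {
    1: "jaguar car dealer prices",
    2: "used jaguar car for sale",
    3: "jaguar big cat habitat",
    4: "big cat conservation news",
}
for phrase, ids in sorted(phrase_groups(RESULTS).items()):
    print(phrase, sorted(ids))
```

The sketch prints "big cat [3, 4]" and "jaguar car [1, 2]": the shared
phrase separates the car results from the animal results even though
the single word "jaguar" appears in both groups.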

Oren Zamir received a PhD in 1999, and you can find his dissertation
at:

http://www.cs.washington.edu/research/projects/WebWare1/www/metacrawler/thesis.zip

In that document I found his US and Israel contact info:

U.S. address: CS Department, University of Washington, Box 352350,
Seattle, WA 98195-2350, USA
Israel address: 22 Hatana'im st. apt. 23, Ramat-Aviv, Tel-Aviv 69209,
Israel;
zamir@cs.washington.edu;
http://www.cs.washington.edu/homes/zamir

His web page is no longer working, and the physical address on campus
may no longer receive his mail. I would try the Israel address, or
contact his fellow students or professors and ask for current contact
details.
He worked with Oren Etzioni several times as you can see here:

http://www.cs.ualberta.ca/~wum/pages/paper2.htm

Good Luck
Subject: Re: AI-Bots-Crawlers-Spiders-Trawlers-Ferrets-Fetchers
From: insideinfo-ga on 27 Jun 2002 05:29 PDT
 
I was able to find some info on another notable researcher, Sarit
Kraus. You had her first name as Sabit, which threw off the answerer
aditya2k-ga. I assumed that the last name was right and was able to
find her. I know several people with the last name Kraus and could not
think of another way to spell that name. She has a home page at:

http://www.cs.biu.ac.il/~sarit/

And her contact info is: 

Dept. of Computer Science Bar-Ilan University
Ramat Gan, 52900 Israel 

Office: Room 305 Math Building 
Phone: (972) 3-531-8762 
Fax: (972) 3-535-3325 
E-mail: sarit@cs.biu.ac.il 
Subject: Re: AI-Bots-Crawlers-Spiders-Trawlers-Ferrets-Fetchers
From: panos-ga on 23 Apr 2004 10:40 PDT
 
You might want to know that Oren Zamir is currently working for Google
in their New York offices.
