Q: Links to the Invisible Web ( Answered 5 out of 5 stars,   0 Comments )
Question  
Subject: Links to the Invisible Web
Category: Reference, Education and News > General Reference
Asked by: researchbear-ga
List Price: $5.00
Posted: 02 Nov 2004 14:13 PST
Expires: 02 Dec 2004 14:13 PST
Question ID: 423617
I am looking for five links to research resources on the Invisible
Web, also sometimes called "dark matter."

My chief criteria are:

1) The links provided should be to sites that are genuine research resources
2) The sites should not charge a fee for access or use
3) The sites and/or the research resources they provide should not be
easily discoverable using the Google Web search engine (I'd consider
it a big plus if a site was not indexed by Google at all)

Thanks in advance for your efforts.
Answer  
Subject: Re: Links to the Invisible Web
Answered By: omnivorous-ga on 02 Nov 2004 17:11 PST
Rated: 5 out of 5 stars
 
RB --

There are two good Google Answers answers covering the "deep web" or
"dark web".  One's by Luciaphile-GA, who's a professional research
librarian:
"How does Google affect reference librarians?" (Luciaphile-GA, April 12, 2004)
 http://answers.google.com/answers/threadview?id=329159

and one by yours truly, which captured the Berkeley and BrightPlanet
studies that are seminal to understanding it:
"Information Growth" (Omnivorous-GA, March 23, 2004)
http://answers.google.com/answers/threadview?id=319657

Enough on the background.


LINKS TO RESEARCH RESOURCES
----------------------------

These are the kinds of sites that Google Answers researchers collect. 
Most of us have dozens of "Favorites" folders and hundreds of web
links.  Google keeps expanding its reach, so now you can look up the
owner of an airplane by typing in its tail number, where in the past
you had to go to the FAA aircraft database.   Try: N10F -- it is owned
by the King Schools and renowned aviators John & Martha King.

But there are lots of databases that simply can't be toured by robot
because they require some type of CGI or database script.  Also, some
are organized by topics and not easily searchable: the Library of
Congress databases are maddening this way.
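
To illustrate why a link-following robot misses such databases, here is
a minimal Python sketch.  The sample page, the /cgi-bin/search path and
the LinkCollector class are all made up for illustration, and real
crawlers are far more elaborate.  The point is only that the robot
queues ordinary hyperlinks, while the records behind the search form
never come into view until someone types in a query:

```python
from html.parser import HTMLParser

# Hypothetical page: one ordinary link plus a search form fronting a database.
SAMPLE_PAGE = """
<html><body>
  <a href="/about.html">About this database</a>
  <form action="/cgi-bin/search" method="post">
    <input type="text" name="surname">
    <input type="submit" value="Search">
  </form>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Records what a simple link-following robot would and would not visit."""

    def __init__(self):
        super().__init__()
        self.links = []  # href targets the robot can follow
        self.forms = []  # form actions that need typed-in input first

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])
        elif tag == "form":
            self.forms.append(attributes.get("action", ""))


collector = LinkCollector()
collector.feed(SAMPLE_PAGE)
print("Robot can follow:", collector.links)
print("Hidden behind a form:", collector.forms)
```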

Here are my top SIX:

1.  WHERE IS YOUR MONEY?

When overpayments or inactive accounts exist and the original owners
can't be found, most states have laws that require the money and the
account information go to the Secretary of State.  The states then
usually try to find "unclaimed property" owners during about a 5-year
period.

In the middle of 2000, the National Association of Unclaimed Property
Administrators put links to all 50 states' databases up on the web. 
You may have to go through a couple of links (and each state is
different in its database structure) but you can find out if money is
owed you.

When I first did a search in 2000, I found money owed a sister and a
sister-in-law but nothing for me.  I repeated the search in 2003 and
found $150 that Dell Computer owed me on a refund, lost after a
relocation 10 years ago:
National Association of Unclaimed Property Administrators
http://www.naupa.org/


2.  DETAILED COMPANY INFORMATION

The Securities & Exchange Commission (SEC) collects an enormous amount
of information on companies in registration statements, prospectuses,
quarterly earnings reports and annual reports (Forms 10-K).  Since
1995, much of it has been online -- but you have to know how and where
to look for it.  Note that information before 1995 is on microfiche at
the SEC, its regional offices and many Federal Depository libraries:
SEC Edgar Database: Search
http://www.sec.gov/edgar/searchedgar/webusers.htm

From this page, with careful research, you can do such things as
determine the annual income of the Google founders and what facilities
the company has across the world:
http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001288776&owner=include
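
That link is a good example of a page a robot never stumbles on: it
exists only when a script is handed the right parameters.  As a rough
sketch, the Python below rebuilds the same address from a company's CIK
number.  The parameter names are copied from the link above; the
function name is my own invention, and there's no guarantee the SEC
keeps serving this exact path:

```python
from urllib.parse import urlencode

# Script address taken from the link above.
EDGAR_BROWSE = "http://www.sec.gov/cgi-bin/browse-edgar"


def edgar_company_url(cik: str) -> str:
    """Build a company-filings address in the same form as the link above."""
    query = urlencode({"action": "getcompany", "CIK": cik, "owner": "include"})
    return f"{EDGAR_BROWSE}?{query}"


# 0001288776 is the CIK that appears in the link above.
print(edgar_company_url("0001288776"))
```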



3.  BASEBALL

There's probably no sport followed with as much statistical detail as
baseball.  One website has done its utmost to get the historical
record online.  Among other details, you'll find box scores for games
over the past 40 years and game summaries all of the way back to 1871:
Retrosheet
http://www.retrosheet.org/


4.  LAND PATENTS

There are dozens of excellent genealogy sites, ranging from state
archives of birth/death/marriage information to Rootsweb to the
fee-based Ancestry.com, which has indexes to census images through
1930.  A little-known one is the Bureau of Land Management, which has
images of land grants from 1820 to 1909.  It's pretty exciting to see
a land patent for great-great-grandparents -- signed by Andrew
Jackson.  If you want an example, try Catherine Fry, Ashland County,
Ohio.  And make sure that you bring up the digitized image of the
original document!
Bureau of Land Management
http://www.glorecords.blm.gov/PatentSearch/Default.asp?


5.  STATISTICAL ABSTRACTS

The U.S. Census Bureau pages are very deep and not completely indexed
by the Googlebot.  For example, the following search terms don't
return much that's useful:
"miles of railroad in operation" 1881

But the Census Bureau has statistical abstracts of the U.S. clear back
to 1878 -- and you can find that precise table in the 1881 Statistical
Abstract.  They're one of the best overall statistical sources on the
changing country:
U.S. Census Bureau
http://www2.census.gov/prod2/statcomp/index.htm


6.  INTERNET ARCHIVE

Google doesn't track the history of web pages, only what's there
currently.  But the Internet Archive's Wayback Machine does.  Try it
for www.google.com and see what the pages looked like in 1998.  The
reason researchers love this tool so much is that it becomes more
valuable every month as a historic reflection of the World Wide Web:
Internet Archive
http://www.waybackmachine.org
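
If you'd rather jump straight to a particular year, snapshot addresses
can be put together by hand.  The sketch below assumes the archive's
web.archive.org/web/<timestamp>/<page> address pattern, which is not
spelled out anywhere in this answer, and the function name is made up.
As far as I know, a shortened timestamp such as just the year sends you
to the closest capture on file:

```python
# Assumption: snapshots are addressed as web.archive.org/web/<timestamp>/<page>,
# where the timestamp is digits in YYYYMMDDhhmmss order and may be shortened.
WAYBACK_PREFIX = "http://web.archive.org/web"


def wayback_url(page: str, timestamp: str) -> str:
    """Return an address asking for a snapshot of `page` near `timestamp`."""
    if not timestamp.isdigit():
        raise ValueError("timestamp must contain digits only, e.g. '1998'")
    return f"{WAYBACK_PREFIX}/{timestamp}/{page}"


print(wayback_url("www.google.com", "1998"))
```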



SEMI-PRIVATE DATABASES WORTH CONSIDERING
------------------------------------------

I have to mention these because so many public libraries have them
available online at no charge -- though these are all "fee based"
databases.  While an excellent university library might have
Lexis-Nexis, these are commonly available at even "average" public
libraries:

1.  NEW YORK TIMES: indexed and available back to 1851 as part of the
ProQuest Historical Newspaper service.  Indispensable to researchers
trying to identify the basics on historical searches.

2.  THOMSON-GALE BIOGRAPHY RESOURCE CENTER: a phenomenal collection of
biographical information, broken down by professions.  I often
supplement this with the old print catalogs of Marquis' Who's Who or
Who's Who in America.

3.  BUSINESS & INDUSTRY RESOURCE CENTER: another Thomson-Gale product
which includes the Investext service.  Investext tracks analysts'
reports on companies and industries, giving one a look at the
competitive dynamics of a company or industry.

4.  PEER-REVIEWED LITERATURE: there are actually several services
which track the peer-reviewed (academic & scientific) publications: 
Infotrac, Ebsco and Expanded Academic ASAP.


Finally, there are some excellent interfaces to mapping and satellite
imagery pages.  You can see where you live on a topographic map
anywhere in the U.S.:
http://www.topozone.com/

Or from high-altitude or satellite photos:
http://terraserver.microsoft.com/

Best regards,

Omnivorous-GA

Request for Answer Clarification by researchbear-ga on 02 Nov 2004 20:32 PST
omnivorous -- this is absolutely great material, I look forward to
spending some time digesting it. I particularly find the two
backgrounders by Luciaphile and by you helpful, and look forward to
exploring some of the issues they raise.

No doubt in my mind that this is a 5 star answer -- that said, the
sites you list do not seem to meet my third criterion. That is, they
are all easily findable from the Google Web search engine. Is this
inevitable for a real research resource that is on the Web? Thanks.

Clarification of Answer by omnivorous-ga on 03 Nov 2004 01:41 PST
RE: That is, they are all easily findable from the Google Web search engine.

They might be easily findable (take the Statistical Abstract of the
United States) but they aren't being completely indexed.  I find that
example one of the most striking because Google can index PDF files
well, but even a site: search of the U.S. Census Bureau site won't
bring up relevant text strings.  This is exactly what the BrightPlanet
white paper notes with 60 websites -- any of which you could consider
adding to your list.

You might consider posting a new question to get at specific sites
that CAN'T be found. (Of course you'll have to remember that the
answer may change month-to-month as the search indexing changes!) 
This is an issue that I've never seen explored.

One way to keep Google or another well-behaved search engine from
seeing a site would be a "no robots" statement in the site's robots.txt
file.  You can find it explained with the following search strategy:
no robots.txt
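
As a rough sketch of what such a statement looks like and how a
well-behaved robot reads it, the Python below uses the standard
library's urllib.robotparser.  The two sets of rules and the
example.com addresses are invented for illustration.  Keep in mind that
robots.txt is only a request: it keeps out the robots that choose to
honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every robot to stay out of the whole site.
BLOCK_EVERYTHING = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt that only fences off one directory.
BLOCK_ONE_DIRECTORY = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Report whether `agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(allowed(BLOCK_EVERYTHING, "Googlebot", "http://example.com/index.html"))
print(allowed(BLOCK_ONE_DIRECTORY, "Googlebot", "http://example.com/index.html"))
print(allowed(BLOCK_ONE_DIRECTORY, "Googlebot", "http://example.com/private/data.html"))
```

The first and third checks come back False and the second True: the
site-wide rule shuts out everything, while the directory rule hides
only what sits under /private/.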

You can also hide the website behind a firewall, which is where most
corporate internal databases sit.

A final note, inasmuch as you're asking these questions as part of a
bigger project.  There are several online publications that track
search engines and how they perform.  One in particular that I'd
recommend would be Tara Calishain's "Research Buzz".  She's written
several books on search engines and you might even wish to interview
her at some point:
http://www.researchbuzz.com/

Best regards,

Omnivorous-GA
researchbear-ga rated this answer: 5 out of 5 stars
Excellent and thorough answer -- links to the two backgrounders
particularly interesting and helpful for my current project. Thanks!

Comments  
There are no comments at this time.
