Will_Fawcett --
A very interesting question which has generated enormous debate.
After a period of explosive growth between 1995 and 2000, some argue
that growth of "the Internet," defined as PUBLIC web sites, has slowed
to zero. In their April 2003 article in D-Lib Magazine, O'Neill,
Lavoie, and Bennett actually argue that the public web may have shrunk
between 2001 and 2002:
D-Lib Magazine
"Trends in the Evolution of the Public Web" (April, 2003)
http://www.dlib.org/dlib/april03/lavoie/04lavoie.html
The three authors work for the Online Computer Library Center (OCLC),
which began sizing the Internet in 1997. They estimated that by June
2002 there were 3,080,000 websites in the "public" web, or 35% of all
websites, containing about 1.4 billion pages. Using estimates from
Shapiro and Varian's book "Information Rules: A Strategic Guide to the
Network Economy," they calculate that this is the equivalent of about
1.5 million books -- or a fraction of a good university library.
"Information Rules" (Shapiro & Varian)
http://www.inforules.com/
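As a quick sanity check on that book figure, here's a back-of-the-
envelope sketch in Python. The per-page and per-book byte counts are
my own assumptions (roughly 1 KB of actual prose per page once markup
is stripped, and the common rule of thumb of about 1 MB of text per
book), not numbers from the article:

# Rough reconstruction of the "1.5 million books" equivalence.
# ASSUMED: ~1 KB of prose per page, ~1 MB of prose per book.
PAGES = 1.4e9          # public web pages, per the OCLC estimate
TEXT_PER_PAGE = 1e3    # bytes of text per page (my assumption)
TEXT_PER_BOOK = 1e6    # bytes of text per book (rule of thumb)

books = PAGES * TEXT_PER_PAGE / TEXT_PER_BOOK
print(f"{books:,.0f} book-equivalents")  # ~1,400,000

Under those assumptions the arithmetic lands right around the
authors' figure.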
Indeed, Berkeley has some of the best information. Its "How Much
Information?" project, which conducted studies in 2000 and 2003,
indicates that the web is growing dramatically -- in contrast to the
D-Lib article. The interesting aspect of the Berkeley study is that it
also attempts to measure the TOTAL amount of information available --
including paper:
UC Berkeley School of Information Management and Systems
http://www.sims.berkeley.edu/research/projects/how-much-info-2003/internet.htm
In their 2000 study, the estimates were:
Public web: 14-28 terabytes
Total web including "deep web": 25-50 terabytes
Average web site: 441 pages
Average page size: 10K-20K bytes
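Interestingly, those figures hang together: dividing the size range by
the page-size range implies roughly the same 1.4 billion public pages
that OCLC counted. A minimal check (treating a terabyte as 10^12 bytes
for simplicity):

# Pages implied by the 2000 Berkeley public-web estimates.
TB = 1e12
low  = 14 * TB / 10e3   # 14 TB at 10 KB per page
high = 28 * TB / 20e3   # 28 TB at 20 KB per page
print(f"{low:,.0f} to {high:,.0f} pages")  # both ~1,400,000,000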
In the 2003 study, the estimates are far higher for both the public
web and the private web, probably because of better statistical
sampling techniques AND the continued growth of information available
via the web:
Public web: 167 terabytes
Total web including "deep web": 66,800-91,850 terabytes
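However you slice it, the jump is large. Taking the midpoint of the
2000 range as a baseline (my simplification), the public web grew
roughly eightfold in three years:

# Rough growth factor for the public web, 2000 to 2003.
public_2000 = (14 + 28) / 2   # midpoint of the 2000 range, in TB
public_2003 = 167             # the 2003 estimate, in TB
print(f"~{public_2003 / public_2000:.0f}x growth")  # ~8x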
Here are links to the two studies:
UC Berkeley
"How Much Information?" (2000)
http://www.sims.berkeley.edu/research/projects/how-much-info/
UC Berkeley
"How Much Information?" (2003)
http://www.sims.berkeley.edu/research/projects/how-much-info-2003/internet.htm
BrightPlanet did the core work on the "deep web" and its size,
putting that hidden portion of the web at 400 to 550 times larger than
the public web. The "deep web" includes all of the pages that are not
searchable: public databases with a CGI interface, private Internet
pages run by companies, proprietary databases like Lexis/Nexis or the
Thomson Gale databases, and pages that block search robots for various
reasons:
BrightPlanet
"Deep Web White Paper," (Bergman, July 2001)
http://www.brightplanet.com/technology/deepweb.asp
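Notably, Berkeley's 2003 deep-web range appears to be a straight
application of BrightPlanet's multiplier to its own public-web figure
-- that's my inference from the arithmetic, not something either
source states:

# Applying BrightPlanet's 400-550x multiplier to Berkeley's 2003
# public-web estimate reproduces its deep-web range exactly.
public_2003 = 167         # TB, Berkeley's 2003 public-web figure
print(public_2003 * 400)  # 66,800 TB -- low end of Berkeley's range
print(public_2003 * 550)  # 91,850 TB -- high end of the range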
It's clear that the growth -- overall and by category -- is dramatic.
Only new domain registrations show a slowdown, as mentioned in the
D-Lib article. Intranets within corporations, by contrast, are growing
rapidly, with many corporations moving to electronic documentation.
Yet we may never have good estimates of growth because early measures
missed so many areas.
And search engines are struggling to penetrate these recesses in
order to retain their value as the heartbeat (or perhaps the "brains"
is more accurate) of the Internet. Eighteen months ago Google
estimated that it reached about half of the 4 billion pages on the
Internet. A recent Google press release now claims an index of about
4.28 billion web pages (with reach into images and message boards
expanding the total index to 6 billion items):
Google, Inc.
"Google Achieves Search Milestone," (Feb. 17, 2004)
http://www.google.com/press/pressrel/6billion.html
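If the earlier figure (half of 4 billion pages) and the new 4.28
billion are comparable -- treating them as directly comparable is my
assumption -- the implied growth of Google's index is striking:

# Implied annual growth of Google's index over eighteen months.
earlier = 2.0e9    # ~half of 4 billion pages, eighteen months ago
later   = 4.28e9   # pages claimed in the Feb. 2004 press release
years   = 1.5
rate = (later / earlier) ** (1 / years) - 1
print(f"~{rate:.0%} per year")  # roughly 66% annual growth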
Google search strategy:
"size of the Internet"
"limits of the Internet"
Best regards,
Omnivorous-GA