Q: How many pages are on the Web? (Answered, 4 out of 5 stars, 1 Comment)
Question  
Subject: How many pages are on the Web?
Category: Computers > Internet
Asked by: kristensen-ga
List Price: $20.00
Posted: 31 Jan 2005 06:37 PST
Expires: 02 Mar 2005 06:37 PST
Question ID: 466325
In 2001 it was concluded that the Web consisted of 550 billion
individual Web pages.

See Complete Planet's Deep Web FAQ:
http://aip.completeplanet.com/aip-engines/help/help_deepwebfaqs.jsp#Anchor_dwfaq6

What is the current estimate?

Request for Question Clarification by omnivorous-ga on 31 Jan 2005 11:45 PST
Kristensen --

I'm very familiar with the BrightPlanet research: it's very good.  

Some of the best studies really only have data to 2003 -- is that
recent enough for you?  Note that the citations and detail are
excellent in this case.

Best regards,

Omnivorous-GA

Clarification of Question by kristensen-ga on 31 Jan 2005 13:18 PST
If 2003 data is the newest available data, that is definitely better
than 2001, so yes - that is recent enough. Thanks.
Answer  
Subject: Re: How many pages are on the Web?
Answered By: omnivorous-ga on 31 Jan 2005 15:02 PST
Rated: 4 out of 5 stars
 
Kristensen --

The Bright Planet "Deep Web" white paper is still one of the best
investigations of the web.  You linked the FAQ, and the actual 2001
white paper by Michael K. Bergman is here:

Bright Planet
"Deep Web White Paper," (Bergman, July 2001)
http://www.brightplanet.com/technology/deepweb.asp

It's important because of the contentions regarding the web,
specifically that the "deep web" is 400 to 550 times larger than the
public web.  It consists of databases that are not searchable by a
searchbot, including public databases with a CGI interface; private
Intranet pages; proprietary databases such as Lexis/Nexis or the
Thomson Gale databases; and pages that block searchbots.

Note here that in mid-2003, Google itself estimated that it was
reaching about 50% of the 4 billion pages on the Internet.  Then, in
February 2004, Google was reaching 4.28 billion web pages (and with
images and message boards indexing 6 billion items):

Google, Inc.
"Google Achieves Search Milestone," (Feb. 17, 2004)
http://www.google.com/press/pressrel/6billion.html

The copyright notice on today's Google home page now claims
8,058,044,651 web pages.

But we don't know if Google has increased the percentage of web pages
indexed or not.

---

Probably one of the most exhaustive studies done to assess the amount
of existing information is a study titled "How Much Information?" done
at the University of California at Berkeley.  The good aspect of this
study is that it was done in 2000, then again in 2003.  Also, it
attempts to measure the TOTAL amount of information, including that on
paper in library stacks and in other areas where search engines are
starting to make penetration -- like TV, film and other recorded
electronic media.

Between the 2 studies, the size of the public web went from the 14-28
terabyte range in 2000 (a terabyte is 1 million megabytes) to 167
terabytes in 2003.  That was a growth of 6x to 12x in size.  Their
estimate of the "deep web" was the same figure used by Bright Planet --
400 to 550 times larger.
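
As a quick check on the 6x to 12x figure, here is a minimal Python sketch that divides the 2003 size by the low and high 2000 estimates. The variable and function names are just illustrative; the terabyte figures are the ones quoted above.

```python
# Public-web size estimates quoted above, in terabytes.
SIZE_2000_LOW_TB = 14
SIZE_2000_HIGH_TB = 28
SIZE_2003_TB = 167


def growth_factor(old_tb, new_tb):
    """Return how many times larger new_tb is than old_tb."""
    return new_tb / old_tb


# The smallest growth comes from the high-end 2000 estimate,
# the largest growth from the low-end 2000 estimate.
low_growth = growth_factor(SIZE_2000_HIGH_TB, SIZE_2003_TB)
high_growth = growth_factor(SIZE_2000_LOW_TB, SIZE_2003_TB)

print(f"Growth 2000 -> 2003: {low_growth:.1f}x to {high_growth:.1f}x")
# prints: Growth 2000 -> 2003: 6.0x to 11.9x
```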

Both studies use the same average web page size of 18.7K bytes, taken
from a Nature Magazine article that had studied the statistical
average of web page sizes back in 1999.  Though I'm skeptical that web page
sizes have remained constant, because of readability it's unlikely
that they've grown to double or triple their size, so we can do some
estimates of web page growth from the "How Much Information?" study:

Low-end 2000 (14 terabytes or 14 x 10^12 bytes): 749 million pages 
High-end 2000 (28 terabytes): 1.50 billion pages

2003 estimate (167 terabytes): 8.93 billion pages
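
The arithmetic behind these page counts can be reproduced with a short Python sketch. It assumes, as the figures above imply, that a terabyte is 10^12 bytes and that "18.7K" means 18,700 bytes; reading K as 1,024 would lower each count by about 2%. The function name is only illustrative.

```python
BYTES_PER_TB = 10**12           # 1 terabyte = 1 million megabytes
AVG_PAGE_BYTES = 18.7 * 1000    # assumption: "18.7K bytes" read as 18,700 bytes


def estimated_pages(terabytes):
    """Estimate a page count by dividing total bytes by the average page size."""
    return terabytes * BYTES_PER_TB / AVG_PAGE_BYTES


for label, tb in [("Low-end 2000", 14), ("High-end 2000", 28), ("2003", 167)]:
    billions = estimated_pages(tb) / 1e9
    print(f"{label} ({tb} terabytes): {billions:.2f} billion pages")
# prints:
# Low-end 2000 (14 terabytes): 0.75 billion pages
# High-end 2000 (28 terabytes): 1.50 billion pages
# 2003 (167 terabytes): 8.93 billion pages
```

The result is only as good as the single 18.7K average page size: if the real average in 2003 were larger, the 2003 page count would be proportionally smaller.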

Note that there are some strong differences here between the data
accumulated by Bright Planet and the Berkeley study.  For example,
Bright Planet claims that "Sixty of the largest Deep Web sites
collectively contain about 750 terabytes of information."  But Bright
Planet is also using all documents and document types, including
images, which the Berkeley studies seem to exclude.  The Berkeley
studies break the data types (e-mail, blogs, spam, web pages, web
images) down into great detail.

Here are links to the 2 studies:
UCal Berkeley
"How Much Information?" (2000)
http://www.sims.berkeley.edu/research/projects/how-much-info/

UCal Berkeley
"How Much Information?" (2003)
http://www.sims.berkeley.edu/research/projects/how-much-info-2003/internet.htm

---

What is safe to conclude from all of this?  
- that digitization of information is growing rapidly
- that it is highly segmented, including non-searchable and proprietary databases
- that it's difficult to measure because of the combination of image
and page objects.  Here's an example -- are these two different pages
or really just one?

Mooney Owners Poll
http://www.mooneyevents.com/polls.htm

Are You Afraid of Heights?
http://www.mooneyevents.com/GIFs/Sep15.gif

- that search engine reach is growing but the creation of electronic
content is growing faster
- that a 6X-12X growth in web pages during the 2000-2003 period is reasonable
- that we need another study soon


Best regards,

Omnivorous-GA
kristensen-ga rated this answer: 4 out of 5 stars and gave an additional tip of: $10.00
Thorough answer with lots of extra information. However, the research
quoted was 1½-2 years old.

Comments  
Subject: Re: How many pages are on the Web?
From: guzzi-ga on 31 Jan 2005 17:16 PST
 
I typed in "a" and got 8,000,000,000 hits. Didn't download them all to check.

Best

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.