Hi frediii-ga, and thanks for your question.
One of the best estimates of the number of total websites comes from a
recent Netcraft survey:
http://news.netcraft.com/archives/2005/06/01/june_2005_web_server_survey.html
As of June, 2005, there were approximately 64,808,485 sites. The
November, 2005 estimate was 74,572,794. The above report also gives
quite a bit of additional information, such as growth rates, market
share for various web server software (Apache is the big leader as you
might guess), operating systems, etc.
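As a quick check on the growth implied by the two figures above, here is a minimal sketch. The five-month span and the assumption of steady month-over-month compounding are mine, not Netcraft's.

```python
# Site counts quoted above from the Netcraft surveys.
june_2005 = 64_808_485
november_2005 = 74_572_794

# Absolute and fractional growth between the two surveys.
added_sites = november_2005 - june_2005
growth = added_sites / june_2005

# Assumption: five monthly steps (June -> November) with steady compounding.
months = 5
monthly = (november_2005 / june_2005) ** (1 / months) - 1

print(f"Sites added: {added_sites:,}")
print(f"Growth over {months} months: {growth:.2%}")
print(f"Implied average monthly growth: {monthly:.2%}")
```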
Here is their most recent web server survey:
http://news.netcraft.com/archives/web_server_survey.html
You can also find information there on, for example, not only how
prevalent Linux servers are, but also which distributions are the most
popular and which are gaining ground the fastest:
http://news.netcraft.com/archives/around_the_net.html
______________
In terms of the number of pages per website, Hewlett-Packard studied
this question in 2001. The authors of the article below found a power
law distribution for the number of pages per site, the number of users
who visit a given site, and the number of links pointing to and from a
given site.
For the fraction of sites with a given number of pages, HP includes
data from two sources, infoseek.com and archive.org (see Figure 2A in
the article below).
http://www.hpl.hp.com/research/papers/weborder.pdf
Adamic LA, Huberman BA. The Web's Hidden Order. Hewlett-Packard Labs,
Palo Alto, CA 94304.
ladamic@hpl.hp.com
huberman@hpl.hp.com
To get totals, one must rebin the data presented in the graph in the
above paper. I did this by extracting the data points with GraphClick
and rebinning them in Excel:
http://www.arizona-software.ch/applications/graphclick/en/
Here are the results for the ranges you specified.
Based on 74,572,794 total sites, we get the following approximate values:
Web sites with 10 pages or fewer: 73,950,191
Web sites with 11-50 pages: 591,183
Web sites with 51-100 pages: 26,600
Web sites with over 100 pages: 4,790
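To see what share of all sites each range represents, the counts above can be turned into fractions. This is only a sketch that uses the approximate values listed here. It does not reproduce the GraphClick/Excel rebinning itself, and the variable names are my own.

```python
# Approximate counts quoted above (read off the rebinned HP figure).
counts = {
    "10 pages or fewer": 73_950_191,
    "11-50 pages": 591_183,
    "51-100 pages": 26_600,
    "over 100 pages": 4_790,
}

# The bins are approximate, so their sum differs slightly from 74,572,794.
binned_total = sum(counts.values())

# Fraction of sites falling in each page-count range.
shares = {label: n / binned_total for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label:>18}: {share:.4%}")
```

Nearly all of the mass sits in the smallest range, which is what one would expect from a power law distribution.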
=================================================
I hope this information is useful. Please feel free to request
clarification prior to rating.
-welte-ga

Clarification of Answer by welte-ga on 21 Dec 2005 15:13 PST
Hi again Fred,
You are correct. The November 2005 graph shows both the total number
of sites (domain names) and the number of those that are active (live):
http://news.netcraft.com/archives/2005/11/index.html
I based the numbers I gave you on the total sites (domain names). If
you are interested in the proportions of active sites in the ranges
you specified, I can calculate those as well. Because the underlying
proportions (from the HP article) stay the same, one can simply rescale
the same analysis to the number of active (live) sites. According to
the Netcraft graph, there were about 34 million active sites as of
November 2005.
Based on 34 million active (live) sites, we get the following approximate values:
Web sites with 10 pages or fewer: 33,716,137
Web sites with 11-50 pages: 269,538
Web sites with 51-100 pages: 12,100
Web sites with over 100 pages: 2,180
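The rescaling from total sites to active sites is a simple proportion, and it can be sketched as follows. The helper name `rescale` is hypothetical, and the 34 million figure is the rounded estimate quoted above, so the results only match the listed values approximately.

```python
# Counts based on 74,572,794 total sites (from the original answer).
TOTAL_SITES = 74_572_794
ACTIVE_SITES = 34_000_000  # rounded estimate of active sites, November 2005

total_counts = {
    "10 pages or fewer": 73_950_191,
    "11-50 pages": 591_183,
    "51-100 pages": 26_600,
    "over 100 pages": 4_790,
}


def rescale(counts, old_total, new_total):
    """Keep each range's proportion fixed and apply it to a new site total."""
    return {label: n * new_total / old_total for label, n in counts.items()}


active_counts = rescale(total_counts, TOTAL_SITES, ACTIVE_SITES)

for label, n in active_counts.items():
    print(f"{label:>18}: {round(n):,}")
```

The two largest ranges reproduce the figures above almost exactly. The two smallest were rounded to three significant figures in the answer, so they differ by a few dozen sites.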
-welte-ga

Clarification of Answer by welte-ga on 29 Dec 2005 18:01 PST
Hi again,
The answer to this part of your question depends a little on how you
define it. The "surface web," that part of the web that's easily
indexed by search engines, tends to be more static. The so-called
"deep web" consists of databases, dynamic web pages, etc., and is much
harder to index by search engines. There is considerable ongoing
research on this topic. Here is one useful source:
http://www.deepwebresearch.info/
"The Deep Web covers somewhere in the vicinity of 600 billion pages of
information located through the world wide web in various files and
formats that the current search engines on the Internet either cannot
find or have difficulty accessing. The current search engines find
about 8 billion pages at the present time of this writing. "
-welte-ga