Thanks for asking!
Percentage of 404 errors
------------------------
To show how widely the percentage of 404 error messages varies,
I randomly selected 25 sets of published webstats and calculated
404 responses as a percentage of total site hits. The percentages
range from 0 to about 40 percent, with an average of 6.96 percent
and a median of 2.38 percent.
           Total Hits     # of 404's     Percentage
         ============   ============   ============
                   65              1           1.54
               14,104            321           2.28
               98,394          4,874           4.95
                8,495            297           3.50
                9,592             25           0.26
              175,008             84           0.05
                1,202            112           9.32
            2,342,749              0           0.00
               15,840          1,334           8.42
              138,298             97           0.07
                2,638            267          10.12
            3,444,510         33,674           0.98
               37,117          2,684           7.23
                4,984          1,993          39.99
              259,659          6,168           2.38
               16,948          1,284           7.58
                8,693          1,193          13.72
            1,579,411        213,889          13.54
              302,046              0           0.00
              307,661          2,714           0.88
              259,155         30,741          11.86
               98,543          1,681           1.71
            1,277,430         26,492           2.07
                4,044          1,264          31.26
              205,122            749           0.37
         ------------   ------------
Total:     10,611,708        331,938
Median:        98,394          1,264           2.38
Average:      424,468         13,278           6.96
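
For anyone who wants to run the same arithmetic on their own stats, here
is a minimal Python sketch. The function name pct_404 is my own, and the
sample rows are simply the first five rows of the table. Note that the
6.96 percent average is the mean of the per-site percentages; dividing
the total 404's by the total hits gives a lower pooled figure, because
the largest sites have comparatively few errors.

```python
from statistics import mean, median

# (total hits, number of 404s) pairs copied from the first five table rows.
SAMPLE = [(65, 1), (14_104, 321), (98_394, 4_874), (8_495, 297), (9_592, 25)]


def pct_404(total_hits: int, not_found: int) -> float:
    """Return 404 responses as a percentage of total hits, rounded to 2 places."""
    if total_hits == 0:
        return 0.0  # avoid dividing by zero for a site with no traffic
    return round(100.0 * not_found / total_hits, 2)


percentages = [pct_404(hits, errors) for hits, errors in SAMPLE]
print(percentages)                  # [1.54, 2.28, 4.95, 3.5, 0.26]
print(median(percentages))          # 2.28
print(round(mean(percentages), 2))  # 2.51

# Pooled rate for the whole table: total 404s divided by total hits.
print(pct_404(10_611_708, 331_938))  # 3.13
```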
Google Search for Web Stats
http://www.google.com/search?q=%22webtrends+summary+report%22+404
More than that, webstats generally vary from month to month. Depending
on the strategy used to eliminate 404 errors, and how "active" the
website is, 404 percentages can change frequently. Active websites,
those with frequently changed or added content, generally experience a
higher number of Page Not Found errors. However, as you can see, even
very large, busy sites -can- achieve perfection: one sampled site shows
zero 404's out of 2,342,749 total site hits.
For overall answer flow and readability, I've placed the list of
sampled sites at the end of the answer.
404 Guidelines.... Or not
----------------------------------------------------------------------
There is no specific guideline for "how many 404 errors are
permissible?" because the answer to that question depends heavily upon
the source of the 404 errors. Let's examine the most common causes and
reasons for 404 errors.
Many 404 errors are generated by search engine robots, either seeking
your robots.txt file or trying to confirm pages that were listed at
some time by a directory, a third-party site, or another search engine.
Other files commonly sought are:
/favicon.ico - Windows Favorites Icon
/w3c/p3p.xml - P3P Privacy policy
User agents (browsers as well as search engine robots) try to download
these files on every visit. If the files are not present, the requests
generate 404 errors. This type of 404 error may be safely ignored, or
you may place real or dummy (blank) files with those names in your web
directory to prevent the errors.
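
Placing the dummy files can be scripted. The sketch below is only an
illustration: the function name create_placeholders and the COMMON_FILES
list are my own, and you would pass in your real web directory. It
creates an empty file for each missing name and leaves existing files
alone. Keep in mind that an empty robots.txt places no restrictions on
robots, and that a real icon or privacy policy is more useful than a
blank one.

```python
from pathlib import Path

# Commonly requested files, relative to the web root (taken from the list above,
# plus robots.txt).
COMMON_FILES = ["robots.txt", "favicon.ico", "w3c/p3p.xml"]


def create_placeholders(web_root: str) -> list[str]:
    """Create an empty file for each commonly requested name that is missing.

    Returns the relative paths that were created; existing files are untouched.
    """
    created = []
    root = Path(web_root)
    for rel in COMMON_FILES:
        target = root / rel
        if not target.exists():
            # w3c/p3p.xml needs its parent directory to exist first.
            target.parent.mkdir(parents=True, exist_ok=True)
            target.touch()
            created.append(rel)
    return created
```

Calling create_placeholders on the same directory a second time returns
an empty list, since nothing is missing any more.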
If you use mixed-case file names (e.g. MyPicture.Gif), you may see
404 errors when your site is hosted on a Unix/Linux server. Unix and
its operating system cousins treat file names as case sensitive, so a
visitor who types mypicture.gif into the browser will receive a 404
error.
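
A request that fails only because of letter case can be spotted
mechanically. This is a rough sketch, assuming you already have a list
of the paths that exist on the site; the function name
find_case_mismatch and the example paths are hypothetical.

```python
from typing import Optional


def find_case_mismatch(requested: str, existing_files: list[str]) -> Optional[str]:
    """Return the existing path that differs from `requested` only by letter case.

    Returns None when the request matches exactly or matches nothing at all.
    """
    if requested in existing_files:
        return None  # exact match, so no case problem
    lowered = requested.lower()
    for path in existing_files:
        if path.lower() == lowered:
            return path
    return None


files = ["/images/MyPicture.Gif", "/index.html"]
print(find_case_mismatch("/images/mypicture.gif", files))  # /images/MyPicture.Gif
print(find_case_mismatch("/index.html", files))            # None
```

Running the paths from your 404 report through a check like this
separates simple case problems from pages that are really gone.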
If you purchase a domain name that has been previously owned by
someone else, 404 errors may be generated by links to old pages.
Preventing 404 Page Not Found Errors
----------------------------------------------------------------------
There are several steps that can reduce possible 404's. On Apache
servers, redirects in the .htaccess file can automatically take users
from an older page to its newer replacement. Permanent (301) and
temporary (302) redirects can "catch" old referrals from other sites
and search engines and send those visitors in the proper direction,
rather than redirecting to your catchall 404 page. The more
specifically you can guide visitors to their intended information, the
happier they'll be.
Javascript Kit offers detailed tutorials that demonstrate how to
create an .htaccess document and how to use it for redirection.
Comprehensive Guide to .htaccess - Redirects
Comprehensive Guide to .htaccess - Error Documents
http://wsabstract.com/howto/htaccess2.shtml
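
If you have many moved pages, the redirect lines can be generated rather
than typed. The sketch below assumes an Apache server with mod_alias
available and .htaccess overrides allowed; the function names, the
example paths, and the example.com addresses are placeholders of my own.
It writes one Redirect directive per moved page, followed by an
ErrorDocument line for everything else.

```python
def redirect_lines(moved: dict[str, str], permanent: bool = True) -> list[str]:
    """Build one mod_alias Redirect directive per old-path/new-URL pair."""
    # "permanent" sends a 301 status, "temp" sends a 302.
    status = "permanent" if permanent else "temp"
    return [f"Redirect {status} {old} {new}" for old, new in moved.items()]


def build_htaccess(moved: dict[str, str], error_page: str) -> str:
    """Return .htaccess text: specific redirects first, then the catchall 404 page."""
    lines = redirect_lines(moved)
    lines.append(f"ErrorDocument 404 {error_page}")
    return "\n".join(lines) + "\n"


# Hypothetical mapping of retired pages to their replacements.
moved_pages = {
    "/old-page.html": "http://www.example.com/new-page.html",
    "/products.htm": "http://www.example.com/catalog/",
}
print(build_htaccess(moved_pages, "/notfound.html"))
```

Check the generated lines against the tutorials above before uploading
them, since a mistake in .htaccess can affect the whole directory.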
Tracking Down 404 Errors
----------------------------------------------------------------------
Before 404's can be eliminated, they need to be found. Web logs often
provide a basic listing; however, on any site of 100 pages or more,
analysing raw logs by hand can become too complicated.
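
A short script can take over the raw-log reading. The following is a
minimal sketch, assuming your server writes the common or combined log
format (request in quotes, then status code, then size, optionally
followed by the referrer in quotes). The pattern, the function name
count_404s, and the sample lines are mine and may need adjusting for
your own logs.

```python
import re
from collections import Counter

# Matches: "GET /path HTTP/1.1" 404 209 "referrer"  (referrer is optional)
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referrer>[^"]*)")?'
)


def count_404s(lines) -> Counter:
    """Count 404 responses per (requested path, referrer) pair."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            referrer = match.group("referrer") or "-"
            counts[(match.group("path"), referrer)] += 1
    return counts


# Made-up sample lines in combined log format.
sample = [
    '192.0.2.1 - - [20/Apr/2004:10:00:00 -0700] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/4.0"',
    '192.0.2.2 - - [20/Apr/2004:10:00:05 -0700] "GET /old-page.html HTTP/1.1" 404 209 "http://www.example.com/links.html" "Mozilla/4.0"',
    '192.0.2.3 - - [20/Apr/2004:10:00:09 -0700] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/4.0"',
]
for (path, referrer), hits in count_404s(sample).most_common():
    print(hits, path, referrer)
```

Grouping by referrer shows whether a missing page is being requested
through a bad link on your own site, a link on someone else's site, or
directly by a robot.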
There are a variety of tools that can help identify the source of 404
errors. Choosing among them is a matter of your own preferences.
Scripts, software, and online checkers are available.
Dr. Watson will verify links page by page (Free)
http://watson.addy.com/
LinkCheck v4 - Online Link Checker
http://www.poisontooth.com/linkcheck/
MomSpider is a multi-site spider that simplifies site maintenance
http://ftp.ics.uci.edu/pub/websoft/MOMspider/WWW94/paper.html
CyberSpyder Link Tester is a stand-alone software Links Checker
http://www.cyberspyder.com/cslnkts1.html
Tracking 404 errors on Active Server Pages
http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=6292&lngWId=4
Advanced 404 is a script to notify of 404 errors
http://123webmaster.com/Onsite/Management/Advanced404.html
A further (extensive) list of Link Validators is provided by:
Business.com - Link Validation
http://www.business.com/directory/internet_and_online/site_management/link_monitors/
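
To give a feel for what these link checkers do, here is a small sketch
of the core idea, using only the Python standard library. It is not how
any of the tools above is implemented. It collects the href and src
values from a page and reports the site-relative ones (those starting
with "/") that have no matching file under a local copy of the site. The
names LinkCollector and broken_local_links are my own, and external and
page-relative links are skipped to keep it short.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect href and src attribute values from a page's tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def broken_local_links(html: str, web_root: str) -> list[str]:
    """Return site-relative links in `html` with no matching file under `web_root`."""
    collector = LinkCollector()
    collector.feed(html)
    broken = []
    for link in collector.links:
        parsed = urlparse(link)
        # Skip external URLs, mailto: links, and in-page anchors.
        if parsed.scheme or parsed.netloc or not parsed.path:
            continue
        # Page-relative links depend on the page's own location; not handled here.
        if not parsed.path.startswith("/"):
            continue
        if not (Path(web_root) / parsed.path.lstrip("/")).exists():
            broken.append(link)
    return broken
```

Given a page that links to /index.html (present) and /missing.html
(absent), the function returns a list containing only "/missing.html".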
The Automatic Solution
----------------------------------------------------------------------
The sites that manage to completely eliminate 404 errors on a regular
basis probably have automated assistance.
404 Manager consists of scripts that identify and eliminate Page Not
Found errors. Separate scripts handle administration of the 404
management system, automatic configuration of the server (*nix only),
automatic redirection, specific 404 redirects, notifications, and
standard 404 redirects.
Most content management systems, as well as professional level web
design tools, include link checking, automatic link updating
capability, and sophisticated bad link reporting.
Further 404 Resources
----------------------------------------------------------------------
Causes of 404 Errors
http://www.webhostingfacts.com/causes_404.htm
Friendly 404 Errors
http://www.webreference.com/new/011004.html
The HTTP Error 404 Antidote - Web Monkey
http://hotwired.lycos.com/webmonkey/02/40/index4a.html?tw=backend
The Perfect 404 - A List Apart
http://www.alistapart.com/articles/perfect404/
404 Research Lab - Everything 404
http://www.plinko.net/404/
Favicon.ico 404 Page Not Found Errors
http://www.traffic-test-tube.com/search-engine-articles/favicon.shtml
Search Strategy
----------------------------------------------------------------------
Personal Knowledge and bookmarks, plus Google Search Terms:
404 scripts
404 .htaccess redirects
404 error tracking
page not found
404 percent
I hope this information is useful to you. If you have questions about
the material provided, please feel free to ask.
---larre
Site Stats Sampled
----------------------------------------------------------------------
http://stats.state.nv.us/2001/thechoiceisyours.org/may/DEFAULT_01_b.HTM
http://www.bio.cornell.edu/stats/01/08/default_01_b.htm
http://www.afr-sd.org/webanalysis/encap/mar02/DEFAULT_01_b.HTM
http://stats.state.nv.us/2002/gov/march/default_01_b.htm
http://www.afr-sd.org/webanalysis/encap/jun02/DEFAULT_01_b.HTM
http://stats.state.nv.us/2001/operationgamethief.org/january/DEFAULT_01_b.HTM
http://stats.state.nv.us/2002/barber/march/default_01_b.htm
http://www.virginia.com/wreport/default_01_b.htm
http://www.afr-sd.org/webanalysis/encap/apr03/DEFAULT_01_b.HTM
http://www.bio.cornell.edu/stats/01/07/default_01_b.htm
http://stats.state.nv.us/2001/hr/may/DEFAULT_01_b.HTM
http://www.nativeweb.org/statistics/oct-dec00_01_b.htm
http://www.afr-sd.org/webanalysis/afrsd/jan03/DEFAULT_01_b.HTM
http://www.ohioswim.org/report/2004/w05/w05_78_b.html
http://www.cast.uark.edu/stats/complete_summary_01_b.htm
http://www.encapafrica.org/stats/sep03/DEFAULT_01_b.HTM
http://www.nrmtracker.org/stats/aug03/DEFAULT_01_b.HTM
http://www.ccuec.unicamp.br/treinamentos/webpro/implantar/relatorio_estatistico/unicamp_01_b.htm
http://www.newpaltz.edu/stats/2000/july/index_01_b.htm
http://www.bitola.com/posetenost/do220303/rep_01_b.htm
http://www.maydaymystery.org/logs/mday0603/index_01_b.html
http://193.70.162.66/cultura/biblioteche/statistiche/agosto_10novembre2002/index_01_b.htm
http://www.royal-hosting.com/logs/report_01_b.htm
http://www.tad.org/stats/june/index_01_b.htm
Clarification of Answer by larre-ga on 20 Apr 2004 22:17 PDT
My apologies for the table formatting errors. A few hidden characters
must have snuck into my document. Here's the corrected version:
           Total Hits     # of 404's     Percentage
         ============   ============   ============
                   65              1           1.54
               14,104            321           2.28
               98,394          4,874           4.95
                8,495            297           3.50
                9,592             25           0.26
              175,008             84           0.05
                1,202            112           9.32
            2,342,749              0           0.00
               15,840          1,334           8.42
              138,298             97           0.07
                2,638            267          10.12
            3,444,510         33,674           0.98
               37,117          2,684           7.23
                4,984          1,993          39.99
              259,659          6,168           2.38
               16,948          1,284           7.58
                8,693          1,193          13.72
            1,579,411        213,889          13.54
              302,046              0           0.00
              307,661          2,714           0.88
              259,155         30,741          11.86
               98,543          1,681           1.71
            1,277,430         26,492           2.07
                4,044          1,264          31.26
              205,122            749           0.37
         ------------   ------------
Total:     10,611,708        331,938
Median:        98,394          1,264           2.38
Average:      424,468         13,278           6.96
---