Many webmasters wonder how to ensure their sites will be included in
Google's index of web sites. Although Google crawls more than 2
billion pages, it is inevitable that some sites will be missed. When Google
does miss a site, it is frequently for one of the following reasons:
1. The site is not well connected through multiple links to others on
the web.
2. The site launched after Google's last crawl was completed.
3. The design of the site makes it difficult for Google to effectively
crawl its content.
Google's intent is to represent the content of the Internet fairly and
accurately. To help make that goal a reality, here is a guide to
building a "crawler-friendly" site. There are no guarantees a site
will be found by our crawler, but following these guidelines should
increase the probability that your site will show up in Google search
results.
Do...
Provide high-quality content on your page - especially your home page.
If you follow only one tip from this page, this should be it. Our
crawler indexes web pages by analyzing the content of the pages
themselves. Google will index your site better if your pages contain
useful information. Plus, your site has a better chance of becoming a
favorite among web surfers and being linked to by others if the
information it contains is relevant and useful.
Do submit your site to the appropriate category in a web directory.
Listing your site in the Open Directory Project or Yahoo! increases
the likelihood it will be seen by robot crawlers and web surfers.
Do pay attention to HTML conventions. Make sure that your <TITLE> tags
and ALT attributes are accurate and descriptive. Also, check your
<A HREF> tags for errors, since broken or improperly formatted links
can prevent Google from indexing your page.
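
As a rough illustration of the HTML checks described above, here is a
minimal Python sketch built on the standard library's html.parser
module. The names ConventionChecker and check_page are hypothetical,
and the sketch only flags an empty title, images without ALT text, and
links with an empty HREF. It does not follow links, so it cannot tell
whether a link target actually exists.

```python
from html.parser import HTMLParser


class ConventionChecker(HTMLParser):
    """Hypothetical helper: collects a few simple HTML convention problems."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title_text = ""
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "img" and not (attrs.get("alt") or "").strip():
            self.problems.append("img without ALT text: " + str(attrs.get("src")))
        elif tag == "a" and "href" in attrs and not (attrs["href"] or "").strip():
            self.problems.append("link with empty HREF")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Accumulate only the text that appears inside <title>...</title>.
        if self.in_title:
            self.title_text += data


def check_page(html):
    """Return a list of problem descriptions for one page of HTML."""
    checker = ConventionChecker()
    checker.feed(html)
    checker.close()
    if not checker.title_text.strip():
        checker.problems.insert(0, "missing or empty TITLE")
    return checker.problems


if __name__ == "__main__":
    sample = '<TITLE></TITLE><IMG SRC="logo.gif"><A HREF="">home</A>'
    for problem in check_page(sample):
        print(problem)
```

A page that passes all three checks returns an empty list.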
Do make use of the robots.txt file on your web server. This file tells
crawlers which directories can or cannot be crawled. Make sure it is
current for your site so that you don't accidentally block our
crawler. Visit http://www.robotstxt.org/wc/faq.html for an FAQ that
answers questions about robots and how to control them when they
visit your site.
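
One way to check that a robots.txt file does not accidentally block
pages you want crawled is sketched below in Python, using the standard
library's urllib.robotparser module. The function name blocked_urls,
the example.com addresses, and the sample rules are all made up for
illustration. The user-agent string "Googlebot" is an assumption of
this sketch and is not stated in this guide.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, user_agent, urls):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    # Hypothetical robots.txt that blocks more than its author intended.
    robots_txt = (
        "User-agent: *\n"
        "Disallow: /cgi-bin/\n"
        "Disallow: /products/\n"
    )
    important = [
        "http://www.example.com/",
        "http://www.example.com/products/widget.html",
    ]
    for url in blocked_urls(robots_txt, "Googlebot", important):
        print("blocked:", url)
```

Running a check like this whenever the file changes is one simple way
to keep it current for your site.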
Do ensure that your site is accessible through HTML hyperlinks.
Generally, your site is crawlable if the pages are connected to each
other with ordinary HTML links. If certain areas are not linked, you
may be excluding older browsers, differently-abled users, and Google.
Google can crawl content from a database or other dynamically
generated content as long as it can be found by following links. If
you have many unlinked pages, you may want to create a jump page from
which the crawler can find all of your pages.
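
To find pages that cannot be reached by following links, you could
walk your own link structure the way a crawler would. The Python
sketch below assumes you already have a map from each page to the
pages it links to. The function name unreachable_pages and the page
names are hypothetical, and real crawlers are far more involved than
this breadth-first walk.

```python
from collections import deque


def unreachable_pages(links, start):
    """Return pages in `links` that cannot be reached from `start` by following links.

    `links` maps each page name to the list of pages it links to.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(set(links) - seen)


if __name__ == "__main__":
    # Hypothetical site: archive.html and old-news.html are not linked
    # from anywhere the home page leads to.
    site = {
        "index.html": ["about.html", "products.html"],
        "about.html": ["index.html"],
        "products.html": ["widget.html"],
        "widget.html": [],
        "archive.html": ["old-news.html"],
        "old-news.html": [],
    }
    print(unreachable_pages(site, "index.html"))
```

Any page the function reports is a candidate for the jump page: once
a page that lists those pages is linked from the home page, the
function returns an empty list.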
Do build your site with a logical link structure. A hierarchical link
structure is not only beneficial to you, but also to Google. More of
your site can be crawled if it is laid out with a clear architecture.
Don't...
Fill your page with lists of keywords, attempt to "cloak" pages, or
put up "crawler only" pages. If your site contains pages, links or
text that you do not intend visitors to see, Google considers them
deceptive and may ignore your site.
Do not feel obligated to purchase a search optimization service. Some
companies "guarantee" your site a place near the top of a results
page. While legitimate consulting firms can improve your site's flow
and content, others employ deceptive tactics to try to fool search
engines. Be careful - if your domain is affiliated with one of these
services, it could be permanently banned from our index.
Do not use images to display important names, content or links. Our
crawler does not recognize text contained in graphics. Use ALT
attributes if the main content and keywords on your page cannot be
formatted in regular HTML.
Do not provide multiple copies of a page under different URLs. Many
sites offer text-only or printer-friendly versions of pages that
contain the same content as the graphic-enriched version of the page.
While Google crawls these pages, duplicates are removed from our
index. To ensure that we have the desired version of your page, place
the other versions in separate directories and use the robots.txt
file to block our crawler from those directories.
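
As an illustration of the approach above, the following Python sketch
writes the robots.txt rules for a set of directories that hold
duplicate versions of pages. The function name build_robots_txt and
the directory names "print" and "text-only" are hypothetical. The
default user agent "*" applies the rules to every crawler, which is an
assumption of this sketch; pass a specific crawler name to restrict
the rules to that crawler only.

```python
def build_robots_txt(duplicate_dirs, user_agent="*"):
    """Return robots.txt text that disallows each directory in duplicate_dirs."""
    lines = ["User-agent: " + user_agent]
    for directory in duplicate_dirs:
        # Normalize "print", "/print" and "/print/" to the same rule.
        lines.append("Disallow: /" + directory.strip("/") + "/")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical directories holding printer-friendly and text-only copies.
    print(build_robots_txt(["print", "text-only"]), end="")
```

The graphic-enriched pages stay outside the listed directories, so
they remain crawlable while the copies are skipped.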
For more information, go to:
://www.google.com/webmasters/index.html
Or contact Google at:
help@google.com