Thanks for asking, JoAnn,
The answer is both simple and complex. Let me lay out a few of the
technical details behind the problem, then outline several possible
solutions.
You've invited search engine spiders to visit your dynamically
generated parlor by submitting your site URL, but did the Googlebot
actually accept the invitation? Perhaps. Perhaps not. Most search
engine spiders have some built-in rules about entering dynamically
generated "rooms" (pages).
Often dynamic sites require a search in order to locate information or
items in the database for display within a template. Spiders usually
can't search. Some spiders are even deliberately programmed to stay
away from dynamic pages.
Human visitors to a dynamic website find information using a search
query. That query can be typed into a searchbox, or a pre-defined
query can be coded into a link on the homepage. The parameter portion
of such a link's URL is called a 'query string.'
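A pre-defined query link of this kind is just an ordinary anchor whose
href carries the parameters; everything after the "?" is the query
string. Using the store URL from later in this answer:

```html
<!-- A static link that triggers a pre-defined database query.
     The portion after the "?" is the query string the script reads. -->
<a href="container.asp?target=storeresults&amp;storeId=10834">
  Browse the store
</a>
```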
Search engine spiders don't know to use the built-in search function,
or even what questions to ask. Your dynamic scripts require certain
information before they can generate the requested content: session
id, cookie data, or a query string are the most frequent requirements.
Spiders will stop and back out at this point in indexing, because they
can't answer the questions posed by the script.
If a spider does accidentally venture deeper into the site, it could
inadvertently become entangled in a "spider trap", a CGI script that
traps the spider and server into an endless loop of query and
counter-query. This sort of trap is not just bad for the spider.
Repeated page requests can also crash the server.
Let's examine Small Blue Planet in particular. Your page URLs are
causing the difficulty, not the .asp file type. Your dynamic pages
contain query strings. For example:
http://www.smallblueplanet.com/content/container.asp?target=storeresults&storeId=10834
Look carefully at the URL and spot the question mark after
container.asp. Most search engine spiders get to the "?" in the query
string and stop cold rather than continue into a possible spider trap.
How does a spider see your site? You can use a text-based browser,
such as Lynx, to see your site like search engine spiders do. If Lynx
can't reach a page, it's likely that a spider can't either.
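If you'd rather not install Lynx, you can approximate its text-only
view with a short script. This is a rough sketch using Python's
standard library (the sample page below is invented for illustration);
it keeps only what a text-mode browser would show a spider: visible
text and followable links.

```python
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Collects the visible text and link targets, roughly
    what a text-mode browser such as Lynx would present."""
    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

# Invented sample page for illustration
page = ('<html><body><h1>Store</h1>'
        '<a href="container.asp?target=storeresults&amp;storeId=10834">'
        'Results</a></body></html>')
p = TextOnly()
p.feed(page)
print(p.text)   # the visible text a spider can index
print(p.links)  # the links it could try to follow, query string and all
```

If a page's content or links never show up in output like this, a
spider probably isn't seeing them either.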
SOLUTION OPTIONS
**********************************************************************
Submit Pages Individually
*************************
Submit pages, including their query strings, individually. This
somewhat negates the advantages of a dynamic site.
Add Dynamic Links To Static Pages
*********************************
Include static pages in your site with links to dynamic content. The
simplest way to do this is a master table of contents page with links
to your most important dynamic pages. Category contents pages may also
help. These give spiders a way to index content without having to
answer questions.
The table of contents doesn't completely solve the problem with
spiders that halt at most query strings, including the Googlebot. You
can increase your chances of indexing individual products or product
categories by including good descriptions and descriptive links on a
static table of contents page. Google will still index the content of
the product listings page - including your link titles. Other search
engines that can follow dynamic links can visit the actual dynamic
page content without making a query.
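A minimal static table of contents page might look something like this
(the file name, link text, and category names are invented for
illustration):

```html
<!-- toc.html: a static page spiders can crawl without running a query.
     Descriptive link text gives the engines content to index even if
     they won't follow the query-string links themselves. -->
<html>
<body>
  <h1>Small Blue Planet -- Product Index</h1>
  <ul>
    <li><a href="container.asp?target=storeresults&amp;storeId=10834">
        Store results: full product listings</a></li>
    <!-- ...one descriptive link per important category or product... -->
  </ul>
</body>
</html>
```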
Remove Query Strings From Dynamic URLs
**************************************
Amazon.com uses dynamically generated pages, but their method of URL
naming avoids the difficulty of query strings. The direct URL for
Harry Potter and the Order of the Phoenix is:
http://www.amazon.com/exec/obidos/tg/detail/-/043935806X/002-5910991-8455216
This method will work; however, it's the most technically demanding of
your options. The exact steps depend on the type of web server you
employ and the software you're using to deploy your database.
For Active Server Pages -- Most search engines will index .asp pages
if the "?" is removed from the URL. This may be accomplished manually,
or the dynamic script can be changed to accept a "/" rather than a
"?". A third-party product, XQASP, will automatically remove the query
strings from your .asp pages and replace them with "/" marks.
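To make the transformation concrete, here's an illustrative sketch of
the rewrite in Python. It turns each key=value pair into a pair of path
segments; XQASP's actual rewriting scheme may differ, and your server
would still need matching logic to parse the segments back into
parameters.

```python
from urllib.parse import urlsplit, parse_qsl

def path_style(url):
    """Rewrite ?key=value&... into /key/value/... path segments.
    Illustrative only -- not XQASP's actual scheme."""
    parts = urlsplit(url)
    if not parts.query:
        return url  # nothing to rewrite
    segments = []
    for key, value in parse_qsl(parts.query):
        segments += [key, value]
    return (parts.scheme + "://" + parts.netloc + parts.path
            + "/" + "/".join(segments))

print(path_style("http://www.smallblueplanet.com/content/"
                 "container.asp?target=storeresults&storeId=10834"))
# -> http://www.smallblueplanet.com/content/container.asp/target/storeresults/storeId/10834
```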
One cautionary note: replacing "?" with "/" makes your dynamic pages
appear to live in their own subdirectories, so browsers will look for
relative images and links there. You can avoid broken links by using
absolute links everywhere (high maintenance) or by using URL addresses
that are relative to the root directory of your site, rather than to
the documents themselves (i.e. /homepagename.asp rather than
../homepagename.asp).
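The difference in practice (image file name invented for
illustration):

```html
<!-- Document-relative: resolves against the phantom subdirectory
     created by the "/" rewrite, so these links break. -->
<a href="../homepagename.asp">Home</a>
<img src="images/logo.gif">

<!-- Root-relative: always resolves from the site root, so these
     still work under the rewritten URLs. -->
<a href="/homepagename.asp">Home</a>
<img src="/images/logo.gif">
```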
Exceptional Digital Enterprise Solutions - XQASP
http://xde.net/xq/tool.xqasp-deep-web/qx/index.htm
**********************************************************************
ANSWER STRATEGY: Personal knowledge/experience programming dynamic
websites.
I hope this information provides the background and solutions you're
seeking. If anything I've said is unclear, or should you have
questions about the material or link(s) provided, please feel free to
ask.
larre-ga