Q: How to get Google to Spider Dynamically Generated ASP pages ( Answered, 1 Comment )
Question  
Subject: How to get Google to Spider Dynamically Generated ASP pages
Category: Computers > Internet
Asked by: smallblueplanet-ga
List Price: $200.00
Posted: 28 Apr 2003 19:29 PDT
Expires: 28 May 2003 19:29 PDT
Question ID: 196839
Our company website is www.smallblueplanet.com; it uses dynamic ASP
pages to list products for consumers. Currently these dynamic pages are
not listed in Google. Can you please tell us how we can get Google to
spider these dynamically generated pages? Please include your email
address in the response, in case we have technical follow-up questions.
Thank You,
JoAnn Allen
Vice President
Small Blue Planet Toys

Request for Question Clarification by aceresearcher-ga on 29 Apr 2003 12:57 PDT
Greetings, JoAnn!

The first thing about which I need to ask you is the level of
programming knowledge and experience of the person who will be
implementing this on your site; this will make a huge difference in
the approach to be taken for your Answer.

Thanks,

aceresearcher

Clarification of Question by smallblueplanet-ga on 30 Apr 2003 06:43 PDT
We have a team of programming professionals who produce our site for
us. We would like you to explain the modifications that would need to
be made, and we will determine whether they can be implemented in-house
or outsourced to the programming team.
Thank You,
JoAnn
Answer  
Subject: Re: How to get Google to Spider Dynamically Generated ASP pages
Answered By: larre-ga on 30 Apr 2003 12:23 PDT
 
Thanks for asking, JoAnn,

The answer is both simple and complex. Let me lay out a few of the
technical details behind the problem, then outline several possible
solutions.

You've invited search engine spiders to visit your dynamically
generated parlor by submitting your site URL, but did the Googlebot
actually accept the invitation? Perhaps. Perhaps not. Most search
engine spiders have some built-in rules about entering dynamically
generated "rooms" (pages).

Often dynamic sites require a search in order to locate information or
items in the database for display within a template. Spiders usually
can't search. Some spiders are even deliberately programmed to stay
away from dynamic pages.

Human visitors to a dynamic website find information using a search
query. That query can be typed into a search box, or pre-defined
queries can be coded into links on the homepage. The query portion of
such a link's URL is called a 'query string.'

Search engine spiders don't know to use the built-in search function,
or even what questions to ask. Your dynamic scripts require certain
information before they can generate the requested content: session
id, cookie data, or a query string are the most frequent requirements.
Spiders will stop and back out at this point in indexing, because they
can't answer the questions posed by the script.

If a spider does accidentally venture deeper into the site, it could
inadvertently become entangled in a "spider trap", a CGI script that
traps the spider and server into an endless loop of query and
counter-query. This sort of trap is not just bad for the spider.
Repeated page requests can also crash the server.


Let's examine Small Blue Planet in particular. Your website's page
URLs are causing the difficulty, rather than the .asp file type. Your
dynamic pages contain query strings. For example:

http://www.smallblueplanet.com/content/container.asp?target=storeresults&storeId=10834

Look carefully at the URL and spot the question mark after
container.asp. Most search engine spiders reach the "?" that begins the
query string and stop cold rather than continue into a possible spider
trap.

How does a spider see your site? You can use a text-based browser,
such as Lynx, to view your pages much as a search engine spider does.
If Lynx can't reach a page, it's likely that a spider can't either.
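
For a quick spot check, your programmers can run Lynx from any Unix or
Linux machine against the example product URL quoted above; the -dump
switch simply renders the page to the terminal the way a text-only
visitor would see it:

  lynx -dump "http://www.smallblueplanet.com/content/container.asp?target=storeresults&storeId=10834"

If the rendered output stops at a search form, or shows none of your
product links, spiders are very likely hitting the same wall.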



SOLUTION OPTIONS
**********************************************************************

Submit Pages Individually
*************************

Submit pages, including their query strings, individually. This works,
but it largely negates the advantages of a dynamic site.


Add Dynamic Links To Static Pages
*********************************

Include static pages in your site with links to dynamic content. The
simplest way to do this is a master table of contents page that links
to your most important dynamic pages. Category contents pages may also
help. These provide spiders with a way to index content without having
to answer questions.

The table of contents doesn't completely solve the problem with
spiders that halt at most query strings, including the Googlebot. You
can increase your chances of indexing individual products or product
categories by including good descriptions and descriptive links on a
static table of contents page. Google will still index the content of
the product listings page - including your link titles. Other search
engines that can follow dynamic links can visit the actual dynamic
page content without making a query.
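
For illustration only, a bare-bones static contents page might look
something like the sketch below. The first storeId value is taken from
the example URL quoted earlier; the second storeId and both category
names are placeholders invented for this sketch, not values from your
actual database:

  <html>
  <head><title>Small Blue Planet Toys - Product Categories</title></head>
  <body>
    <h1>Browse Our Toy Categories</h1>
    <ul>
      <!-- Descriptive link text gives Google something to index even if
           it never follows the dynamic URL itself. -->
      <li><a href="/content/container.asp?target=storeresults&amp;storeId=10834">
          Wooden Toys and Puzzles</a> - handcrafted wooden playthings</li>
      <li><a href="/content/container.asp?target=storeresults&amp;storeId=10835">
          Arts and Crafts Kits</a> - creative project kits for young artists</li>
    </ul>
  </body>
  </html>

Such a page can be maintained by hand, or regenerated periodically by a
small script whenever your categories change.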


Remove Query Strings From Dynamic URLs
**************************************

Amazon.com uses dynamically generated pages, but their method of URL
naming avoids the difficulty of query strings. The direct URL for
Harry Potter and the Order of the Phoenix is:

http://www.amazon.com/exec/obidos/tg/detail/-/043935806X/002-5910991-8455216

This method will work; however, it's the most technically demanding of
your options. The exact steps are contingent on the type of web server
you employ and the software you're using to deploy your database.

For Active Server Pages -- Most search engines will index .asp pages
if the "?" is removed from the URL. This may be accomplished manually,
or the dynamic script can be changed to accept a "/" rather than a
"?". A third-party product, XQASP, will automatically remove the query
strings from your .asp pages and replace them with "/" marks.
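
As a rough illustration of the "change the script" route, the fragment
below is a minimal sketch rather than production code. It assumes your
container.asp currently reads its parameters through Request.QueryString
and that your web server passes extra path segments through to the
script; the parameter names target and storeId come from the example
URL quoted earlier, and everything else is an assumption about how the
real script is organized. With something like this in place, a URL such
as /content/container.asp/storeresults/10834 can serve the same content
as the query-string version:

  <%
  ' Sketch only: accept parameters either from the query string or from
  ' the extra path segments after the script name.
  Dim target, storeId, pathInfo, scriptName, extraPath, parts

  target  = Request.QueryString("target")
  storeId = Request.QueryString("storeId")

  If target = "" Then
      ' Under IIS, PATH_INFO includes the script's own virtual path,
      ' so strip SCRIPT_NAME to get just the trailing segments.
      pathInfo   = Request.ServerVariables("PATH_INFO")
      scriptName = Request.ServerVariables("SCRIPT_NAME")
      extraPath  = Mid(pathInfo, Len(scriptName) + 1)  ' e.g. "/storeresults/10834"

      parts = Split(extraPath, "/")
      If UBound(parts) >= 2 Then  ' parts(0) is the empty string before the first "/"
          target  = parts(1)
          storeId = parts(2)
      End If
  End If

  ' ... the existing page logic continues here, using target and storeId ...
  %>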

One cautionary note: replacing "?" with "/" will cause your dynamic
pages to appear to have their own subdirectories, so browsers will
attempt to look for images or links there. You can avoid broken links
by using all absolute links (high maintenance) or by using URL
addresses that are relative to the root directory of your site, rather
than to the documents themselves (e.g. /homepagename.asp rather than
../homepagename.asp).
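
To make the difference concrete (the image path below is just a
placeholder):

  <!-- Document-relative: from /content/container.asp/storeresults/10834
       this resolves against the phantom subdirectory and breaks. -->
  <img src="../images/logo.gif">

  <!-- Root-relative: resolves from the site root, so it works at any
       apparent directory depth. -->
  <img src="/images/logo.gif">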

Exceptional Digital Enterprise Solutions - XQASP 
http://xde.net/xq/tool.xqasp-deep-web/qx/index.htm


**********************************************************************

ANSWER STRATEGY: Personal knowledge/experience programming dynamic
websites.

I hope this information provides the background and solutions you're
seeking. If anything I've said is unclear, or should you have
questions about the material or link(s) provided, please feel free to
ask.

larre-ga
Comments  
Subject: Re: How to get Google to Spider Dynamically Generated ASP pages
From: spot_tippybuttons-ga on 30 Apr 2003 14:57 PDT
 
Actually, there is a far easier answer than re-working your entire
site. As larre-ga notes, the problem you describe is not unique to
your website; in fact, it's a problem shared by many large,
database-driven websites.

Since it is not an uncommon problem, there are actually a few
commercially available products designed specifically to solve the
problem you describe.

I can personally and highly recommend one such product, known as
LinkDriver. You can read more about LinkDriver at
http://www.linkdriver.com/

LinkDriver automatically does pretty much all of the things larre-ga
suggests. LinkDriver dynamically rewrites URLs to remove query strings
without your needing to change your pages. It also automatically
populates search form elements with logical groupings, and then
performs the search to build sensible "static" pages that point to
your dynamic content. It also does compliance checking on pages to
make the HTML more search-engine friendly, such as including proper
language type and cache information tags. All of the major setup is
done through a wizard-style interface, so you don't need to be a
programmer to configure it.

If you did want to program something yourself, you could consider
writing an ISAPI DLL to remap the URLs, or feeding your site through a
Linux reverse proxy running Apache and using mod_rewrite to do the
mapping. It's a lot more work than using LinkDriver, but it may still
be a lot less work than redoing your entire site.
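
For the reverse-proxy route, the URL mapping itself is only a rule or
two of Apache configuration. The snippet below is a minimal sketch,
assuming mod_rewrite and mod_proxy are enabled on the Apache box and
that the IIS server answers internally at a hostname like iis.internal
(a placeholder); the pattern mirrors the storeresults example URL from
the answer above:

  # Map a spider-friendly path back onto the real query-string URL and
  # proxy the request through to the ASP server.
  RewriteEngine On
  RewriteRule ^/content/container\.asp/([^/]+)/([0-9]+)$ \
      http://iis.internal/content/container.asp?target=$1&storeId=$2 [P,L]

You would need one such rule (or a more general pattern) for each
family of dynamic URLs you want exposed to spiders.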

Hope you find this useful!

-Spot
