Q: Getting indexed better by Google ( Answered,   5 Comments )
Question  
Subject: Getting indexed better by Google
Category: Computers > Internet
Asked by: andrew_gray-ga
List Price: $5.00
Posted: 17 Mar 2003 05:56 PST
Expires: 16 Apr 2003 06:56 PDT
Question ID: 177287
I work for a publisher who owns a news and information service
dedicated to accountants called "AccountingWEB" - the world's largest,
with about 200,000 members.

It's a database driven site and most of the daily news (about 10
stories a day) is delivered via CGIs with only a synopsis appearing on
the main home page. Google indexes the site but we've never been able
to get any of the news stories into the Google index (and our news
stories are particularly valuable - at least to accountants).

Can you give me guidance as to how to get our CGI delivered news
content indexed by Google without having to prepare a "static" HTML
page for each one?

Thanks.
ANDREW

www.accountingweb.co.uk
www.accountingweb.com
www.accountingweb.nl
Answer  
Subject: Re: Getting indexed better by Google
Answered By: serenata-ga on 23 Mar 2003 19:01 PST
 
Hi Andrew ... 

I am very familiar with Accounting Web's website. It has important
information which is often quoted or mentioned on sites I subscribe
to, such as TaxMama.com and other accounting information sites.

Dynamic pages can indeed be a problem for indexing purposes ...

Google says in its guidelines to try to avoid them, saying in its
Design and Content Guidelines, "If you decide to use dynamic pages ...
be aware that not every search engine spider crawls dynamic pages as
well as static pages. It helps to keep the parameters short and the
number of them small."
 - http://www.google.com/webmasters/guidelines.html

I feel the information you have on your site is important, too, so
it's a shame your news stories are not getting indexed. I did a search
of Danny Sullivan's Search Engine Watch, and he has a suggestion which
you may be able to use ...

This is from Danny Sullivan's Search Engine Placement Tips, updated
October, 2002:

"Generating pages via CGI or database-delivery? Expect that some of
the search engines won't be able to index them. Consider creating
static pages whenever possible, perhaps using the database to update
the pages, not to generate them on the fly."
  - http://www.searchenginewatch.com/webmasters/tips.html

A search of Google for creating static pages with a dynamic content
feed found a discussion on Webmaster World. It discusses keeping
enough static text on each page to be indexed, while the rest of the
content is fed in dynamically. You can see the discussion here:
 - http://www.webmasterworld.com/forum3/6542.htm

The point is, a search engine needs a static page to index, so if you
can make your pages static with enough content to get indexed, you can
feed the dynamic content on the page, instead of making the entire
page dynamic.

It's not tricking the search engine, but it is giving the search
engine something to actually find. I am not familiar enough with your
site's database or how it is dynamically built to help you achieve
this, but I bet your designer can help you figure out how to do it.
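
As a rough illustration of "using the database to update the pages,
not to generate them on the fly", here is a minimal Python sketch. It
is not based on AccountingWEB's actual system: the story records, the
file naming and the page template are all made-up assumptions, and a
real site would read the stories from its own database.

```python
import html
from pathlib import Path

# Hypothetical story records; a real site would load these from its database.
STORIES = [
    {"id": 101, "title": "Budget update", "body": "Full text of the story."},
    {"id": 102, "title": "Tax & audit news", "body": "Another story body."},
]

# Assumed page template; the only placeholders are {title} and {body}.
PAGE = """<html>
<head><title>{title}</title></head>
<body>
<h1>{title}</h1>
<p>{body}</p>
</body>
</html>
"""

def render_story(story):
    """Return a complete static HTML page for one story record."""
    return PAGE.format(
        title=html.escape(story["title"]),
        body=html.escape(story["body"]),
    )

def publish(stories, out_dir):
    """Write one .html file per story and return the file names written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for story in stories:
        name = f"story-{story['id']}.html"
        (out / name).write_text(render_story(story), encoding="utf-8")
        written.append(name)
    return written
```

Run whenever a story is added or changed (for example
publish(STORIES, "news")), this leaves plain .html files that a spider
can fetch like any other static page.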

Search terms used -
 - search engines +dynamically generated pages
 - searching dynamically generated pages
 - search engines + cgi

I hope this helps and that you can get these pages indexed soon - even
if they are dynamically generated.

Yours ever so,
 Serenata
Comments  
Subject: Re: Getting indexed better by Google
From: robertskelton-ga on 17 Mar 2003 14:02 PST
 
Hi there,

Google sometimes indexes such content, but they don't say what
criteria they use to decide whether to index it. I suggest just
emailing them and letting them know:

help@google.com

And while you are at it, suggest your site for inclusion in Google
News:

news-feedback@google.com
Subject: Re: Getting indexed better by Google
From: shobjanta-ga on 18 Mar 2003 14:56 PST
 
Usually, the search engines will not index URLs that use cgi-bin style
parameters, i.e.
http://www.somedomain.com/cgi-bin/script?param1=value1&param2=value2

If your site is implemented in CGI, and you still want your dynamic
pages to get indexed, you can configure your web-server/cgi-bin
subsystem to use the path scheme instead. So the corresponding URL may
look like:
http://www.somedomain.com/cgi-bin/script/value1/value2

In which case Google and other search engines won't "know" this is a
cgi-bin generated page and will happily index it.

Of course, like any other pages, if you want these dynamic pages to be
indexed by Google (and others), they would need to be linked in from
other pages.

The way you set this up depends on the web-server you are using. For
instance, if you are using Apache, you can use the "RewriteRule"
directive to do this.
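
On the script side, a CGI program normally receives the extra path
after the script name in the PATH_INFO environment variable. The
Python sketch below shows one way the two URL forms above could be
mapped onto each other. The parameter names param1 and param2 are
taken from the example URL, and both helper functions are
hypothetical, not part of any particular site or server.

```python
import os
from urllib.parse import quote

# Assumed parameter names, in the order they appear in the path.
PARAM_NAMES = ["param1", "param2"]

def params_from_path(path_info, names=PARAM_NAMES):
    """Map '/value1/value2' onto the parameter names, in order."""
    parts = [part for part in path_info.split("/") if part]
    return dict(zip(names, parts))

def path_style_url(script_url, params, names=PARAM_NAMES):
    """Build the path-style URL to use in links to a dynamic page."""
    segments = [quote(str(params[name]), safe="") for name in names]
    return script_url.rstrip("/") + "/" + "/".join(segments)

def current_params():
    """Read the parameters for the current CGI request from PATH_INFO."""
    return params_from_path(os.environ.get("PATH_INFO", ""))
```

For the example above, params_from_path("/value1/value2") gives
{"param1": "value1", "param2": "value2"}, so the script can look up the
same record it would have found from the "?param1=value1&param2=value2"
form.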

Related Sites:
http://www.sitepoint.com/article/485
http://www.phpbb.com/phpBB/viewtopic.php?t=76843
Subject: Re: Getting indexed better by Google
From: andrew_gray-ga on 19 Mar 2003 02:24 PST
 
Thanks for this guidance.  Am I right in thinking that if we attempt
to solve the indexing problem in this way that we may introduce a
different problem - more firewalls will start caching more pages?  The
news pages include dynamic elements (like user comments) so it's
important that users get a fresh copy each time (not an old version
from a cache along the way).

As I understand it, the "?" in the CGI string tells most caches not to
cache the page.
Thanks.
ANDREW
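
On the caching concern above: a page served from a path-style URL can
still ask caches along the way not to reuse it, by sending explicit
HTTP response headers instead of relying on the "?". The Python sketch
below is only an illustration under that assumption. The function
names are made up, and whether a given proxy or firewall honours the
headers depends on that product.

```python
from email.utils import formatdate

def no_cache_headers():
    """Return response headers asking caches not to reuse the page."""
    return [
        ("Cache-Control", "no-cache, must-revalidate"),
        ("Pragma", "no-cache"),  # understood by older HTTP/1.0 caches
        ("Expires", formatdate(0, usegmt=True)),  # a date in the past
    ]

def cgi_response(body, content_type="text/html"):
    """Build a full CGI response: headers, a blank line, then the body."""
    lines = [f"Content-Type: {content_type}"]
    lines += [f"{name}: {value}" for name, value in no_cache_headers()]
    return "\r\n".join(lines) + "\r\n\r\n" + body
```

A CGI script would write the result of cgi_response(page_html) to
standard output, so the headers go out ahead of the page itself.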
Subject: Re: Getting indexed better by Google
From: shobjanta-ga on 21 Mar 2003 11:08 PST
 
"Am I right in thinking that if we attempt
to solve the indexing problem in this way that we may introduce a
different problem - more firewalls will start caching more pages"

I am not sure I follow what you are saying here. Yes, the search
engines do cache these pages, and they generally re-index pages at
some regular interval. So if someone has added the word "foobar" to
your page, the search engine indexer will take a while to pick this
word up.

Is this what you mean?
If so, there really is no way out of here other than to provide your
site-users with your very own search engine, where you can control how
frequently your content is indexed.

Or are you referring to the fact that your page is wide open to the
world, thereby allowing all search engines to "see" your pages? Your
initial question seems to suggest this is what you wanted, so I don't
understand why this is a problem. Anyway, if you did want to prevent
search engines from indexing your pages, you can place a file called
"robots.txt" at the root of your webserver. For help,
visit: http://www.robotstxt.org/wc/robots.html
You can place directives in this file asking web search indexing
engines (called "spiders" or "robots") not to visit certain sections
of your site.
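
To see how a set of robots.txt rules would be read, Python's standard
urllib.robotparser module can be pointed at them. The sketch below is
only an illustration: the rules and the /private/ path are made-up
assumptions, and the helper name allowed is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep all robots out of /private/, allow the rest.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def allowed(url, user_agent="Googlebot", robots_txt=ROBOTS_TXT):
    """Return True if the rules let this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, a URL under /private/ is refused and anything else
on the site is allowed.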
Subject: Re: Getting indexed better by Google
From: googleexpert-ga on 23 Mar 2003 17:37 PST
 
You might want to try webmasterworld.com

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.