Q: Structuring a shtml file for Google spiders ( Answered, 0 Comments )
Question  
Subject: Structuring a shtml file for Google spiders
Category: Computers > Internet
Asked by: hammerhands-ga
List Price: $15.00
Posted: 09 Sep 2002 12:58 PDT
Expires: 09 Oct 2002 12:58 PDT
Question ID: 63158
PLEASE ONLY CHARGE ME ONCE FOR THIS. I SUBMITTED IT TWICE BECAUSE I
DON'T THINK IT WENT THRU THE FIRST TIME.

I work for a weekly oil and gas newspaper called Petroleum News
Alaska. My problem is this: for example, if you search for Northstar
Gunkel on www.Google.com (two keywords in a particular news article of
ours), you find our comparch.shtml file (#5 in the search results).
This comparch.shtml file was set up for the search engines to find all
our 2600+ news articles. My problem is that when I do a search for a
story (like the Northstar Gunkel search above), Google points to that
comparch.shtml file INSTEAD of to the html link associated with each
story inside that comparch.shtml page. How do I make it so Google
points to the links inside comparch.shtml INSTEAD of just pointing to
the generic all-containing comparch.shtml file? I thought
that maybe I'd take the comparch.shtml file and delete all the story
description text so that it'd ONLY contain a list of the 2600+ html
links. (Is that wise?) BUT I'm also thinking that there's some robot
meta text I can add to the comparch.shtml file to tell the Google
robots/spiders to be sure to link to the html files inside and not to
the comparch.shtml file.
Please advise on what you think I should do.
Thank you,
Dan Wilcox
Answer  
Subject: Re: Structuring a shtml file for Google spiders
Answered By: robertskelton-ga on 09 Sep 2002 14:38 PDT
 
Hi Dan,

There is a simple remedy using a META tag.

<meta name="robots" content="noindex,follow">

When the Googlebot sees this, it will follow the links in the page,
but will not index the content.
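
To make the placement concrete, here is a minimal sketch of how the
tag would sit in the head of comparch.shtml (the title and comments
are placeholders, not your actual markup):

<html>
<head>
<title>Petroleum News Alaska - Article Archive</title>
<!-- noindex: keep this archive page itself out of search results -->
<!-- follow: still crawl every article link it contains -->
<meta name="robots" content="noindex,follow">
</head>
<body>
<!-- ... the 2600+ article links ... -->
</body>
</html>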

This will solve your problem, but may have an effect on your ranking
in Google search results. Fortunately, because your site is updated so
regularly, Googlebot visits often, so it won't take too long to see
what the effect will be. My guess is that your ranking will improve.

The page comparch.shtml doesn't appear to be made by hand, so I'd
leave the descriptions there; they can't do any harm once the META tag
is in place.

More information on robots exclusion, including the robots META tag,
can be found at:
http://www.robotstxt.org/wc/robots.html
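
For contrast only: blocking the page in robots.txt would not suit
your case, because a Disallow rule stops Googlebot from fetching the
page at all, so the article links inside it would never be followed.
A sketch of that approach (not recommended here):

# robots.txt - shown only to illustrate the difference from the
# META tag above; this would block crawling of the page entirely
User-agent: *
Disallow: /comparch.shtml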


Search strategy:
Personal experience

I trust this answers your question. If any portion of my answer is
unclear, please ask for clarification.

Best wishes,
robertskelton-ga

Request for Answer Clarification by hammerhands-ga on 09 Sep 2002 15:12 PDT
Robert,

Thank you for your META tag instruction. I just want to make sure you
still agree we should leave the descriptions there in the
comparch.shtml page, because currently the file is 1.4 megs! If I
deleted the descriptions it'd be 149k. Do you see a problem with it
being such a large file? Do the spiders refuse to look at the entire
page since it's so huge? Would they spider the whole thing if it were
chopped down to 149k?

Thanks,
Dan

Clarification of Answer by robertskelton-ga on 09 Sep 2002 16:14 PDT
Yikes! No wonder it was too large for Notepad! 

A good thing / bad thing about Google is that it only indexes the
first 101K of a page, so in search results any page over 101K is
simply shown as 101K - it never crossed my mind that your page could
be so huge. My apologies.

Google's cache for the page stops at 101K, and so any article links
older than July 14, 2002 have not been indexed.
http://216.239.33.100/search?q=cache:qTkJAK3s6KgC:www.petroleumnewsalaska.com/comparch.shtml+northstar+gunkel&hl=en&ie=UTF-8

I tried searching for some June articles and Google couldn't find
them. I used to think that Googlebot would follow all links regardless
of page size, but your site is evidence that this is not the case.

Keywords found in and around links are important to Google, but more
important for you would be to get as many links followed as possible.

Even trimmed to 149K, the file would still be over the 101K limit -
the only remedy would be to spread the links across multiple pages,
perhaps one covering the last two months and others serving as
archives.

My revised suggestion is that you get rid of the descriptions.

Request for Answer Clarification by hammerhands-ga on 09 Sep 2002 17:35 PDT
Do you suggest I get rid of the descriptions and break the file into 2
files? (One file would be 100k and the other approx 49k.)
Dan

Clarification of Answer by robertskelton-ga on 09 Sep 2002 18:24 PDT
One file of 100K and the other of 49K would work fine, although 90K
and 59K would give you a bit of a buffer, avoiding accidentally going
over 101K. Make sure that the two pages link to each other.
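
As a sketch of the cross-linking (the file and article names here are
hypothetical, not your real ones):

<!-- comparch1.shtml: recent links, kept under ~90K -->
<meta name="robots" content="noindex,follow">
<a href="comparch2.shtml">Older article archive</a>
<a href="news/article2600.html">Most recent headline</a>
<!-- ... more recent article links ... -->

<!-- comparch2.shtml: older links, kept under ~59K -->
<meta name="robots" content="noindex,follow">
<a href="comparch1.shtml">Recent article archive</a>
<a href="news/article0001.html">Oldest headline</a>
<!-- ... more older article links ... -->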

Removing the descriptions is the only option, apart from splitting the
1.4MB file into 20 files of 70K each. However, in my experience that
deviates too far from the "site map" type of page that Google seems to
like - that is, a single page that links to every other page. In your
case the links need to cover at least two pages, but that cannot be
helped.
Comments  
There are no comments at this time.
