Q: Getting Google to spider all the pages on my site ( No Answer, 0 Comments )
Question  
Subject: Getting Google to spider all the pages on my site
Category: Computers > Algorithms
Asked by: ortsed-ga
List Price: $50.00
Posted: 19 Jul 2004 14:22 PDT
Expires: 10 Aug 2004 11:26 PDT
Question ID: 376331
Why won't Google spider the articles on this site:

http://forward.com

I thought at first it was that Google doesn't like query strings, so I
made static versions of the articles here:

http://forward.com/site-map.html

...but still no luck.

Request for Question Clarification by larre-ga on 19 Jul 2004 18:50 PDT
Hello, 

You know the "Researchers are independent contractors, not Google
employees or insiders" spiel, right? If not, please take a look at the
Terms of Service linked at the bottom of the page.


Can you tell me a little about spider visits -- 

How often do the Google spiders visit? 
Do you use a robots.txt file?

Thanks, 

---larre

Clarification of Question by ortsed-ga on 20 Jul 2004 07:14 PDT
I do realize that Google Answers Researchers are independent, but I'm
hoping an independent source will be able to figure this out, since
Google has been of little to no help in this regard.

No robots.txt file is being used.

Spiders seem to come quite regularly, possibly every few days; as you
can see, Google has already indexed the front page for this week's
newest issue.
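Since no robots.txt file is in use, nothing in one should be blocking the crawler. If one is added later, a quick way to check that it does not shut out the article pages is Python's standard `urllib.robotparser`. A minimal sketch, assuming the rules are supplied as a list of lines; the article URL `/main/article.php?id=1234` and the `Disallow: /main/` rule are hypothetical examples, not taken from forward.com.

```python
from urllib.robotparser import RobotFileParser

def googlebot_allowed(robots_lines, url):
    """Return True if the given robots.txt lines let Googlebot fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly; an empty list behaves
    # like a site with no robots.txt rules at all.
    parser.parse(robots_lines)
    return parser.can_fetch("Googlebot", url)

section_url = "http://www.forward.com/main/section.php?section=arts"
# Hypothetical article URL, for illustration only.
article_url = "http://www.forward.com/main/article.php?id=1234"

# No robots.txt rules: everything is crawlable.
print(googlebot_allowed([], article_url))        # True

# A hypothetical rule that would block every dynamic page under /main/.
blocking = ["User-agent: *", "Disallow: /main/"]
print(googlebot_allowed(blocking, section_url))  # False
```

Because Google is already indexing `section.php` pages, a rule like the one above is clearly not in effect; the check is mainly useful as a safeguard if a robots.txt is introduced later.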
Answer  
There is no answer at this time.

The following answer was rejected by the asker (they received a refund for the question).
Subject: Re: Getting Google to spider all the pages on my site
Answered By: palitoy-ga on 22 Jul 2004 02:48 PDT
Rated: 1 out of 5 stars
 
Hello ortsed-ga

Thank you for your question.  I have taken some time to study your
site and these are the suggestions that I have to make.  If you have
any questions or queries regarding this answer please ask for
clarification and I will do my best to help.

Upon loading the site map page I am presented with a huge list of
URLs.  On Google's Webmaster recommendation page (see below for a
link) it states that "if the site map is larger than 100 or so links,
you may want to break the site map into separate pages" and "keep the
links on a given page to a reasonable number (fewer than 100)".  Your
site map at present exceeds this limit (I think I counted 1546 links
on your site map!) and could possibly be ignored by Google.
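Counting the links on a page by hand is tedious; a minimal sketch using Python's standard `html.parser` does it automatically. The `GUIDELINE_LIMIT` constant reflects the "fewer than 100" figure quoted above, and the 1546-link sample page is generated purely for illustration; in practice you would feed in the saved HTML of site-map.html.

```python
from html.parser import HTMLParser

GUIDELINE_LIMIT = 100  # "fewer than 100" links per page, per the guideline quoted above

class LinkCounter(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <A HREF=...> is caught too.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def count_links(html):
    counter = LinkCounter()
    counter.feed(html)
    counter.close()
    return len(counter.links)

# Illustrative page with the same number of links as the current site map.
sample = "".join(f'<a href="article{i}.html">Article {i}</a><br>' for i in range(1546))
n = count_links(sample)
print(n, n < GUIDELINE_LIMIT)  # 1546 False
```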

I would suggest that the site map be rewritten so that it is smaller
and has more structure to it (that is, break the links down into groups
of similar content).  It would also be favourable to add some text to
the page, so that it has real content rather than just a list of links.

By this I mean: give the page a title, tell users what the page is,
and include a short summary of each article alongside its link.  Do
not be afraid to have a separate site map for each section of your
site.
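The restructuring suggested above (links grouped by section, fewer than 100 links per page, a title and a short summary for each article) can be automated. The following Python sketch assumes the article data is available as `(url, title, summary)` tuples grouped by section; the function name, the `site-map-<section>-<n>.html` file naming and the sample data are all hypothetical.

```python
from html import escape

MAX_LINKS = 99  # stay under the "fewer than 100" links guideline

def chunk(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def build_site_maps(articles_by_section):
    """Return {filename: html}, with one or more site-map pages per section.

    articles_by_section maps a section name to a list of
    (url, title, summary) tuples. Every page is a complete HTML
    document with its own <title> and a heading.
    """
    pages = {}
    for section, articles in articles_by_section.items():
        for n, group in enumerate(chunk(articles, MAX_LINKS), start=1):
            heading = f"{section.title()} articles, page {n}"
            items = "\n".join(
                f'<li><a href="{escape(url)}">{escape(title)}</a>: {escape(summary)}</li>'
                for url, title, summary in group
            )
            pages[f"site-map-{section}-{n}.html"] = (
                "<html>\n<head><title>" + escape(heading) + "</title></head>\n"
                "<body>\n<h1>" + escape(heading) + "</h1>\n"
                "<ul>\n" + items + "\n</ul>\n</body>\n</html>\n"
            )
    return pages

# Hypothetical data: 150 arts articles become two pages of 99 and 51 links.
arts = [(f"/articles/arts-{i}.html", f"Arts story {i}", "One-line summary.") for i in range(150)]
maps = build_site_maps({"arts": arts})
print(sorted(maps))  # ['site-map-arts-1.html', 'site-map-arts-2.html']
```

A small index page linking to each of these per-section pages would then keep every page well under the limit.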

Again the Google Webmaster Guidelines suggest "write pages that
clearly and accurately describe your content".

I have also noticed that the site map page is not a fully formed web
page - it does not include all the tags that you would normally
expect.  For instance <HTML>, <HEAD>, <TITLE>, <BODY> are all missing
and this should DEFINITELY be checked and improved upon.  Again the
guidelines suggest "make sure that your TITLE and ALT tags are
descriptive and accurate".
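A quick way to confirm that a page carries the basic structural tags, and that its images have ALT text, is another small `html.parser` sketch. The set of required tags mirrors the four named above; treating a missing or empty ALT attribute as a problem is an assumption based on the guideline quoted.

```python
from html.parser import HTMLParser

# The structural tags named above as missing from the site map.
REQUIRED_TAGS = {"html", "head", "title", "body"}

class StructureChecker(HTMLParser):
    """Records which tags appear and counts <img> tags lacking a non-empty ALT."""

    def __init__(self):
        super().__init__()
        self.seen = set()
        self.images_without_alt = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        self.seen.add(tag)
        if tag == "img" and not dict(attrs).get("alt"):
            self.images_without_alt += 1

def check_page(html):
    """Return (sorted missing required tags, number of images without ALT text)."""
    checker = StructureChecker()
    checker.feed(html)
    checker.close()
    return sorted(REQUIRED_TAGS - checker.seen), checker.images_without_alt

# A bare list of links, like the current site map.
print(check_page('<a href="a.html">A</a><img src="logo.gif">'))
# (['body', 'head', 'html', 'title'], 1)
```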

It could also be argued that your site map is breaking one of Google's
basic quality principles - "Make pages for users, not for search
engines".

Additionally, the link to the site map on the home page appears to be
hidden.  The markup for it is <a href="site-map.html"
style="visibility:hidden">_</a>

Another of Google's quality principles is "Avoid hidden text or hidden
links" so your site may be falling foul of this rule and Google may be
"responding negatively" to it.

"These quality guidelines cover the most common forms of deceptive or
manipulative behavior, but Google may respond negatively to other
misleading practices not listed here, (e.g. tricking users by
registering misspellings of well-known web sites). It's not safe to
assume that just because a specific deceptive technique isn't included
on this page, Google approves of it. Webmasters who spend their
energies upholding the spirit of the basic principles listed above
will provide a much better user experience and subsequently enjoy
better ranking than those who spend their time looking for loopholes
they can exploit."
http://www.google.com/webmasters/guidelines.html
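Hidden links like the site-map link on the home page can be found mechanically. A minimal Python sketch that flags `<a>` tags whose inline `style` attribute contains `visibility:hidden` or `display:none`; it is an assumption that these two rules are the ones worth checking, and links hidden through an external stylesheet or a CSS class will not be caught.

```python
from html.parser import HTMLParser

# Inline-style rules treated as "hidden" (an assumption for this sketch).
HIDING_RULES = ("visibility:hidden", "display:none")

class HiddenLinkFinder(HTMLParser):
    """Records the href of links whose inline style hides them from readers."""

    def __init__(self):
        super().__init__()
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # Normalise "Visibility: Hidden" and similar to "visibility:hidden".
        style = (attributes.get("style") or "").replace(" ", "").lower()
        if any(rule in style for rule in HIDING_RULES):
            self.hidden.append(attributes.get("href"))

def find_hidden_links(html):
    finder = HiddenLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden

home = ('<a href="site-map.html" style="visibility:hidden">_</a> '
        '<a href="/main/section.php?section=arts">Arts</a>')
print(find_hidden_links(home))  # ['site-map.html']
```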

All of the Google webmaster help pages can be found here:
http://www.google.com/webmasters/index.html

In your question you also mentioned your suspicions about your query
strings.  I would highly recommend that you read through this answer by
fellow researcher larre-ga:
http://answers.google.com/answers/threadview?id=196839

There is also a wealth of information available on this by searching
through Google:
http://www.google.com/search?q=spider+%22query+strings%22

If you are running on an Apache web server, it is also a simple task to
change all of your links to static links (even on your home page).
The technique you need to research here is called mod_rewrite, and it
allows you to rewrite your URLs so that they all appear to be static.
This is the type of technique used by sites such as Amazon.
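mod_rewrite itself is configured in Apache with `RewriteRule` directives, each pairing a regular expression with a substitution; with `RewriteEngine On`, a rule along the lines of `RewriteRule ^/?article/([0-9]+)\.html$ /main/article.php?id=$1 [L]` would let a static-looking address be served by the existing script. The Python sketch below only models that same mapping so it can be tested; the `/article/<id>.html` scheme, the `article.php` script name and the `id` parameter are hypothetical, since the site's actual article URLs are not shown here.

```python
import re

# Hypothetical scheme: /article/1234.html is served by /main/article.php?id=1234.
STATIC_ARTICLE = re.compile(r"^/?article/([0-9]+)\.html$")

def rewrite(path):
    """Map a static-looking path to the dynamic script, as the RewriteRule would.

    Paths that do not match the pattern pass through unchanged."""
    match = STATIC_ARTICLE.match(path)
    if match:
        return f"/main/article.php?id={match.group(1)}"
    return path

def static_url(article_id):
    """Build the static-looking link to publish on the home page and site map."""
    return f"/article/{article_id}.html"

print(rewrite(static_url(1234)))  # /main/article.php?id=1234
```

The spider only ever sees the static form; the server translates it back to the query-string form internally.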

In conclusion, if you redesign the page so that it has fewer links, is
a fully formed HTML page with more reader-friendly content, and abides
by Google's design guidelines, I believe it will be more successful.
Once again, if you have any questions on the issues I have flagged up
here please ask for clarification and I will do my best to help.

Request for Answer Clarification by ortsed-ga on 29 Jul 2004 22:03 PDT
Thank you for your response, but I don't think it has sufficiently
answered the question.

Your response has a lot to do with the site map, which does not
conform to Google's standards.  This is true, and although I'm fixing
that, I don't think it is the main problem.

The articles are being linked from other locations besides the site
map.  This problem also existed before the site map existed.

Please consider a more refined answer as the site map is not the
source of the problem.

Clarification of Answer by palitoy-ga on 30 Jul 2004 01:30 PDT
Before I do any more work on this, can you please refine your
exact question and requirements?

Your original question indicated that this link -
http://forward.com/site-map.html - was causing the problem.  I believe
this page is the problem, as it breaks so many of Google's spidering
rules.  I have shown you why Google would not be following the links
on that page, and I have also suggested that you change the link
structure of your site using mod_rewrite.

You now also have no link to the site map on the home page rather than
a hidden one as before...

Clarification of Answer by palitoy-ga on 30 Jul 2004 01:32 PDT
Can you please also explain why you think the site-map.html page is no
longer the problem as the live version of the page still breaks all
the rules I mentioned?

Request for Answer Clarification by ortsed-ga on 30 Jul 2004 09:40 PDT
As I said before, the site map and the fixed pages were something I
put in as an attempt to fix this problem.  The problem existed before
the site map was even there.

The problem has something more to do with Google spidering dynamic
pages - but Google does spider all the other dynamic pages on the site
such as:

http://www.forward.com/main/section.php?section=arts

Just not the articles.  I have heard about the mod_rewrite technique
before, but am not sure I will have the type of server access
required to set that up.  I would prefer to find out what the problem
is here. (Possibly having to do with numbers in a query string?)
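One way to test the numbers-in-the-query-string theory is to compare, parameter by parameter, a dynamic URL that Google does index with an article URL that it does not. A minimal sketch using Python's standard `urllib.parse`; the article URL is hypothetical (substitute a real one from the site), and the code only describes the differences: whether any particular parameter name or numeric value affects crawling is what the comparison is meant to investigate, not something it establishes.

```python
from urllib.parse import urlsplit, parse_qsl

def describe_query(url):
    """Summarise a URL's query string: parameter names and which values are all digits."""
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return {
        "count": len(params),
        "names": [name for name, _ in params],
        "numeric": [name for name, value in params if value.isdigit()],
    }

crawled = "http://www.forward.com/main/section.php?section=arts"  # indexed by Google
article = "http://www.forward.com/main/article.php?id=1234"      # hypothetical article URL

print(describe_query(crawled))  # {'count': 1, 'names': ['section'], 'numeric': []}
print(describe_query(article))  # {'count': 1, 'names': ['id'], 'numeric': ['id']}
```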

Clarification of Answer by palitoy-ga on 30 Jul 2004 10:01 PDT
How often are the article links on your homepage updated?  How often
does Google spider your site?  It may be more appropriate to submit
your site to the Google News section of Google, as Google rarely
includes links to the articles on other newspapers' sites (and my
experience has shown in the past that too many changes on a page are
not good in terms of getting the links spidered).

I would also still recommend having a static page (that is properly
designed) with links to the articles you wish to be spidered.  If the
spider finds the link a number of times it is more likely to keep it
in its database than if it finds it once (say on your homepage) and
then never indexes it again.
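To see which articles are reachable from only one page (and so most at risk under the reasoning above), the site's internal links can be tallied per target URL. A minimal Python sketch; the link graph is a hypothetical stand-in for data you would gather from your own pages, and the premise that more inbound links improves retention is the researcher's experience, not something the code demonstrates.

```python
from collections import Counter

def inbound_link_counts(site_links):
    """Count how many distinct pages link to each URL.

    site_links maps a page URL to the list of URLs it links to."""
    counts = Counter()
    for page, links in site_links.items():
        for target in set(links):  # a page linking twice still counts once
            if target != page:     # ignore self-links
                counts[target] += 1
    return counts

# Hypothetical internal link graph.
site = {
    "/": ["/article/1.html", "/article/2.html", "/site-map.html"],
    "/site-map.html": ["/article/1.html", "/article/2.html", "/article/3.html"],
    "/main/section.php?section=arts": ["/article/1.html"],
}
counts = inbound_link_counts(site)
print(counts["/article/1.html"], counts["/article/3.html"])  # 3 1
```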

I would also recommend that you read this excellent answer on the
Google Answers site about a problem similar to yours:
http://answers.google.com/answers/threadview?id=196839

What server is your site running on?  I would always recommend having
a unique URL for each article so that the problem of using query
strings is negated.

Clarification of Answer by palitoy-ga on 30 Jul 2004 10:16 PDT
I forgot to mention how to submit your site to Google News - send an
email to source-suggestions@google.com with details about your site. 
More on this can be found here:
http://news.google.com/intl/en_us/about_google_news.html
Reason this answer was rejected by ortsed-ga:
The question was not answered in any way.  The respondent focused on
something that I know was not the answer and gave no other hints as
to what the answer was.

He repeated the same requests for clarification as if he hadn't even
read my previous posts, which had already answered them.

They did little research on the site I pointed them to, but seemed to
have tried to answer the question as quickly as possible with little
analysis.

It really seemed like they hadn't read the text of anything I wrote.
They thought my whole question was about a site map, when it was
actually about the articles.
ortsed-ga rated this answer: 1 out of 5 stars
Not helpful whatsoever.  Kept asking me for clarification on things I
already answered.  Completely ignored the main thrust of my question. 
Answered nothing for me.

I would actually like my money back for this.

Comments  
There are no comments at this time.
