Q: How to get Google to include all pages in its index ( No Answer,   6 Comments )
Question  
Subject: How to get Google to include all pages in its index
Category: Computers > Internet
Asked by: nancyb-ga
List Price: $2.00
Posted: 07 Dec 2003 00:07 PST
Expires: 06 Jan 2004 00:07 PST
Question ID: 284338
I just made Google Guide, www.googleguide.com (an interactive online
tutorial on searching with Google), publicly available a
month ago.  How can I get Google to include all, or at least more, of the
pages in Google Guide in its index?  I would like users to be able to
search the contents of my online tutorial using the search box that
appears at the bottom of each page in Google Guide.

Request for Question Clarification by serenata-ga on 07 Dec 2003 22:26 PST
Hi Nancy ~

I just did an allinurl:www.googleguide.com search ...

Did you see ALL the sites that are indexed identically?

I'd be interested in your site stats and whether the Googlebot has
crawled them all, and if not, where it's stopping.

This text is coming up on about 20 of the pages which are indexed.

     "... Creative Commons, Google, WWW GOOGLEGUIDE.COM. Google 
      Guide is not affiliated with nor endorsed by Google (but
      we are big fans!). ..."
 
Maybe encountering this repeatedly is what's stopping the Googlebot? I
really am not certain, which is why I suggested checking your site
stats to see if you've been visited by the Googlebot on all the pages.

Serenata

Clarification of Question by nancyb-ga on 08 Dec 2003 22:51 PST
> I just did an allinurl:www.googleguide.com search ...
> 
> Did you see ALL the sites that are indexed identically?

All the pages have different titles.  (I changed one of the two that
were identical).  The text in the footer of the page includes the URL
and is what appears when you search on allinurl:www.googleguide.com. 
The rest of the pages are quite different from each other.

> I'd be interested in your site stats and whether the Googlebot has
> crawled them all, and if not, where it's stopping.

Googlebot appears only to have crawled Google Guide's home page since December. 
Other pages haven't been crawled since the end of November and I've
updated all of the pages since then.

> Maybe encountering this repeatedly is what's stopping the Googlebot? I
> really am not certain, which is why I suggested checking your site
> stats to see if you've been visited by the Googlebot on all the pages.

How can I tell from the logs what is stopping Googlebot?  Here are
some entries from the site's log:

crawler10.googlebot.com - - [08/Dec/2003:09:43:17 -0800] "GET /robots.txt HTTP/1.0" 404 204 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
crawler10.googlebot.com - - [08/Dec/2003:09:43:18 -0800] "GET / HTTP/1.0" 200 17576 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
crawler14.googlebot.com - - [08/Dec/2003:12:43:31 -0800] "GET /robots.txt HTTP/1.0" 404 204 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
crawler14.googlebot.com - - [08/Dec/2003:12:43:31 -0800] "GET / HTTP/1.0" 304 - "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
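One rough way to answer that question from the logs is to scan for Googlebot's user agent and collect which paths it requested and what status codes it got back. Here is a quick sketch; the regex assumes the Apache-style log format shown above, and the function and sample names are just illustrative:

```python
import re

# Sample entries in the same Apache-style log format as above
LOG_LINES = [
    'crawler10.googlebot.com - - [08/Dec/2003:09:43:17 -0800] "GET /robots.txt HTTP/1.0" 404 204 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    'crawler10.googlebot.com - - [08/Dec/2003:09:43:18 -0800] "GET / HTTP/1.0" 200 17576 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
]

# Capture the requested path and the HTTP status code
PATTERN = re.compile(r'"GET (\S+) HTTP/[\d.]+" (\d{3})')

def googlebot_hits(lines):
    """Return (path, status) pairs for requests whose user agent mentions Googlebot."""
    hits = []
    for line in lines:
        if "Googlebot" not in line:
            continue
        m = PATTERN.search(line)
        if m:
            hits.append((m.group(1), int(m.group(2))))
    return hits

for path, status in googlebot_hits(LOG_LINES):
    print(path, status)
```

Comparing the resulting list of paths against the site's full page list would show which pages Googlebot never requested, and repeated 404s or other error codes would show where it ran into trouble.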


> Also double-check that you don't have a robots.txt in your webroot
> directory causing Googlebot not to scan.

I don't have a robots.txt in the webroot directory.

Nancy

Request for Question Clarification by serenata-ga on 08 Dec 2003 23:45 PST
Ahah!

It's looking for a robots.txt file to figure out where to go, and
coming up with a 404 (page not found) error.

Time to write a simple robots.txt (and perhaps an easy to navigate
text menu) so that the googlebot can get around the rest of your site.
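For reference, a minimal robots.txt that allows every crawler to fetch everything (an empty Disallow line means nothing is off-limits) looks like this:

```text
# Placed at the web root, e.g. http://www.googleguide.com/robots.txt
User-agent: *
Disallow:
```

Serving this file turns the 404 on /robots.txt into a 200 without excluding any pages from the crawl.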

I take it you don't have incoming links to certain parts, and
therefore, it's just not finding them at this time.

Google's Technical Guidelines suggest:

     "Make use of the robots.txt file on your web server.
      This file tells crawlers which directories can or
      cannot be crawled. Make sure it's current for your
      site so that you don't accidentally block the 
      Googlebot crawler. Visit 
      http://www.robotstxt.org/wc/faq.html for a FAQ 
      answering questions regarding robots and how to 
      control them when they visit your site."
   - http://www.google.com/webmasters/guidelines.html

See if that helps ... it should, anyway.

Serenata

Clarification of Question by nancyb-ga on 12 Dec 2003 22:15 PST
> Time to write a simple robots.txt (and perhaps an easy to navigate
> text menu) so that the googlebot can get around the rest of your site.

But I want Googlebot to crawl my whole site.  I'm not interested in
excluding any pages.

> I take it you don't have incoming links to certain parts, and
> therefore, it's just not finding them at this time.

I have incoming links all over my guide.  There are links from each
page to the next and the previous page.  There are links to other
sections of the guide.  There is a link to the table of contents that
includes links to nearly all the other pages in the guide.
Answer  
There is no answer at this time.

Comments  
Subject: Re: How to get Google to include all pages in its index
From: saryon-ga on 08 Dec 2003 17:14 PST
 
Also double-check that you don't have a robots.txt in your webroot
directory causing Googlebot not to scan.

matt
Subject: Re: How to get Google to include all pages in its index
From: robertskelton-ga on 09 Dec 2003 00:06 PST
 
It's a mystery to me. 

You have done nothing wrong and you certainly don't *need* a robots.txt.

Google has indexed your Table of Contents:
http://216.239.33.104/search?sourceid=navclient&ie=UTF-8&oe=UTF-8&q=cache:http%3A%2F%2Fwww.googleguide.com%2Ftoc.html

... and so it had found links to all the other pages.

The only things I can think of are:

- Google was too busy this time around to fully index every site in
its index, so it only indexed part of your site. They mention this
possibility in the guidelines, but you'd expect that would mainly
apply to GeoCities sites, etc.

- you have changed the URL of your contents page to:
http://www.googleguide.com/page_2.html . Depending on how Google
identifies and makes use of "site maps", perhaps the changed link
structure confused the GoogleBot.

Possible solution:

GoogleBot is still visiting, so if you change the home page slightly,
it might be tempted to look at the other pages and see if they have
changed. To get it to visit again soon, try and get a link from a
popular blog. I've tweaked the link from my blog to see if it helps.

Rob.
Subject: Re: How to get Google to include all pages in its index
From: nancyb-ga on 09 Dec 2003 23:21 PST
 
> Possible solution:
> 
> GoogleBot is still visiting, so if you change the home page slightly,
> it might be tempted to look at the other pages and see if they have
> changed. To get it to visit again soon, try and get a link from a
> popular blog. I've tweaked the link from my blog to see if it helps.

I have changed many pages in Google Guide, but Google appears to have
only reindexed the home page.  I'll add some more links to the home
page and see if Google indexes any more pages.

Nancy
Subject: Re: How to get Google to include all pages in its index
From: nancyb-ga on 21 Dec 2003 09:18 PST
 
Google has now indexed all of Google Guide's pages!

Nancy
Subject: Re: How to get Google to include all pages in its index
From: bjfb24-ga on 16 Feb 2005 08:10 PST
 
Hi, I posted my new website <a
href="http://www.googleadvisor.org">GoogleAdvisor.org</a>
approximately two weeks ago. I've noticed that the main webpage
(index.html) has been indexed by Google (and I can see in my logs that
several googlebots have visited this page), yet none of my subpages
have. Each subpage is linked to the main page by at least one link. I
really would have thought that, after two weeks' time, the googlebots
would have already indexed all 10 pages or so of my website. I have
also noticed that when I use the "link:" feature to check to see how
many sites are linked to my site I get nothing. This is odd since I
know of a few sites that are already linked to GoogleAdvisor.org. I
was just wondering if I just need to be patient and give the system
more time, or if I need to change something on my main page to make it
more robot-friendly? Thanks a lot,

Brad B.
Subject: Re: How to get Google to include all pages in its index
From: searchingandcurious-ga on 16 Feb 2005 11:03 PST
 
Six weeks after I made Google Guide publicly available, Google
included all Google Guide pages in its index.  Before that, about
half a dozen pages were included in Google's index.  I suspect that
this was because it took over a month before Google's deep crawler
got around to indexing the site. For a description of how Googlebot
(Google's web crawler) works, see
http://www.googleguide.com/google_works.html#googlebot
