I made Google Guide, www.googleguide.com, (an interactive online
tutorial on searching with Google) publicly available a
month ago. How can I get Google to include all or at least more of the
pages in Google Guide in its index? I would like users to be able to
search the contents of my online tutorial using the search box that
appears at the bottom of each page in Google Guide. |
Request for Question Clarification by serenata-ga on 07 Dec 2003 22:26 PST
Hi Nancy ~
I just did an allinurl:www.googleguide.com search ...
Did you see ALL the sites that are indexed identically?
I'd be interested in your site stats and whether the Googlebot has
crawled them all, and if not, where it's stopping.
This text is coming up on about 20 of the pages which are indexed.
"... Creative Commons, Google, WWW GOOGLEGUIDE.COM. Google
Guide is not affiliated with nor endorsed by Google (but
we are big fans!)."
Maybe encountering this repeatedly is what's stopping the Googlebot? I
really am not certain, which is why I suggested checking your site
stats to see if you've been visited by the Googlebot on all the pages.
Serenata
|
Clarification of Question by nancyb-ga on 08 Dec 2003 22:51 PST
> I just did an allinurl:www.googleguide.com search ...
>
> Did you see ALL the sites that are indexed identically?
All the pages have different titles. (I changed one of the two that
were identical). The text in the footer of the page includes the URL
and is what appears when you search on allinurl:www.googleguide.com.
The rest of the pages are quite different from each other.
> I'd be interested in your site stats and whether the Googlebot has
> crawled them all, and if not, where it's stopping.
Googlebot appears only to have crawled Google Guide's home page since December.
Other pages haven't been crawled since the end of November and I've
updated all of the pages since then.
> Maybe encountering this repeatedly is what's stopping the Googlebot? I
> really am not certain, which is why I suggested checking your site
> stats to see if you've been visited by the Googlebot on all the pages.
How can I tell from the logs what is stopping Googlebot? Here's some
entries from the site's log:
crawler10.googlebot.com - - [08/Dec/2003:09:43:17 -0800] "GET /robots.txt HTTP/1.0" 404 204 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
crawler10.googlebot.com - - [08/Dec/2003:09:43:18 -0800] "GET / HTTP/1.0" 200 17576 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
crawler14.googlebot.com - - [08/Dec/2003:12:43:31 -0800] "GET /robots.txt HTTP/1.0" 404 204 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
crawler14.googlebot.com - - [08/Dec/2003:12:43:31 -0800] "GET / HTTP/1.0" 304 - "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
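One way to read entries like these is to group Googlebot's requests by path and status code, which shows at a glance which pages it has asked for and what the server answered. The following is only a rough sketch: it assumes the log is in the common "combined" format seen in the entries above, and the function name `googlebot_hits` is made up for illustration.

```python
import re
from collections import defaultdict

# Combined log format (assumed):
# host ident user [time] "METHOD path protocol" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines):
    """Map each path requested by Googlebot to the status codes it received."""
    hits = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and "Googlebot" in match.group("agent"):
            hits[match.group("path")].append(int(match.group("status")))
    return dict(hits)

# The four entries quoted from the site's log, one per line.
sample = [
    'crawler10.googlebot.com - - [08/Dec/2003:09:43:17 -0800] "GET /robots.txt HTTP/1.0" 404 204 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    'crawler10.googlebot.com - - [08/Dec/2003:09:43:18 -0800] "GET / HTTP/1.0" 200 17576 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    'crawler14.googlebot.com - - [08/Dec/2003:12:43:31 -0800] "GET /robots.txt HTTP/1.0" 404 204 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    'crawler14.googlebot.com - - [08/Dec/2003:12:43:31 -0800] "GET / HTTP/1.0" 304 - "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
]

for path, statuses in sorted(googlebot_hits(sample).items()):
    print(path, statuses)
# / [200, 304]
# /robots.txt [404, 404]
```

Run over the full log, any page that never shows up in the result has not been requested by Googlebot in the period the log covers.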
> Also double check that you don't have a robots.txt in your webroot
> directory causing googlebot not to scan.
I don't have a robots.txt in the webroot directory.
Nancy
|
Request for Question Clarification by serenata-ga on 08 Dec 2003 23:45 PST
Ahah!
It's looking for a robots.txt file to figure out where to go, and
coming up with a 404 (page not found) error.
Time to write a simple robots.txt (and perhaps an easy to navigate
text menu) so that the googlebot can get around the rest of your site.
I take it you don't have incoming links to certain parts, and
therefore, it's just not finding them at this time.
Google's Technical Guidelines suggest:
"Make use of the robots.txt file on your web server.
This file tells crawlers which directories can or
cannot be crawled. Make sure it's current for your
site so that you don't accidentally block the
Googlebot crawler. Visit
http://www.robotstxt.org/wc/faq.html for a FAQ
answering questions regarding robots and how to
control them when they visit your site."
- ://www.google.com/webmasters/guidelines.html
See if that helps ... it should, anyway.
Serenata
|
Clarification of Question by nancyb-ga on 12 Dec 2003 22:15 PST
> Time to write a simple robots.txt (and perhaps an easy to navigate
> text menu) so that the googlebot can get around the rest of your site.
But I want Googlebot to crawl my whole site. I'm not interested in
excluding any pages.
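For what it's worth, a robots.txt does not have to exclude anything. Under the robots exclusion convention an empty `Disallow:` line means "nothing is off limits", and a missing file is generally read the same way. The sketch below checks such an allow-all file with Python's standard `urllib.robotparser`; the page paths are hypothetical placeholders, not actual Google Guide URLs.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that excludes nothing: every crawler, empty Disallow.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())

# Hypothetical paths, used only to exercise the rules.
paths = ["/", "/contents.html", "/any/other/page.html"]

for path in paths:
    url = "http://www.googleguide.com" + path
    print(path, parser.can_fetch("Googlebot", url))
```

Each path should be reported as fetchable, since the file contains no exclusions.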
> I take it you don't have incoming links to certain parts, and
> therefore, it's just not finding them at this time.
I have incoming links all over my guide. There are links from each
page to the next and the previous page. There are links to other
sections of the guide. There is a link to the table of contents that
includes links to nearly all the other pages in the guide.
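A link structure like this can be checked roughly the way a crawler would see it: start at the home page, follow plain `<a href>` links, and list any page that is never reached. This is only an illustrative sketch. The `pages` dictionary holds made-up stand-in HTML rather than the real Google Guide pages, and the names `LinkExtractor` and `reachable` are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collect href values from plain <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def reachable(pages, start):
    """Breadth-first walk over same-site links, beginning at `start`."""
    site = urlparse(start).netloc
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        extractor = LinkExtractor()
        extractor.feed(pages.get(url, ""))
        for href in extractor.links:
            target = urljoin(url, href)
            if urlparse(target).netloc == site and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Hypothetical stand-in pages: URL -> HTML body.
BASE = "http://www.googleguide.com/"
pages = {
    BASE: '<a href="contents.html">Contents</a>',
    BASE + "contents.html": '<a href="page1.html">1</a> <a href="page2.html">2</a>',
    BASE + "page1.html": '<a href="page2.html">Next</a>',
    BASE + "page2.html": '<a href="page1.html">Previous</a>',
    BASE + "orphan.html": "Nothing links to this page.",
}

missing = sorted(set(pages) - reachable(pages, BASE))
print(missing)
# ['http://www.googleguide.com/orphan.html']
```

Pages reported as missing are ones no chain of ordinary links leads to from the home page. Links produced only by scripts or forms would not be picked up by this sketch either.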
|