Q: Mysterious Google non-indexing ( Answered 4 out of 5 stars,   1 Comment )
Question  
Subject: Mysterious Google non-indexing
Category: Computers > Internet
Asked by: cec2003-ga
List Price: $20.00
Posted: 08 Apr 2003 00:04 PDT
Expires: 08 May 2003 00:04 PDT
Question ID: 187551
We have a problem with google indexing (not!) the 2003 Congress on 
Evolutionary Computation web pages (www.cs.adfa.edu.au/cec_2003). 
That is, they don't appear in google at all. This is despite the pages 
having been in existence since May 2002, and being fairly heavily linked 
(www.alltheweb.com returns these pages as the second highest-ranked 
result for a search on 'cec 2003', and finds 68 external links, including 
from two top-level domains (www.wcci2002.org and www.cec2003.org)). 
Can anyone explain
.why we aren't getting indexed
.most important: what we can do to fix it

Possibly relevant information:
Our whole school web site (www.cs.adfa.edu.au) is not getting 
indexed very nicely: it is actually indexed under 
csadfa.cs.adfa.edu.au instead. Some years ago, this was the 
primary address of our site, but this was fixed quite a while ago,
though until recently a query to csadfa.cs.adfa.edu.au would 
return that as the address (now it will be returned as a redirect 
to www.cs.adfa.edu.au). 

Anyway, the current status is
if you query google with www.cs.adfa.edu.au, it returns 
'no information available';
if you query csadfa.cs.adfa.edu.au, it finds the page, but then 
a query to link:csadfa.cs.adfa.edu.au returns zero links 
(alltheweb finds around 800 external links to 
www.cs.adfa.edu.au, so it _ought_ to have a reasonable PR).

We _know_ the pages are being explored, because we can see
 the googlebot probes. We _think_ that the problems may have 
arisen because
.csadfa.cs.adfa.edu.au being the original site, www.cs.adfa.edu.au, 
      having identical information, was treated as a duplicate site
.maybe the redirect from csadfa.cs.adfa.edu.au to www.cs.adfa.edu.au is too
      recent to have influenced google's indexing yet
.perhaps we will have to specifically request removal of csadfa.cs.adfa.edu.au
      from google to get things to eventually sort themselves out?
.we're guessing that the zero links to csadfa.cs.adfa.edu.au are because all
      the 800 external links reference www.cs.adfa.edu.au; since this isn't
      even indexed, the link counts are lost

What constitutes a reasonable answer:
.a clear explanation of why www.cs.adfa.edu.au/cec_2003 isn't   
      getting indexed, together with either
.an explanation of how to fix it
.an explanation of why it can't be fixed

.I'd certainly like to know whether we need to request removal 
      of all of csadfa.cs.adfa.edu.au from google to fix up the 
      general problem, but I don't believe this is the only cause 
      of www.cs.adfa.edu.au/cec_2003's non-indexing (because 
      csadfa.cs.adfa.edu.au/cec_2003 isn't being indexed either), 
      so I don't believe that this would count as an answer to the 
      question

Request for Question Clarification by aceresearcher-ga on 08 Apr 2003 12:22 PDT
Greetings, cec2003!

You don't say, so I need to ask:
Have you ever submitted your site to be indexed by the Googlebot?

Regards,

aceresearcher

Clarification of Question by cec2003-ga on 08 Apr 2003 16:39 PDT
My apologies - it's hard to think of everything to add. Yes, I have
submitted all of
www.cs.adfa.edu.au
www.cs.adfa.edu.au/cec_2003
www.cec2003.org
at least once since January (I did so once for all, I think the
webmaster did as well, and it's possible that a couple of other people
who have been involved in trying to work it out may also have done so).
I can't remember the exact dates, but I think I submitted the first two
in late January, and the last when it was created, which I think was
late February. But I'm reasonably confident that invisibility of the
site isn't our problem - googlebot has queried us a reasonable number
of times since January - I think the count is over 100, could check if
it's important. Unfortunately our logging doesn't record the particular
pages queried. So I guess it's possible all of the queries could have
been to csadfa.cs.adfa.edu.au/... Even so, those are now returning a
redirect to www.cs.adfa.edu.au. The redirect only happened late
February, so I guess it's possible that it hasn't worked its way
through the indexing cycle yet.
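
Since the current logging does not record which pages Googlebot asked for, one way to settle the question is to log full request lines and tally them by path and status. The following is a minimal sketch, assuming the server can be switched to Apache's "combined" log format. The function name googlebot_hits and the sample log lines are made up for illustration and are not taken from the real server logs.

```python
import re
from collections import Counter

# Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count (path, status) pairs for requests whose User-Agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and "googlebot" in m.group("agent").lower():
            hits[(m.group("path"), int(m.group("status")))] += 1
    return hits


# Made-up sample lines, for illustration only (hypothetical addresses and sizes).
SAMPLE = [
    '192.0.2.10 - - [01/Mar/2003:10:00:00 +1100] "GET /cec_2003 HTTP/1.0" '
    '301 240 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '192.0.2.10 - - [01/Mar/2003:10:00:01 +1100] "GET /cec_2003/ HTTP/1.0" '
    '200 5120 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '198.51.100.7 - - [01/Mar/2003:10:05:00 +1100] "GET /cec_2003/home.html HTTP/1.1" '
    '200 9000 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]

if __name__ == "__main__":
    for (path, status), count in sorted(googlebot_hits(SAMPLE).items()):
        print(f"{count:3d}  {status}  {path}")
```

To tell requests for csadfa.cs.adfa.edu.au apart from requests for www.cs.adfa.edu.au, the log format would also need to record the requested host name (for example Apache's %v or %{Host}i), which this sketch does not parse.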
Answer  
Subject: Re: Mysterious Google non-indexing
Answered By: aceresearcher-ga on 08 Apr 2003 20:39 PDT
Rated: 4 out of 5 stars
 
Hi, cec2003!

There are certainly some things that you can do to get your site
indexed by Search Engines, including Google.

HOWEVER, ***your first priority should be to get the University
website's Redirect problem straightened out***.

First, I took a quick look at the links that you provided above. I
viewed your portal page, and did a "View Source" to open Notepad with
the html code for this page. I also clicked on each of the links to
the "full" and "text only" versions to take a quick look at those.

After that, one of the first things I always do when I analyze a
website to suggest improvements is to run it through the Search Engine
Spider Simulator at SearchEngineWorld.
http://www.searchengineworld.com/cgi-bin/sim_spider.cgi

Plugging your URL http://www.cs.adfa.edu.au/cec_2003 into the
Simulator provides some REALLY disturbing results.
--------------------------------------------------------------
Spider url http://www.cs.adfa.edu.au/cec_2003 
Spider title          Congress on Evolutionary Computation 2003  
Spider meta desc      No description available. 
Spider meta keywords  
Spider Text 
The Congress on Evolutionary Computation, one of the leading
international conferences in the field, will be held in Canberra,
Australia, 8th - 12th December 2003. CEC 2003 logo Enter full website 
  Enter text only website The full website requres a CSS aware
browser. Recommended oldest browsers: Internet Explorer v6.0, Netscape
v6.1, Mozilla v1.1, Opera v6.0

--------------------------------------------------------------
Spidered Links 
Link 
    http://www.cs.adfa.edu.au/home.html 
    http://www.cs.adfa.edu.au/textonly.html 
--------------------------------------------------------------

This is really strange! The html code showing in my Notepad window
shows that you have specified a Description and Keywords, so why
doesn't the Simulator see them?
<meta name="description" content=" The Congress on Evolutionary
Computation, one of the leading international conferences in the
field, will be held in Canberra, Australia, 8th - 12th December 2003."
/>
<meta name="keywords" content="evolutionary computation, conference,
computer science, adfa" />

Do you suppose that the Simulator is being redirected to an earlier
version of your page somewhere???

Now, this is even weirder -- the two links listed by the Simulator:
    http://www.cs.adfa.edu.au/home.html 
    http://www.cs.adfa.edu.au/textonly.html 

are NOT the same URL as the two links on your page when I "mouseover"
them!
    http://www.cs.adfa.edu.au/cec_2003/home.html
    http://www.cs.adfa.edu.au/cec_2003/textonly.html

Now since the links specified in your html source code are relative
links:
    <a href="home.html">
    <a href="textonly.html">
It seems clear that the Simulator thinks that the Home page it is
looking at has the URL:
    http://www.cs.adfa.edu.au
and NOT:
    http://www.cs.adfa.edu.au/cec_2003 !

After dozens of uses, this is the first time I have EVER seen the
Simulator have this kind of a problem.

It seems apparent that there is something going seriously wrong with
the Redirect for your University's website. While human requests via a
browser show the site correctly, what the Googlebot sees is, according
to the Simulator, a page with no description, no keywords, the text:
"The Congress on Evolutionary Computation, one of the leading
international conferences in the field, will be held in Canberra,
Australia, 8th - 12th December 2003. CEC 2003 logo Enter full website 
  Enter text only website The full website requres a CSS aware
browser. Recommended oldest browsers: Internet Explorer v6.0, Netscape
v6.1, Mozilla v1.1, Opera v6.0"

and 2 DEAD links.

No wonder you're not getting indexed. The Googlebot doesn't think that
you have any content on your site.

Before you do anything else, you need to have a conference with your
University's web administrator, and have them figure out why this is
happening. Because you state that "Our whole school web site
(www.cs.adfa.edu.au) is not getting indexed very nicely: it is
actually indexed under csadfa.cs.adfa.edu.au instead", I would have to
say that this problem is not just bolloxing up your Conference site,
it is bolloxing up ALL the University's sites, making it a VERY
serious problem that needs to be resolved as quickly as possible.

Temporary fixes to help bridge the gap until your Redirect problem is
resolved:

Instead of your portal page, try submitting the URL 
http://www.cs.adfa.edu.au/cec_2003/home.html
to Google. 
The Search Engine Spider Simulator seems to be able to crawl that page
correctly, and the links to your other pages all appear to be valid.

Also, try replacing the relative links in
http://www.cs.adfa.edu.au/cec_2003
    <a href="home.html">
    <a href="textonly.html">
with absolute links:
    <a href="http://www.cs.adfa.edu.au/cec_2003/home.html">
    <a href="http://www.cs.adfa.edu.au/cec_2003/textonly.html">

(While you're at it, correct "The full website requres a CSS aware
browser." to "requires".)

Once you have gotten that pesky redirect problem resolved (and
verified that it's resolved by getting the correct results from the
Simulator), here are some things that you can do to help your site get
indexed as quickly as possible:

According to Google, 12 sites that it has indexed have links to your
Conference Page:
http://www.google.com/search?q=%22%2Bwww.cs.adfa.edu.au/cec_2003%22&num=100&hl=en&lr=&ie=UTF-8&oe=UTF-8&c2coff=1&safe=off&filter=0
and their Page Rank scores are: 7,2,5,6,5,3,3,2,2,4,4,0
This is EXCELLENT. Google will index your site for "backward links"
(links from other sites) if you have 3 or more such sites with a Page
Rank of 4 or higher -- and you have 6 such sites going for you right
now! Furthermore, if you can get the Googlebot to "see" your site, you
should end up with a good Page Rank because of this, which will place
you high in Google's Search Results.

Submit your site to the Googlebot again:
http://www.google.com/addurl.html

Now, you should also submit your site to be listed in the DMOZ ODP
(Directory Mozilla - Open Directory Project) at http://www.dmoz.org .
There are two reasons for this:

1) Because DMOZ is humanly edited (each URL is personally checked out
for legitimacy before being added to the Directory), a lot of crap
that makes it into Search Engine Results never makes it into DMOZ. For
this reason, many people rely heavily on the ODP as a guide to quality
websites, so you really want to be listed in it.

2) Also for that reason, the Googlebot uses (among other things) the
ODP as a kind of "checklist" for what sites to crawl when it indexes
the web. So if you can get listed in the ODP, your chances of being
indexed -- and of obtaining a high Google Page Rank -- are GREATLY
increased.
 
My guess as to where you would choose to be submitted for cataloging
would be:
    Computers > Computer Science > Conferences > 2003
http://dmoz.org/cgi-bin/add.cgi?where=Computers/Computer_Science/Conferences/2003


You mentioned that your site is doing well on AllTheWeb (
http://www.alltheweb.com ). If you have not already been submitted and
listed in the following Search Engines, you may wish to do that as
well:

  HotBot:   
http://ldbreg.lycos.com/cgi-bin/mayaLogin?m_PR=29&m_CBURL=http://insite.lycos.com/searchservices/lite?step1.asp
 
  AltaVista:   
http://addurl.altavista.com/addurl/new   
 
  Zeal (LookSmart/MSN free submission w/free registration)   
http://www.zeal.com/users/register.jhtml   

 
For more information on developing a Google-friendly website, I
recommend that you study the information in Google's Help Department:
http://www.google.com/webmasters
Guidelines
http://www.google.com/webmasters/guidelines.html
Facts & Fiction (myths dispelled)
http://www.google.com/webmasters/facts.html
Search Engine Optimization (SEO)
http://www.google.com/webmasters/seo.html
Frequently Asked Questions (FAQ)
http://www.google.com/webmasters/faq.html
User Support Discussion Forum   
http://groups.google.com/groups?q=google.public.support.general   
  
  
Another fabulous resource is the forum at WebmasterWorld.com:   
http://www.webmasterworld.com   
   
and at Search Engine World:   
http://www.searchengineworld.com   
   
I encourage you to visit these sites and learn more about making your
site attractive and friendly to Search Engines.
  
 
Before Rating my Answer, if you have any questions about this
information, please post a Request for Clarification, and I will be
glad to see what I can do for you.
  
 
I hope that this Answer provides exactly the information that you
needed!
  
Best wishes for a quick resolution to your Redirect problem, and
greatly increased traffic to your Conference website!

Regards,

aceresearcher

Request for Answer Clarification by cec2003-ga on 09 Apr 2003 01:55 PDT
Thank you for a careful answer to the question; I really appreciate
the trouble you have gone to. However I think we need to look more
carefully at a couple of aspects:
> Now, this is even weirder -- the two links listed by the Simulator:
>     http://www.cs.adfa.edu.au/home.html  
>     http://www.cs.adfa.edu.au/textonly.html  
>  
> are NOT the same URL as the two links on your page when I
> "mouseover" them!
>     http://www.cs.adfa.edu.au/cec_2003/home.html
>     http://www.cs.adfa.edu.au/cec_2003/textonly.html

This appears to be a general problem with simspider, rather than with
our particular apache setup. That is, if you input
url/directory_name
into simspider, it will drop 'directory_name' from any local links it
constructs, but if you input
url/directory_name/
into simspider, it will construct the local links correctly (I've
verified this over four university sites in Australia and the UK so
far, I'm reasonably certain that this is the case - in fact, I'm
fairly sure that simspider constructs the rootname by truncating the
_input_ url to the last '/', then adjusts for local addresses, rather
than working from any root returned by the server; other than our own
site, the other sites I have verified this on do not appear to have
any google indexing problems).
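
The truncation rule described above is also what standard relative-URL resolution does, so the two inputs can be compared directly. The sketch below uses Python's urllib.parse.urljoin for the standard behaviour. The helper naive_resolve is a hypothetical stand-in for the guessed simspider behaviour, not its actual code.

```python
from urllib.parse import urljoin


def naive_resolve(base_url, relative):
    """Resolve a relative link by cutting the base URL at its last '/'.

    Hypothetical model of the behaviour guessed at above; not Simspider's code.
    """
    return base_url[: base_url.rfind("/") + 1] + relative


if __name__ == "__main__":
    for base in ("http://www.cs.adfa.edu.au/cec_2003",
                 "http://www.cs.adfa.edu.au/cec_2003/"):
        for link in ("home.html", "textonly.html"):
            # urljoin follows the standard rule: replace everything after the
            # last '/' in the base path with the relative reference.
            print(f"{base:38} + {link:14} -> {urljoin(base, link)}")
```

Without the trailing slash, both functions treat cec_2003 as a file name and drop it, giving http://www.cs.adfa.edu.au/home.html. A browser most likely shows the right links only because the server first redirects /cec_2003 to /cec_2003/ and the browser resolves against that final address. The two functions differ on a bare host name: naive_resolve("http://www.domain.co.uk", "navigation.html") yields "http://navigation.html", while urljoin yields "http://www.domain.co.uk/navigation.html".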

Funnily enough, the 'meta' issue is also a simspider bug. simspider
detects meta tags of the form
<meta name="name" content="content">
but _not_ tags of the form
<meta name="name" content="content"/>
(ie it _relies_ on the meta tags not being terminated!). Compare
http://www.cs.adfa.edu.au/~rim/temp1.html
and 
http://www.cs.adfa.edu.au/~rim/temp2.html
_hopefully_ this bug isn't present in googlebot.
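
To illustrate how a tool could end up relying on unterminated meta tags, here is a minimal sketch. STRICT_META is a hypothetical over-strict pattern (an assumption about the kind of matching that would produce this symptom, not simspider's real code), while MetaCollector uses Python's standard html.parser, which accepts both forms.

```python
import re
from html.parser import HTMLParser

# Hypothetical over-strict pattern: demands '">' right after the content value,
# so the XHTML form ending in '"/>' or '" />' never matches.
STRICT_META = re.compile(r'<meta name="([^"]+)" content="([^"]*)">', re.IGNORECASE)


class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags, self-closed or not."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes <meta ... /> here via handle_startendtag.
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes:
                content = attributes.get("content") or ""
                self.meta[attributes["name"].lower()] = content.strip()


def parse_meta(html):
    """Return a dict of meta name -> content found in the given markup."""
    collector = MetaCollector()
    collector.feed(html)
    return collector.meta


HTML4_STYLE = '<head><meta name="keywords" content="evolutionary computation, conference"></head>'
XHTML_STYLE = '<head><meta name="keywords" content="evolutionary computation, conference" /></head>'

if __name__ == "__main__":
    print(STRICT_META.findall(HTML4_STYLE))  # strict pattern matches this form
    print(STRICT_META.findall(XHTML_STYLE))  # and finds nothing here
    print(parse_meta(XHTML_STYLE))           # a real parser reads it fine
```

A parser of the second kind reads either spelling, so a crawler built on one would not be affected. Whether googlebot behaves that way is not something this sketch can show.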

Your thoughts on the above issues would be greatly appreciated!

Clarification of Answer by aceresearcher-ga on 09 Apr 2003 03:43 PDT
cec2003 Wrote:
However I think we need to look more carefully at a couple of aspects:
> Now, this is even weirder -- the two links listed by the Simulator:
>     http://www.cs.adfa.edu.au/home.html   
>     http://www.cs.adfa.edu.au/textonly.html   
>   
> are NOT the same URL as the two links on your page when I
> "mouseover" them! 
>     http://www.cs.adfa.edu.au/cec_2003/home.html  
>     http://www.cs.adfa.edu.au/cec_2003/textonly.html  

cec2003 Wrote: 
This appears to be a general problem with simspider, rather than with
our particular apache setup. That is, if you input 
url/directory_name 
into simspider, it will drop 'directory_name' from any local links it
constructs, but if you input 
url/directory_name/ 
into simspider, it will construct the local links correctly (I've
verified this over four university sites in Australia and the UK so
far, I'm reasonably certain that this is the case - in fact, I'm
fairly sure that simspider constructs the rootname by truncating the
_input_ url to the last '/', then adjusts for local addresses, rather
than working from any root returned by the server; other than our own
site, the other sites I have verified this on do not appear to have
any google indexing problems).
 
*****
I am not sure what you have just changed on your website -- or what
your University Web administrator has just changed -- but the results
I get from the Spider Simulator now are not what I was getting a few
hours ago.
This is what I see now:
-------------------------------------------------------------- 
Status 404 (return error code 1) 
Spider url http://www.cs.adfa.edu.au/cec2003 
User Agent Mozilla/4.0 (compatible; MSIE 5.0; Windows NT 5.0)
68.82.222.85
Referrer http://www.searchengineworld.com/cgi-bin/sim_spider.cgi 
Spider title Error 404 Redirect 
Spider meta desc No description available. 
Spider meta keywords  
Spider Text 
Link to School of Computer Science Homepage Link to UNSW@ADFA Homepage
School of Computer Science The page you requested could not be found.
Please choose a link... New School of Computer Science website Old
website This generated link may take you to the page you were trying
to find in the old site: Please find the corresponding page in the new
site and update your bookmarks ASAP as this link will not be available
for more than a short time.   CRICOS Provider Number: 00100G dot
Copyright and Disclaimer dot Last update: Peter Morris - 3rd January
2003
-------------------------------------------------------------- 
Spidered Links  
Link  
    http://www.cs.adfa.edu.au/ 
    http://www.unsw.adfa.edu.au/ 
    http://www.cs.adfa.edu.au/index.html 
    http://www.cs.adfa.edu.au/archive/index.html 
    http://www.cs.adfa.edu.au/'+newurl+' 
    http://www.cs.adfa.edu.au/copyright.html 
-------------------------------------------------------------- 
OOPS!

Now this has JUST changed again, to:

-------------------------------------------------------------- 
Status 200 (return error code 0) 
Spider url http://www.cs.adfa.edu.au/cec_2003/ 
User Agent Mozilla/4.0 (compatible; MSIE 5.0; Windows NT 5.0)
68.82.222.85
Referrer http://www.searchengineworld.com/cgi-bin/sim_spider.cgi 
Spider title Congress on Evolutionary Computation 2003  
Spider meta desc No description available. 
Spider meta keywords  
Spider Text 
The Congress on Evolutionary Computation, one of the leading
international conferences in the field, will be held in Canberra,
Australia, 8th - 12th December 2003. CEC 2003 logo Enter full website 
  Enter text only website The full website requires a CSS aware
browser. Recommended oldest browsers: Internet Explorer v6.0, Netscape
v6.1, Mozilla v1.1, Opera v6.0
-------------------------------------------------------------- 
Spidered Links 
Link  (for http://www.cs.adfa.edu.au/cec_2003/ )
    http://www.cs.adfa.edu.au/cec_2003/home.html 
    http://www.cs.adfa.edu.au/cec_2003/textonly.html 

Link  (for http://www.cs.adfa.edu.au/cec_2003  )
    http://www.cs.adfa.edu.au/home.html 
    http://www.cs.adfa.edu.au/textonly.html 
-------------------------------------------------------------- 
It will be extremely difficult to analyze or diagnose a problem while
your website and/or web admin setup are being modified.

While the difference between the URL with the trailing slash and the
URL without it *may* be a problem with the Spider Simulator, I am more
inclined to believe that it is part and parcel of all the other
problems you are having with redirecting. I tried this with a University
site here in the States, and I get exactly the same results whether I
include a slash at the end of the subdirectory or not. Can you
provide me with the examples of the other University sites that you
say showed this same problem?

cec2003 Wrote: 
Funnily enough, the 'meta' issue is also a simspider bug. simspider
detects meta tags of the form 
<meta name="name" content="content"> 
but _not_ tags of the form 
<meta name="name" content="content"/> 
(ie it _relies_ on the meta tags not being terminated!). Compare 
http://www.cs.adfa.edu.au/~rim/temp1.html 
and  
http://www.cs.adfa.edu.au/~rim/temp2.html 
_hopefully_ this bug isn't present in googlebot. 

*****
This isn't a bug with the Simulator. The accepted Standard for <meta>
tags does not include a "/" at the end; with the hundreds of pages of
html code that I have looked at, I have never seen this done. I am not
sure what syntax guide was used to define these meta tags which
specifies the "/" at the end; if you can point me to it, I will be
glad to take a look at it. However, I am pretty confident that if you
change your syntax to the accepted Standard, the Simulator and other
Spiders will have no problem recognizing your <meta> tags.

Regards,

aceresearcher

Request for Answer Clarification by cec2003-ga on 11 Apr 2003 01:46 PDT
Thanks for your trouble again on this; here are clarifications on a couple of the
issues you raised (sorry about the delay - problems with Oz/US time zones)

>Status 404 (return error code 1)  
>Spider url http://www.cs.adfa.edu.au/cec2003

This may be the result of a typo (cec2003 for cec_2003)

>I tried this with a University
>site here in the States, and I get exactly the same results whether I
>include a slash at the end of the subdirectory or not. Can you
>provide me with the examples of the other University sites that you
>say showed this same problem?

Here are a couple (random selection from our conference committee). Remember
that the problem only arises with relative URL addresses
http://www.cs.bham.ac.uk/~xin/
http://www.cse.unsw.edu.au/~blair/

>This isn't a bug with the Simulator. The accepted Standard for <meta>
>tags does not include a "/" at the end; with the hundreds of pages of
>html code that I have looked at, I have never seen this done. I am not
>sure what syntax guide was used to define these meta tags which
>specifies the "/" at the end; if you can point me to it, I will be
>glad to take a look at it. However, I am pretty confident that if you
>change your syntax to the accepted Standard, the Simulator and other
>Spiders will have no problem recognizing your <meta> tags.

I'm surprised; <meta/> is valid XHTML, whereas <meta> without a corresponding
</meta> isn't. However we are switching everything to <meta></meta> as per your
suggestion, and it does seem that it helps simspider.

Clarification of Answer by aceresearcher-ga on 13 Apr 2003 12:51 PDT
cec2003,

Please bear with me a little bit. I live in the U.S. and April 15 is
our big Tax day.

Thanks for your patience and understanding,

ace

Clarification of Answer by aceresearcher-ga on 15 Apr 2003 12:18 PDT
Hello again, cec2003!

Thanks for your patience. I just found out that I totally bolloxed up
my tax returns, so I have to completely redo them. #-[


>Status 404 (return error code 1)   
>Spider url http://www.cs.adfa.edu.au/cec2003 
 
cec2003 Wrote:
This may be the result of a typo (cec2003 for cec_2003)

You're right. I was obviously up WAY past my bedtime. Sorry about
that.


>This isn't a bug with the Simulator. The accepted Standard for <meta>
>tags does not include a "/" at the end; with the hundreds of pages of
>html code that I have looked at, I have never seen this done. I am not
>sure what syntax guide was used to define these meta tags which
>specifies the "/" at the end; if you can point me to it, I will be
>glad to take a look at it. However, I am pretty confident that if you
>change your syntax to the accepted Standard, the Simulator and other
>Spiders will have no problem recognizing your <meta> tags.

cec2003 Wrote:
I'm surprised; <meta/> is valid XHTML, whereas <meta> without a
corresponding </meta> isn't. However we are switching everything to
<meta></meta> as per your suggestion, and it does seem that it helps
simspider.

I did some in-depth reading about the XHTML Standards at The World
Wide Web Consortium ( http://www.w3.org ), and you are also correct
about this, although a lot of people don't seem to be doing it yet,
which is why I hadn't ever seen it before.

I wondered if the SearchEngineWorld Spider Simulator
http://www.searchengineworld.com/cgi-bin/sim_spider.cgi
might have some quirks related to the XHTML Standard, so I did some
checking at the companion forums on WebmasterWorld.

I found this thread:

TallTroll: msg #:5 11:15 am on Nov 2, 2001
"Funny you should ask. I had EXACTLY the same problems with it
recently. Here is what I found out
1) The Simspider is VERY picky. The formatting of the tag must be
perfect, no "illegal" spaces. I had a tag which ended...

...keyword1,keyword2" > 

and it was rejected. Also, I had another which started 

<META name="keywords" content = "Keyword1,keyword2.... 

That was rejected too. Changed it to 

<META name="keywords" content="Keyword1,keyword2.... 

and it all worked again. 

2) You have to list the tags in the correct order, description, THEN
keywords, or it won't see keywords.

I also have had one VERY silly mistake where I had somehow put a
<BODY> tag before the <HEAD> tag. Although the Simspider saw the
<TITLE>, it ignored the (correct) <META> tags.

Most likely explanation is 2) I spent a couple of frustrating hours
cursing the Simspider before I twigged that one :)

Once you know the rules, everythings great.... 

I'm not sure why it does this, I was trying to figure out if it was a
requirement of one of the "strict" DTDs or something, but I couldn't
find anything. Maybe Brett could enlighten us there?"
 
Will: msg #:6 6:06 pm on Nov 2, 2001
"A valid document using the XHTML Transitional DTD needs to have any
keyword or description tags appear **before** the title tag, i.e.
anything with META in it comes before title, linked/inline stylesheets
etc."
http://www.webmasterworld.com/forum11/957.htm


I also found this thread:

VictorE: msg#:7 3:45 am on Jan 22, 2003
"I believe this topic has been covered before. I think it is simply a
small "feature" of the Simspider. Try changing the order of your Meta
description and your Meta keywords. One of the permutations should
work. I believe for them to show on Simspider your keywords must be
after your description. Either way should be valid, but only one of
the ways will allow the keywords to be seen by Simspider.
Vic"

pageoneresults: msg #:8 3:50 am on Jan 22, 2003  
"VictorE, excellent catch. The page I've been checking is in order
like this...
title 
description 
keywords 

As soon as I dropped the keywords tag above the description tag, the
keywords tag was not seen by the spider."
http://www.webmasterworld.com/forum8/525.htm


You have an extraneous space in front of your Description Content. You
might try removing that, and possibly switch your Keywords and
Description metatags around, to see if the SimSpider does a better job
with your page.


TallTroll: msg #:10 3:43 pm on Jan 16, 2002  
"True, I'm just curious as to why the Simspider isn't reading a
perfectly good (at least it LOOKS perfectly good) relative link on
this other site correctly. It insists it is being pointed to
"http://navigation.html". I just find it odd that this site ALSO has a
PR of 0, AND the effect of the fault would be to make virtually the
entire site unreachable, if an SE spider behaves the same way as the
Simspider.

<a little later> 

Ding, got it! 

spidering www.domain.co.uk gets the link wrong 
spidering www.domain.co.uk/ gets the link right 

I bet Simspider constructs relative paths from the entered URL (not
unreasonably), by just tagging the filename called on the end of the
base URL without checking for a terminal /. Also, entering
.../index.html gets it right, so I bet its smart enough to sub the
filename correctly, cos it'll have a / there."
http://www.webmasterworld.com/forum27/233.htm


So, it sounds like the SimSpider may not be reading your HTML/XHTML
code quite properly, as well as having problems with referrers.

Is there a reason you're redirecting your site the way that you are?
Wouldn't you want to place your real pages at the real address, and
put the redirect at the old address, in case someone still tries to
access the pages at their former URLs? I think that that would help to
ensure that Google doesn't have any problem indexing your new pages
properly.
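
One way to see what a crawler actually receives from each address is to request it without following redirects and look at the raw status code and Location header. The following is a minimal sketch using Python's standard http.client. The function names fetch_status and describe and the User-Agent string are made up for illustration.

```python
import http.client


def fetch_status(host, path="/", port=80, timeout=10):
    """Issue one HEAD request without following redirects.

    Returns (status_code, location_header_or_None).
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path, headers={"User-Agent": "redirect-check/0.1"})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def describe(status, location):
    """Turn a status code and Location header into a one-line summary."""
    if status in (301, 308):
        return f"permanent redirect to {location}"
    if status in (302, 303, 307):
        return f"temporary redirect to {location}"
    if status == 200:
        return "served directly (no redirect)"
    return f"unexpected status {status}"


# Example use (needs network access; the hosts are those named in this thread):
#   for host in ("csadfa.cs.adfa.edu.au", "www.cs.adfa.edu.au"):
#       print(host, describe(*fetch_status(host, "/cec_2003/")))
```

With the arrangement suggested above, the old address would be expected to report a permanent redirect pointing at www.cs.adfa.edu.au, and the new address would be expected to be served directly.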

At any rate, it is entirely possible that your website results could
start showing up properly within the next month. It sometimes takes
2-3 months for the Googlebot to get everything indexed properly, and
with all the switching around of your URLs, it may take quite a while
for the Googlebot to get it straightened out.

If you get your website listed in the DMOZ ODP as I suggested, that
should really help you in terms of Search Results.

Best Wishes!

ace
cec2003-ga rated this answer: 4 out of 5 stars
aceresearcher-ga has put a considerable amount of work into answering
this question, and provided a number of very useful leads. The answers
don't fully answer the question, but it may well be that the question
could only be answered fully by a googlebot software developer.

Comments  
Subject: Re: Mysterious Google non-indexing
From: aceresearcher-ga on 21 Apr 2003 23:03 PDT
 
...and they ain't telling!

Thanks cec2003 -- I hope that you are able to get all the redirect
issues straightened out so that all your University's web pages start
showing up well in Google Search Results, and that your Conference
gets its DMOZ listing, which I am sure will help.

Best Wishes!

ace
