Hi, cec2003!
There are certainly some things that you can do to get your site
indexed by Search Engines, including Google.
HOWEVER, ***your first priority should be to get the University
website's Redirect problem straightened out***.
First, I took a quick look at the links that you provided above. I
viewed your portal page, and did a "View Source" to open Notepad with
the html code for this page. I also clicked on each of the links to
the "full" and "text only" versions to take a quick look at those.
After that, one of the first things I always do when I analyze a
website to suggest improvements is to run it through the Search Engine
Spider Simulator at SearchEngineWorld.
http://www.searchengineworld.com/cgi-bin/sim_spider.cgi
Plugging your URL http://www.cs.adfa.edu.au/cec_2003 into the
Simulator provides some REALLY disturbing results.
--------------------------------------------------------------
Spider url http://www.cs.adfa.edu.au/cec_2003
Spider title Congress on Evolutionary Computation 2003
Spider meta desc No description available.
Spider meta keywords
Spider Text
The Congress on Evolutionary Computation, one of the leading
international conferences in the field, will be held in Canberra,
Australia, 8th - 12th December 2003. CEC 2003 logo Enter full website
Enter text only website The full website requres a CSS aware
browser. Recommended oldest browsers: Internet Explorer v6.0, Netscape
v6.1, Mozilla v1.1, Opera v6.0
--------------------------------------------------------------
Spidered Links
= spider this link with current settings.
= keyword density analyze this link.
Link
http://www.cs.adfa.edu.au/home.html
http://www.cs.adfa.edu.au/textonly.html
--------------------------------------------------------------
This is really strange! The html code showing in my Notepad window
shows that you have specified a Description and Keywords, so why
doesn't the Simulator see them?
<meta name="description" content=" The Congress on Evolutionary
Computation, one of the leading international conferences in the
field, will be held in Canberra, Australia, 8th - 12th December 2003."
/>
<meta name="keywords" content="evolutionary computation, conference,
computer science, adfa" />
Do you suppose that the Simulator is being redirected to an earlier
version of your page somewhere???
Now, this is even weirder -- the two links listed by the Simulator:
http://www.cs.adfa.edu.au/home.html
http://www.cs.adfa.edu.au/textonly.html
are NOT the same URL as the two links on your page when I "mouseover"
them!
http://www.cs.adfa.edu.au/cec_2003/home.html
http://www.cs.adfa.edu.au/cec_2003/textonly.html
Now since the links specified in your html source code are relative
links:
<a href="home.html">
<a href="textonly.html">
It seems clear that the Simulator thinks that the Home page it is
looking at has the URL:
http://www.cs.adfa.edu.au
and NOT:
http://www.cs.adfa.edu.au/cec_2003 !
After dozens of uses, this is the first time I have EVER seen the
Simulator have this kind of a problem.
It seems apparent that there is something going seriously wrong with
the Redirect for your University's website. While human requests via a
browser show the site correctly, what the Googlebot sees is, according
to the Simulator, a page with no description, no keywords, the text:
"The Congress on Evolutionary Computation, one of the leading
international conferences in the field, will be held in Canberra,
Australia, 8th - 12th December 2003. CEC 2003 logo Enter full website
Enter text only website The full website requres a CSS aware
browser. Recommended oldest browsers: Internet Explorer v6.0, Netscape
v6.1, Mozilla v1.1, Opera v6.0"
and 2 DEAD links.
No wonder you're not getting indexed. The Googlebot doesn't think that
you have any content on your site.
Before you do anything else, you need to have a conference with your
University's web administrator, and have them figure out why this is
happening. Because you state that "Our whole school web site
(www.cs.adfa.edu.au) is not getting indexed very nicely: it is
actually indexed under csadfa.cs.adfa.edu.au instead", I would have to
say that this problem is not just bolloxing up your Conference site,
it is bolloxing up ALL the University's sites, making it a VERY
serious problem that needs to be resolved as quickly as possible.
Temporary fixes to help bridge the gap until your Redirect problem is
resolved:
Instead of your portal page, try submitting the URL
http://www.cs.adfa.edu.au/cec_2003/home.html
to Google.
The Search Engine Spider Simulator seems to be able to crawl that page
correctly, and the links to your other pages all appear to be valid.
Also, try replacing the relative links in
http://www.cs.adfa.edu.au/cec_2003
<a href="home.html">
<a href="textonly.html">
with absolute links:
<a href="http://www.cs.adfa.edu.au/cec_2003/home.html">
<a href="http://www.cs.adfa.edu.au/cec_2003/textonly.html">
(While you're at it, correct "The full website requres a CSS aware
browser." to "requires".)
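If you would rather not edit the links by hand, the relative-to-absolute change can be scripted. This is only a minimal sketch in Python: the function name absolutize_links is made up for illustration, and it assumes every link is written as href="..." with double quotes, as yours are.

```python
import re
from urllib.parse import urljoin

# Matches href="..." attributes written with double quotes (an assumption
# that holds for the two links quoted above, not for HTML in general).
HREF = re.compile(r'href="([^"]*)"')

def absolutize_links(html, base_url):
    """Rewrite each href so it is absolute, resolved against base_url."""
    def repl(match):
        return 'href="%s"' % urljoin(base_url, match.group(1))
    return HREF.sub(repl, html)

page = '<a href="home.html">\n<a href="textonly.html">'
# Note the trailing slash on the base: it marks cec_2003 as a directory.
print(absolutize_links(page, "http://www.cs.adfa.edu.au/cec_2003/"))
```

Links that are already absolute pass through unchanged, so running it twice does no harm.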
Once you have gotten that pesky redirect problem resolved (and
verified that it's resolved by getting the correct results from the
Simulator), here are some things that you can do to help your site get
indexed as quickly as possible:
According to Google, 12 sites that it has indexed have links to your
Conference Page:
http://www.google.com/search?q=%22%2Bwww.cs.adfa.edu.au/cec_2003%22&num=100&hl=en&lr=&ie=UTF-8&oe=UTF-8&c2coff=1&safe=off&filter=0
and their Page Rank scores are: 7,2,5,6,5,3,3,2,2,4,4,0
This is EXCELLENT. Google will index your site for "backward links"
(links from other sites) if you have 3 or more such sites with a Page
Rank of 4 or higher -- and you have 6 such sites going for you right
now! Furthermore, if you can get the Googlebot to "see" your site, you
should end up with a good Page Rank because of this, which will place
you high in Google's Search Results.
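The count is easy to double-check. A quick sketch in Python, using the twelve scores listed above; the threshold of 4 is the rule of thumb I just described, not a figure published by Google, and the function name is only illustrative.

```python
# Page Rank scores of the 12 linking sites, as listed above.
scores = [7, 2, 5, 6, 5, 3, 3, 2, 2, 4, 4, 0]

def count_at_or_above(values, threshold):
    """Count how many values are greater than or equal to threshold."""
    return sum(1 for v in values if v >= threshold)

# Threshold of 4 is the rule of thumb from this answer (an assumption).
print(count_at_or_above(scores, 4))
```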
Submit your site to the Googlebot again:
http://www.google.com/addurl.html
Now, you should also submit your site to be listed in the DMOZ ODP
(Directory Mozilla - Open Directory Project) at http://www.dmoz.org .
There are two reasons for this:
1) Because DMOZ is edited by humans (each URL is personally checked
for legitimacy before being added to the Directory), a lot of crap
that makes it into Search Engine Results never makes it into DMOZ. For
this reason, many people rely heavily on the ODP as a guide to quality
websites, so you really want to be listed in it.
2) Also for that reason, the Googlebot uses (among other things) the
ODP as a kind of "checklist" for what sites to crawl when it indexes
the web. So if you can get listed in the ODP, your chances of being
indexed -- and of obtaining a high Google Page Rank -- are GREATLY
increased.
My guess as to where you would choose to be submitted for cataloging
would be:
Computers > Computer Science > Conferences > 2003
http://dmoz.org/cgi-bin/add.cgi?where=Computers/Computer_Science/Conferences/2003
You mentioned that your site is doing well on AllTheWeb (
http://www.alltheweb.com ). If you have not already been submitted and
listed in the following Search Engines, you may wish to do that as
well:
HotBot:
http://ldbreg.lycos.com/cgi-bin/mayaLogin?m_PR=29&m_CBURL=http://insite.lycos.com/searchservices/lite?step1.asp
AltaVista:
http://addurl.altavista.com/addurl/new
Zeal (LookSmart/MSN free submission w/free registration):
http://www.zeal.com/users/register.jhtml
For more information on developing a Google-friendly website, I
recommend that you study the information in Google's Help Department:
http://www.google.com/webmasters
Guidelines
http://www.google.com/webmasters/guidelines.html
Facts & Fiction (myths dispelled)
http://www.google.com/webmasters/facts.html
Search Engine Optimization (SEO)
http://www.google.com/webmasters/seo.html
Frequently Asked Questions (FAQ)
http://www.google.com/webmasters/faq.html
User Support Discussion Forum
http://groups.google.com/groups?q=google.public.support.general
Another fabulous resource is the forum at WebmasterWorld.com:
http://www.webmasterworld.com
and at Search Engine World:
http://www.searchengineworld.com
I encourage you to visit these sites and learn more about making your
site attractive and friendly to Search Engines.
Before Rating my Answer, if you have any questions about this
information, please post a Request for Clarification, and I will be
glad to see what I can do for you.
I hope that this Answer provides exactly the information that you
needed!
Best wishes for a quick resolution to your Redirect problem, and
greatly increased traffic to your Conference website!
Regards,
aceresearcher
Clarification of Answer by aceresearcher-ga on 09 Apr 2003 03:43 PDT
However I think we need to look more carefully at a couple of aspects:
> Now, this is even weirder -- the two links listed by the Simulator:
> http://www.cs.adfa.edu.au/home.html
> http://www.cs.adfa.edu.au/textonly.html
>
> are NOT the same URL as the two links on your page when I
> "mouseover" them!
> http://www.cs.adfa.edu.au/cec_2003/home.html
> http://www.cs.adfa.edu.au/cec_2003/textonly.html
cec2003 Wrote:
This appears to be a general problem with simspider, rather than with
our particular apache setup. That is, if you input
url/directory_name
into simspider, it will drop 'directory_name' from any local links it
constructs, but if you input
url/directory_name/
into simspider, it will construct the local links correctly (I've
verified this over four university sites in Australia and the UK so
far, I'm reasonably certain that this is the case - in fact, I'm
fairly sure that simspider constructs the rootname by truncating the
_input_ url to the last '/', then adjusts for local addresses, rather
than working from any root returned by the server; other than our own
site, the other sites I have verified this on do not appear to have
any google indexing problems).
*****
I am not sure what you have just changed on your website -- or what
your University Web administrator has just changed -- but the results
I get from the Spider Simulator now are not what I was getting a few
hours ago.
This is what I see now:
--------------------------------------------------------------
Status 404 (return error code 1)
Spider url http://www.cs.adfa.edu.au/cec2003
User Agent Mozilla/4.0 (compatible; MSIE 5.0; Windows NT 5.0)
68.82.222.85
Referrer http://www.searchengineworld.com/cgi-bin/sim_spider.cgi
Spider title Error 404 Redirect
Spider meta desc No description available.
Spider meta keywords
Spider Text
Link to School of Computer Science Homepage Link to UNSW@ADFA Homepage
School of Computer Science The page you requested could not be found.
Please choose a link... New School of Computer Science website Old
website This generated link may take you to the page you were trying
to find in the old site: Please find the corresponding page in the new
site and update your bookmarks ASAP as this link will not be available
for more than a short time. CRICOS Provider Number: 00100G dot
Copyright and Disclaimer dot Last update: Peter Morris - 3rd January
2003
--------------------------------------------------------------
Spidered Links
= spider this link with current settings.
= keyword density analyze this link.
Link
http://www.cs.adfa.edu.au/
http://www.unsw.adfa.edu.au/
http://www.cs.adfa.edu.au/index.html
http://www.cs.adfa.edu.au/archive/index.html
http://www.cs.adfa.edu.au/'+newurl+'
http://www.cs.adfa.edu.au/copyright.html
--------------------------------------------------------------
OOPS!
Now this has JUST changed again, to:
--------------------------------------------------------------
Status 200 (return error code 0)
Spider url http://www.cs.adfa.edu.au/cec_2003/
User Agent Mozilla/4.0 (compatible; MSIE 5.0; Windows NT 5.0)
68.82.222.85
Referrer http://www.searchengineworld.com/cgi-bin/sim_spider.cgi
Spider title Congress on Evolutionary Computation 2003
Spider meta desc No description available.
Spider meta keywords
Spider Text
The Congress on Evolutionary Computation, one of the leading
international conferences in the field, will be held in Canberra,
Australia, 8th - 12th December 2003. CEC 2003 logo Enter full website
Enter text only website The full website requires a CSS aware
browser. Recommended oldest browsers: Internet Explorer v6.0, Netscape
v6.1, Mozilla v1.1, Opera v6.0
--------------------------------------------------------------
Spidered Links
= spider this link with current settings.
= keyword density analyze this link.
Link (for http://www.cs.adfa.edu.au/cec_2003/ )
http://www.cs.adfa.edu.au/cec_2003/home.html
http://www.cs.adfa.edu.au/cec_2003/textonly.html
Link (for http://www.cs.adfa.edu.au/cec_2003 )
http://www.cs.adfa.edu.au/home.html
http://www.cs.adfa.edu.au/textonly.html
--------------------------------------------------------------
It will be extremely difficult to analyze or diagnose a problem while
your website and/or web admin setup are being modified.
While the difference between the URL with the trailing slash and the
URL without it *may* be a problem with the Spider Simulator, I am more
inclined to believe that it is part and parcel of all the other
problems you are having with redirecting. I tried this with a University
site here in the States, and I get exactly the same results whether I
include a slash at the end of the subdirectory or not. Can you
provide me with the examples of the other University sites that you
say showed this same problem?
cec2003 Wrote:
Funnily enough, the 'meta' issue is also a simspider bug. simspider
detects meta tags of the form
<meta name="name" content="content">
but _not_ tags of the form
<meta name="name" content="content"/>
(ie it _relies_ on the meta tags not being terminated!). Compare
http://www.cs.adfa.edu.au/~rim/temp1.html
and
http://www.cs.adfa.edu.au/~rim/temp2.html
_hopefully_ this bug isn't present in googlebot.
*****
This isn't a bug with the Simulator. The accepted Standard for <meta>
tags does not include a "/" at the end; with the hundreds of pages of
html code that I have looked at, I have never seen this done. I am not
sure what syntax guide was used to define these meta tags which
specifies the "/" at the end; if you can point me to it, I will be
glad to take a look at it. However, I am pretty confident that if you
change your syntax to the accepted Standard, the Simulator and other
Spiders will have no problem recognizing your <meta> tags.
Regards,
aceresearcher
Clarification of Answer by aceresearcher-ga on 15 Apr 2003 12:18 PDT
Hello again, cec2003!
Thanks for your patience. I just found out that I totally bolloxed up
my tax returns, so I have to completely redo them. #-[
>Status 404 (return error code 1)
>Spider url http://www.cs.adfa.edu.au/cec2003
This may be the result of a typo (cec2003 for cec_2003)
You're right. I was obviously up WAY past my bedtime. Sorry about
that.
>This isn't a bug with the Simulator. The accepted Standard for <meta>
>tags does not include a "/" at the end; with the hundreds of pages of
>html code that I have looked at, I have never seen this done. I am not
>sure what syntax guide was used to define these meta tags which
>specifies the "/" at the end; if you can point me to it, I will be
>glad to take a look at it. However, I am pretty confident that if you
>change your syntax to the accepted Standard, the Simulator and other
>Spiders will have no problem recognizing your <meta> tags.
I'm surprised; <meta/> is valid XHTML, whereas <meta> without a
corresponding </meta> isn't. However we are switching everything to
<meta></meta> as per your suggestion, and it does seem that it helps
simspider.
I did some in-depth reading about the XHTML Standards at The World
Wide Web Consortium ( http://www.w3.org ), and you are also correct
about this, although a lot of people don't seem to be doing it yet,
which is why I hadn't ever seen it before.
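For what it's worth, a tolerant parser has no trouble with either form of the tag. Here is a minimal sketch using Python's standard html.parser module. It is my own illustration (the names MetaExtractor and extract_meta are made up), and it says nothing about how the SimSpider or the Googlebot actually read a page.

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect name/content pairs from <meta> tags, self-closed or not."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser sends <meta ...> here directly, and sends <meta ... />
        # here too, because its default handle_startendtag() calls this method.
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                # strip() also drops a stray leading space in the content.
                self.meta[name.lower()] = content.strip()

def extract_meta(html):
    """Return a dict of meta name -> content found in the given markup."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return parser.meta

html_form = '<meta name="keywords" content="evolutionary computation, conference">'
xhtml_form = '<meta name="keywords" content="evolutionary computation, conference" />'
print(extract_meta(html_form))
print(extract_meta(xhtml_form))
```

Both calls give the same dictionary, so the trailing "/" by itself is not an obstacle to a parser written this way.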
I wondered if the SearchEngineWorld Spider Simulator
http://www.searchengineworld.com/cgi-bin/sim_spider.cgi
might have some quirks related to the XHTML Standard, so I did some
checking at the companion forums on WebmasterWorld.
I found this thread:
TallTroll: msg #:5 11:15 am on Nov 2, 2001
"Funny you should ask. I had EXACTLY the same problems with it
recently. Here is what I found out
1) The Simspider is VERY picky. The formatting of the tag must be
perfect, no "illegal" spaces. I had a tag which ended...
...keyword1,keyword2" >
and it was rejected. Also, I had another which started
<META name="keywords" content = "Keyword1,keyword2....
That was rejected too. Changed it to
<META name="keywords" content="Keyword1,keyword2....
and it all worked again.
2) You have to list the tags in the correct order, description, THEN
keywords, or it won't see keywords.
I also have had one VERY silly mistake where I had somehow put a
<BODY> tag before the <HEAD> tag. Although the Simspider saw the
<TITLE>, it ignored the (correct) <META> tags.
Most likely explanation is 2) I spent a couple of frustrating hours
cursing the Simspider before I twigged that one :)
Once you know the rules, everythings great....
I'm not sure why it does this, I was trying to figure out if it was a
requirement of one of the "strict" DTDs or something, but I couldn't
find anything. Maybe Brett could enlighten us there?"
Will: msg #:6 6:06 pm on Nov 2, 2001
"A valid document using the XHTML Transitional DTD needs to have any
keyword or description tags appear **before** the title tag, i.e.
anything with META in it comes before title, linked/inline stylesheets
etc."
http://www.webmasterworld.com/forum11/957.htm
I also found this thread:
VictorE: msg#:7 3:45 am on Jan 22, 2003
"I believe this topic has been covered before. I think it is simply a
small "feature" of the Simspider. Try changing the order of your Meta
description and your Meta keywords. One of the permutations should
work. I believe for them to show on Simspider your keywords must be
after your description. Either way should be valid, but only one of
the ways will allow the keywords to be seen by Simspider.
Vic"
pageoneresults: msg #:8 3:50 am on Jan 22, 2003
"VictorE, excellent catch. The page I've been checking is in order
like this...
title
description
keywords
As soon as I dropped the keywords tag above the description tag, the
keywords tag was not seen by the spider."
http://www.webmasterworld.com/forum8/525.htm
You have an extraneous space in front of your Description Content. You
might try removing that, and possibly switch your Keywords and
Description metatags around, to see if the SimSpider does a better job
with your page.
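The quirks reported in those two threads can be turned into a rough pre-flight check. This is only a sketch: the rules come from the forum posts quoted above, not from any SimSpider documentation, and the function name simspider_quirk_warnings is hypothetical.

```python
import re

META_TAG = re.compile(r'<meta\b[^>]*>', re.IGNORECASE)

def simspider_quirk_warnings(html):
    """Flag meta-tag patterns that forum users reported the SimSpider rejecting."""
    warnings = []
    names = []
    for tag in META_TAG.findall(html):
        name_match = re.search(r'name\s*=\s*"([^"]*)"', tag, re.IGNORECASE)
        if not name_match:
            continue
        name = name_match.group(1).lower()
        names.append(name)
        # Blank out quoted values so only the markup itself is inspected.
        skeleton = re.sub(r'"[^"]*"', '""', tag)
        if re.search(r'\s=|=\s', skeleton):
            warnings.append(name + ": whitespace around '='")
        content_match = re.search(r'content\s*=\s*"([^"]*)"', tag, re.IGNORECASE)
        if content_match and content_match.group(1) != content_match.group(1).strip():
            warnings.append(name + ": leading or trailing space in content")
    # Reported quirk: keywords is only seen when it follows description.
    if "description" in names and "keywords" in names:
        if names.index("keywords") < names.index("description"):
            warnings.append("keywords appears before description")
    return warnings

sample = ('<meta name="keywords" content="a, b" />\n'
          '<meta name="description" content=" The Congress" />')
for warning in simspider_quirk_warnings(sample):
    print(warning)
```

An empty result only means none of these reported quirks were found; it does not guarantee that the SimSpider will read the tags.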
TallTroll: msg #:10 3:43 pm on Jan 16, 2002
"True, I'm just curious as to why the Simspider isn't reading a
perfectly good (at least it LOOKS perfectly good) relative link on
this other site correctly. It insists it is being pointed to
"http://navigation.html". I just find it odd that this site ALSO has a
PR of 0, AND the effect of the fault would be to make virtually the
entire site unreachable, if an SE spider behaves the same way as the
Simspider.
<a little later>
Ding, got it!
spidering www.domain.co.uk gets the link wrong
spidering www.domain.co.uk/ gets the link right
I bet Simspider constructs relative paths from the entered URL (not
unreasonably), by just tagging the filename called on the end of the
base URL without checking for a terminal /. Also, entering
.../index.html gets it right, so I bet its smart enough to sub the
filename correctly, cos it'll have a / there."
http://www.webmasterworld.com/forum27/233.htm
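TallTroll's guess fits how standard relative-URL resolution works: without a trailing slash, the last path segment is treated as a file name and dropped before the relative link is attached. A minimal sketch using Python's urllib.parse.urljoin shows the effect on your two links. The tool is my choice for illustration (the helper name resolve_links is made up); it says nothing about how the SimSpider is actually implemented.

```python
from urllib.parse import urljoin

def resolve_links(base_url, hrefs):
    """Resolve each relative href against base_url using standard URL rules."""
    return [urljoin(base_url, href) for href in hrefs]

hrefs = ["home.html", "textonly.html"]

# Base entered without the trailing slash: "cec_2003" is treated as a file.
without_slash = resolve_links("http://www.cs.adfa.edu.au/cec_2003", hrefs)

# Base entered with the trailing slash: "cec_2003/" is treated as a directory.
with_slash = resolve_links("http://www.cs.adfa.edu.au/cec_2003/", hrefs)

print(without_slash)
print(with_slash)
```

The first list reproduces the two dead links the Simulator reported, and the second matches the links a browser shows. A browser usually gets the second result because the web server redirects the slashless directory URL to the one ending in "/" before the page is served; that is typical server behaviour, which I am assuming rather than having verified for this site.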
So, it sounds like the SimSpider may not be reading your HTML/XHTML
code quite properly, as well as having problems with redirects.
Is there a reason you're redirecting your site the way that you are?
Wouldn't you want to place your real pages at the real address, and
put the redirect at the old address, in case someone still tries to
access the pages at their former URLs? I think that that would help to
ensure that Google doesn't have any problem indexing your new pages
properly.
At any rate, it is entirely possible that your website results could
start showing up properly within the next month. It sometimes takes
2-3 months for the Googlebot to get everything indexed properly, and
with all the switching around of your URLs, it may take quite a while
for the Googlebot to get it straightened out.
If you get your website listed in the DMOZ ODP as I suggested, that
should really help you in terms of Search Results.
Best Wishes!
ace