Google Answers
Q: Search result displayed URL's oddity ( Answered 1 out of 5 stars,   0 Comments )
Question  
Subject: Search result displayed URL's oddity
Category: Computers > Internet
Asked by: chametro-ga
List Price: $50.00
Posted: 15 Jul 2004 09:55 PDT
Expires: 14 Aug 2004 09:55 PDT
Question ID: 374519
I'm working with a firm where one of the project objectives is to remove 
session ID's from their web page URL's. Their public page URL's now have 
certain name-value pairs appended to them as a query, for example:

   PREURL=e_home&CFID=957450&CFTOKEN=74716046

They are altering the architecture and are removing CFID and CFTOKEN from the 
URL's of public web pages, however the PREURL name-value pair will remain. 
They've asked if this will provide a major boost in their page rankings,
or if the PREURL name-value pair will blunt all improvement (PREURL
::= name previous web page visited). They've been including the PREURL
variable for about one year and it is tied into their processes. They
(and I) are "SEO aware", and do understand the basics of Google page
ranking.
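
As a rough illustration of the planned change (session data removed, PREURL kept), here is a minimal Python sketch using only the standard library. The function name `strip_session_params` is made up for this example, and the site's real ColdFusion code is not shown anywhere in this thread, so treat this as a sketch of the idea rather than the firm's implementation.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameter names taken from the question; treated here as per-visitor session data.
SESSION_PARAMS = {"CFID", "CFTOKEN"}


def strip_session_params(url: str) -> str:
    """Return url without CFID/CFTOKEN; remaining parameters keep their order."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.upper() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


url = "http://www.execunet.com/e_network.cfm?PREURL=e_home&CFID=957450&CFTOKEN=74716046"
print(strip_session_params(url))
```

Note that the cleaned address still carries PREURL, so the same page can still appear under more than one URL depending on the page the visitor came from.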

I've tentatively told them that removing the session data should help
considerably, but I'm unsure what the effect of leaving the PREURL
value will be.

Researching their Google-indexed URL's I came across the extraordinary fact 
that ONLY one of their URL's containing the PREURL variable is in the
index (about 3010 "pages" that contain CFID & CFTOKEN are indexed,
however).

--- research summary start -------------------------------------------------
Google Search Results:

google search field entry >> inurl:PREURL site:www.execunet.com
returns 1 result
the results page address >>
www.execunet.com/e_network.cfm?PREURL=e_home&CFID=957450&CFTOKEN=74716046

google search field entry >> inurl:CFID site:www.execunet.com
returns >> ~3010 results
example results page address >>
www.execunet.com/m_home.cfm?CFID=1321599&
CFTOKEN=7ca2194bd1876c8a-6AEC24C3-A37C-9F38-44D4E7BEC594D88B
--- research summary end  --------------------------------------------------

I've looked at the cached pages in the Google index. The a-href links in the 
pages' html do include the PREURL variable - however the displayed Google-
indexed pages' URL's do not (with the exception of the one page referenced 
above).

The firm prefers it not be present in the displayed search results,
but I'm guessing that the "disappearing PREURL" effect is unintended &
serendipitous.

My question:  Why is the Google index dropping/excluding the PREURL name-value 
pair from the indexed URL's that are being displayed in search
results, and will the manner in which the above referenced URL's are
now indexed/displayed in Google results continue for the foreseeable
future?

Request for Question Clarification by webadept-ga on 15 Jul 2004 17:29 PDT
Hi, I just want to clarify something with you here. You are "cleaning"
Dynamic links and you believe that one type of dynamic link is better
than another? Is that what we are talking here?

Check this out.. go out there and search for 
inurl:chm site:www.execunet.com 

That's just going to drive you nuts. Better than giving someone a
Chinese finger puzzle.

webadept-ga

Clarification of Question by chametro-ga on 15 Jul 2004 18:24 PDT
Re: the first request for clarification...

    ...I just want to clarify something with you here. You are "cleaning"
    Dynamic links and you believe that one type of dynamic link is better
    than another? Is that what we are talking here?

In the broadest sense, yes. I cannot go into a lot of detail, but "nirvana" 
would be to remove the query portion of the URL's entirely (per the
recommendations of several paid SEO consultants). That cannot be done
at this time.

However, everything but the PREURL name-value pair can be removed from the 
public pages. This will dramatically cut down the number of "pages" indexed 
and will increase the "popularity" measure of the second, third, fourth... 
levels of the public pages.

The firm was all set to go when the weird results that I provided in the question's 
preamble appeared in a test search. So I've posted the question in hopes that 
someone knows the inner workings of the Google spiders/indexing methods and 
can explain why the PREURL name-value pair is missing from all but one of 
the 3000 plus indexed "page" URL's. 

    Check this out.. go out there and search for 
    inurl:chm site:www.execunet.com 

I tried the above at Google (& similar search at AllTheWeb) and got zero 
results at both engines. Typo?
Answer  
Subject: Re: Search result displayed URL's oddity
Answered By: webadept-ga on 18 Jul 2004 13:10 PDT
Rated:1 out of 5 stars
 
Hi, 

It took me several read-throughs to get this, but I finally did. So,
I'm going to answer this for you as clearly as I can, taking much of
the tech fluff out of it, and I'll explain why in a bit. Basically,
you are right, but the way you said you were right had me thinking you
were dead wrong. So, clear your mind of what you think you know for a
moment, and read this through. If you have other questions regarding
this, please feel free to use the Clarification button and I'll see
that you get your answers. But let's start here first.

First off, it is popular now in the SEO arena to suggest that dynamic
links of all kinds are not indexed by Google, when in fact, Google's
own SEO page, says they do this just fine, and if you look at the
address of the pages Google uses, they are just as dynamic as everyone
else's.

Secondly, if someone were to suggest to me that PREURL is not indexing
as well as CFID (or vice versa), my first thought would be that the
PREURL pages are not as static as the CFID pages. Meaning that when
the Google bot returns, it can't find the same pages it did the last
time it was here, and really, that's all it cares about. If it found
a page that it thought noteworthy enough to index, it wants to find
that page on its next trip out to your site. If it doesn't find these
pages, then they drop out of the index, as not being stable enough to
keep in the data files.

Third, the utility you are using to test this is just that, a utility,
a game, a thing to get an idea with, and shouldn't be confused with a
diagnostic tool. It's not, and was never intended to be, a
diagnostic tool for your SEO program. There are times (quite a few,
really) that the site:, inurl: and link: operators do reflect what the
main engine says; there are quite a few other times they do not.

Much of this has to do with the massive updating required to keep
those tools, and the main engine in sync. Much of it also has to do
with the Google Dance, and the monthly updates to the main engines,
which span over weeks in time as the main engines across the globe
sync with each other.

A great deal more has to do with the well known fact that the only
ones really using the site: search are people searching their own
sites. So, it's not as important to keep accurate as the main engine
is.

I went over their website. They have a very good ratio of indexed
pages, and no doubt this has come from many long hours of detailed
attention. They also have a great deal of content there, which is the
biggest variable to keep working. There are several factors involved
with the Google engine (and not just them; all of the engines have
their quirks these days). But really, consider what you are
suggesting here: it would take 'extra' code in the Google bot to
have the preference you are suggesting.

What is happening there with a PREURL code is not the code itself, but
the information it is relating. They are using this as a marker for
the last page I was on, while going through the site, instead of using
sessions or a cookie or something of that nature. If they remove this
code, but continue to put this information (last page URL) in the
string, they will have the same results, because these pages won't be
there on the next time the bot comes through. If they remove the other
codes and keep this one, the net result will be 0 as well. No,
that's not really true. They will have still changed every URL on
every page on their site, so they will probably lose a great deal of
indexed pages for a while. If they are moving to a schema that results
in more stable URLs then it is worth it; if they are not, then they
should probably reconsider their logic.

If they change these to static types using MOD_Rewrite, they will
still have the same problem.

The main thing, the number one thing, which is important to any search
engine, not just Google, is the longevity of the page. The question is
"will it exist tomorrow?" and in pages we see here, the answer is
almost always, "no". So they will not be indexed, not for long anyway.

Consistency and content are the two main factors in Google indexing.
There's not much mystery involved. You build a good site, with good
content and keep it up and consistent, then you rank high. If you
change URLs constantly, or have 100's of URLs pointing to the same
content (which happens by default with their type of setup, because the
bot comes in by different routes, and finds later that it has several
different URLs pointing to the exact same content), then you don't.
It doesn't take highly paid SEO's to tell you that.
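
To make the "many URLs, one page" point concrete, here is a small Python sketch that groups crawled addresses by the content they would serve. It assumes that PREURL, CFID and CFTOKEN never change what a page displays, which is an assumption made for illustration, and the second URL in the sample list uses made-up values.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed (for this sketch) not to affect the content that is served.
NON_CONTENT_PARAMS = {"PREURL", "CFID", "CFTOKEN"}


def canonical(url: str) -> str:
    """Drop the non-content parameters and sort the rest into a stable order."""
    parts = urlsplit(url)
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query)
        if name.upper() not in NON_CONTENT_PARAMS
    )
    return urlunsplit(parts._replace(query=urlencode(kept)))


def group_by_content(urls):
    """Map each canonical address to the list of crawled variants behind it."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical(url)].append(url)
    return dict(groups)


crawled = [
    "http://www.execunet.com/e_network.cfm?PREURL=e_home&CFID=957450&CFTOKEN=74716046",
    # Hypothetical second visit: different previous page, different session values.
    "http://www.execunet.com/e_network.cfm?PREURL=m_home&CFID=1321599&CFTOKEN=11111111",
    "http://www.execunet.com/e_network.cfm",
]
groups = group_by_content(crawled)
for page, variants in groups.items():
    print(page, "<-", len(variants), "crawled variants")
```

Each extra variant is one more address a crawler has to fetch and compare before it can tell that nothing new is there.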

Think of it this way, you have two friends who give you great advice.
One, has a single number, and every time you call it, he's there to
answer the question. The other, well.. he's not always at the same
number, and sometimes, he's not anywhere. Over the course of a couple
of months, who are you most likely to call on a regular basis?

Now, as for your question, the answer is no, and I've explained it
rather well I think, but not for the reasons you have started out
with. Any change to the URL's will affect them in the short run.
Expect them to drop quite a bit with a site wide change like that.
Keeping the PREURL tag in there, is .. well, ludicrous really, (why
change at all?) Having dynamically created links is not a problem.
What is a problem is creating links that will not be created again, or
.. inconsistently created in the future. There are better, more
reliable, and much more accurate methods of tracking user progress
through a site.
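
One such method is a cookie that carries the "previous page" value, so that it no longer has to ride along in every link. The sketch below uses Python's standard `http.cookies` module purely to show the shape of the idea; the cookie name `prev_page` and both helper functions are hypothetical, and a real site would do the equivalent in its own server-side or JavaScript code.

```python
from http.cookies import SimpleCookie
from typing import Optional

COOKIE_NAME = "prev_page"  # hypothetical name, chosen for this example


def previous_page(cookie_header: str) -> Optional[str]:
    """Read the previously visited page from an incoming Cookie header."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    morsel = jar.get(COOKIE_NAME)
    return morsel.value if morsel is not None else None


def remember_page(page: str) -> str:
    """Build a Set-Cookie header value recording the page being served now."""
    jar = SimpleCookie()
    jar[COOKIE_NAME] = page
    jar[COOKIE_NAME]["path"] = "/"
    return jar[COOKIE_NAME].OutputString()


print(remember_page("e_home"))
print(previous_page("prev_page=e_home; other=1"))
```

With the value in a cookie, every link on the site can point at one fixed address per page, and a crawler that ignores cookies simply sees no previous page.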

So, yes, you are right, and you are being very conscientious in pointing this out to them. 

I've explained this in very simple terms in this answer because,
although you were on the right track, it took me several read-throughs
to realize what it was you were getting at, and where you were coming
from. So, this simple way of explaining it is also to give you a
method of relating your ideas in a context that your clients will
understand as well.

You and they need to realize that in the area you are addressing,
'deep inner workings of the Google engine' are not necessary to see
and understand basic facts of life. Google, and every other engine,
has a limited amount of space and has to keep the engine as clean as
possible to get results from that base as fast as possible. That is
reality. There is no getting around that.

Second, they (search engines) want as many different results for a
given query, which relate to that query, as possible. Finding that a
search (any search) shows 100's of links to the exact same content is
frustrating to the user and embarrassing for the SE. So, when they
discover sites which create this phenomenon, they remove them, or
filter them down heavily. Again, no secret here, just basic business.

Third, a simple javascript cookie placed into the body of the page,
would solve this. In fact they are missing the simple basics. Like a
site map.

http://www.execunet.com/sitemap.html

Google will use that page to re-index your site. It doesn't care that
those links in that file are dynamic. All it cares about is that the
page is there when it comes back next week. It also cares about the
content and that there is something meaningful there, but that's
another topic, and one we aren't ready to address at this point.

They don't have a robots.txt file to help the bot know where to go
and what to do when it is there. It might be in your HEADER tag, but
that's not where it's going to look for it for 'site constancy'.
http://www.execunet.com/robots.txt
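
For anyone unsure what such a file would do, here is a short Python sketch that feeds a made-up robots.txt to the standard library's `urllib.robotparser` and asks which paths a crawler may fetch. The Disallow paths are placeholders invented for this example, not the site's real layout.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt; the paths are placeholders for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login.cfm
Disallow: /members/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ("/e_home.cfm", "/members/profile.cfm", "/login.cfm"):
    allowed = parser.can_fetch("Googlebot", "http://www.execunet.com" + path)
    print(path, "->", "crawl" if allowed else "skip")
```

The file only tells well-behaved bots what to stay out of; it does not by itself make the remaining URLs any more stable.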

All of this is on the Google main Page, it's not hidden information,
and doesn't require a highly paid SEO to gather it up for them. (Or
maybe it does. It seems that more and more businesses would rather pay
than play these days).

http://www.google.com/webmasters/

http://www.google.com/webmasters/4.html

http://www.google.com/webmasters/3.html

Quote--"Fiction: Sites are not included in Google's index if they use
ASP (or some other non-html file-type.)
Fact: At Google, we are able to index most types of pages and files
with very few exceptions. File types we are able to index include:
pdf, asp, jsp, hdml, shtml, xml, cfm, doc, xls, ppt, rtf, wks, lwp,
wri, swf."
-- from 
http://www.google.com/webmasters/facts.html

http://www.google.com/webmasters/faq.html

Google is very good at being straight forward with you, and has put up
a great deal of content on what they look for and how they act when
they find it.

A final note on this and I'll end this answer. The note is in the
results of the latest SEO Google Ranking Contest .. here's the link.

Single Post Wins Google Contest
http://www.wired.com/news/infostructure/0,1377,64130,00.html?tw=wn_2culthead

I wish you luck, 

thanks

webadept-ga

Clarification of Answer by webadept-ga on 21 Jul 2004 04:38 PDT
Hi, 

Normally it is wise to use the Clarification button before rating an
answer. But that's okay.

I'm a bit confused with your listed response however, since he is
saying exactly the same thing I did. Bots don't arrive at pages the
same way, thus the dynamic link this website has building on the last
page visited will be different on each visit.

--"What is happening there with a PREURL code is not the code itself,
but the information it is relating. They are using this as a marker
for the last page I was on, while going through the site, instead of
using sessions or a cookie or something of that nature. .... The main
thing, the number one thing, which is important to any search engine,
not just Google, is the longevity of the page. The question is "will
it exist tomorrow?" and in pages we see here, the answer is almost
always, "no". So they will not be indexed, not for long anyway." --

and his 

--"The EuN architecture seems to be
in part based on the assumption that visitors are arriving at and
beginning their visits at the default home page. Note that if they do
arrive at the default home page the PREURL variable seems to be
included in each html link on the page (as are the CFID and CFTOKEN
variables) except the login link.

But visitors and bots also arrive at pages other than the home page; a
link to EuN from some other site may look simply like another heavily
trafficked page URL (not the home page), like
http://www.execunet.com/e_home.cfm, or like
http://www.execunet.com/r_home.cfm, for example. The query portion of
the URL is not likely to be present unless the link is from a tracked
source (e.g. ?welcome=xxxxxxxxx), but in any event, the PREURL, CFID
and CFTOKEN variables will not be in the requested URLs. The html
links on those pages, called in this manner, sometimes include the
PREURL variable and sometimes do not. " -- 

As for Google's bias, their bias is stated very clearly on the Facts
and Fiction page

--"Fiction:  	Sites are not included in Google's index if they use ASP
(or some other non-html file-type.)

Fact: At Google, we are able to index most types of pages and files
with very few exceptions. File types we are able to index include:
pdf, asp, jsp, hdml, shtml, xml, cfm, doc, xls, ppt, rtf, wks, lwp,
wri, swf.
---" 
http://www.google.com/webmasters/facts.html

You did notice the SWF there at the end.. yes? 

The problem is not the PREURL in and of itself, it is the displayed
information the PREURL is gathering for the GET string. Both this
other service and myself have said this in different ways. You can
name PREURL  "cash" or "string" or anything you want to; it's not
going to matter. The bots see the GET string as the "name of the page",
the whole string. This other service and I have both agreed on this as
well.

his quote --"The query portion of
the URL is not likely to be present unless the link is from a tracked
source (e.g. ?welcome=xxxxxxxxx), but in any event, the PREURL, CFID
and CFTOKEN variables will not be in the requested URLs. The html
links on those pages, called in this manner, sometimes include the
PREURL variable and sometimes do not."--- 

I don't know where he gets the one parameter, two parameter, dance of
logic he has there. It doesn't hold up to observation or, for that
matter, what Google says on their pages and/or publications. I didn't
see a reference to his source, so it's just opinion as far as I can
tell. Google didn't say it.

He might be thinking of this quote --"If you decide to use dynamic
pages (i.e., the URL contains a '?' character), be aware that not
every search engine spider crawls dynamic pages as well as static
pages. It helps to keep the parameters short and the number of them
small." --

from this page
http://www.google.com/webmasters/guidelines.html

But Google isn't referring to themselves there; they are letting you
know that "other" bots don't crawl them well. So.. (??)

Be all that as it may, he and I agree completely on the PREURL
problem, and both of us stated that the page reference in the GET
string needs to be taken care of.

So I don't understand your statements in the comment area. I'm fine
with the rating, because I'm assuming that this is your first time
using the service and you didn't know that if you used the
Clarification Button, I would search out and find more information for
you, and would have addressed this other 'advice column' as well. Our
goal is to research your answer. Sometimes we get it on the first go,
other times we don't. But once we start, we do our best to ensure you
have the answer you are after. Next time you use the service, please
keep this in mind. The researchers are very dedicated to their level of
service.

With that said, rating or no/payment or no, if you would like a
greater understanding on this issue or would like to post something
else, that someone else said, for me to research out for you, please
do. You are obviously not quite certain about this issue, so I'm happy
to help you out with it.

webadept-ga

Clarification of Answer by webadept-ga on 21 Jul 2004 05:35 PDT
Also, I'm curious as to what part of your question this other service
and myself have not answered. I, personally, was going on this:

"My question:  Why is the Google index dropping/excluding the PREURL name-value 
pair from the indexed URL's that are being displayed in search
results, and will the manner in which the above referenced URL's are
now indexed/displayed in Google results continue for the foreseeable
future?"

as your stated question. Since it is a two part question, let's go with
simple straight answers here, which both this other service and I have
indeed answered very clearly.

a) Why is the Google index dropping/excluding the PREURL name-value 
pair from the indexed URL's that are being displayed in search
results 

answer: The perceived phenomenon occurs because the URLs for these
pages, as created by the site's internal programming, are volatile and
not reproducible on a regular basis, creating page names that are not
being saved by the search engines. This is not a Google phenomenon but
a search engine phenomenon, as these results clearly show:

5 shown on page of 100 Yahoo
http://search.yahoo.com/search?_adv_prop=web&x=op&ei=UTF-8&prev_vm=p&va=execunet&va_vt=any&vp=&vp_vt=any&vo=&vo_vt=any&ve=&ve_vt=any&vd=all&vst=on&vs=execunet.com&vf=all&vm=p&vc=&fl=0&n=100

No values shown for MSN
http://search.msn.com/advresults.aspx?q=execunet&FORM=SMCA&adv_f=any&adv_sort=depth+asc&adv_rgn=&adv_lng=&adv_dom=execunet.com&adv_depth=&adv_dt=html&adv_dt=ppt&adv_dt=msword&adv_dt=xl&adv_cf=

All the web (presence too low to get good reading)
http://www.alltheweb.com/search?advanced=1&cat=web&jsact=&_stype=norm&type=all&q=execunet.&_b_query=&l=en&ics=utf-8&cs=utf8&wf%5Bn%5D=3&wf%5B0%5D%5Br%5D=%2B&wf%5B0%5D%5Bq%5D=&wf%5B0%5D%5Bw%5D=&wf%5B1%5D%5Br%5D=%2B&wf%5B1%5D%5Bq%5D=&wf%5B1%5D%5Bw%5D=&wf%5B2%5D%5Br%5D=-&wf%5B2%5D%5Bq%5D=&wf%5B2%5D%5Bw%5D=&dincl=execunet.com&dexcl=&geo=&doctype=&dfr%5Bd%5D=1&dfr%5Bm%5D=1&dfr%5By%5D=1980&dto%5Bd%5D=21&dto%5Bm%5D=7&dto%5By%5D=2004&hits=100

Lycos: 6 values found out of a total presence of 90
http://search.lycos.com/default.asp?query=execunet&first=81&pmore=more&gi=0&npl=dfi%3Dexecunet%252Ecom%26adf%3Doff%26adv%3D1

and Google, with the same search as above: 1 found out of 100
http://www.google.com/search?q=execunet+site:execunet.com&num=100&hl=en&lr=&ie=UTF-8&as_qdr=all&filter=0

This brief look doesn't show a great deal of discrepancy by any single
search engine with the token in question. The only real discrepancy
found was that Google seems to index roughly ten times as much of the
site as the other search engines looked at, showing 1000 pages as
listings for the query above; the next highest was Yahoo with 100.

No preference in Google's indexed pages could be seen or noted in the
URL's cached as far as size (many of them extending out to 200 and 300
characters) or number of tokens in the string (a large percentage
having 4 - 6 tokens creating the URL string).

As far as I can see with the base research conducted, the results
shown by the search engines are predictable and in line with each
search engine's stated/published abilities and documentation, which
can be found on each search engine's main website.



b) and will the manner in which the above referenced URL's are
now indexed/displayed in Google results continue for the foreseeable
future?"

answer : yes.

Request for Answer Clarification by chametro-ga on 22 Jul 2004 09:26 PDT
Thanks ... it is time to move on.  No need to reply. 

FYI these are the Google references he provided. They do seem to
directly support his analysis.

    Design and Content Guidelines:
    ·  If you decide to use dynamic pages (i.e., the URL contains a '?' 
       character), be aware that not every search engine spider crawls 
       dynamic pages as well as static pages. It helps to keep the parameters 
       short and the number of them small.

    1. Reasons your site may not be included.
    ·  Your pages are dynamically generated. We are able to index dynamically 
       generated pages. However, because our web crawler can easily overwhelm 
       and crash sites serving dynamic content, we limit the amount of dynamic 
       pages we index.

If the number of dynamic pages indexed are limited (Google says they
are) and there is a bias toward fewer and shorter parameters (Google
!strongly implies! there is such a bias), AND the googlebots can find
links at the EuN site that have fewer and shorter parameters it makes
sense that the Google index contains more of those links. It is a
little surprising that the ratio of indexed URLs containing the PREURL
parameter is roughly 1:3000, but the AllTheWeb ratio is 235:10200 so
the "general" explanation seems credible.
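
For a sense of scale, the two ratios can be turned into percentages with a few lines of Python. The counts are the ones quoted in this thread (1 of about 3010 for Google, 235 of 10200 for AllTheWeb); the helper name `share` is just for this example.

```python
def share(matching: int, total: int) -> float:
    """Percentage of indexed URLs that contain the PREURL parameter."""
    return 100.0 * matching / total


google = share(1, 3010)         # counts from the inurl:/site: searches in the question
alltheweb = share(235, 10200)   # counts quoted for AllTheWeb

print(f"Google:    {google:.3f}% of indexed URLs contain PREURL")
print(f"AllTheWeb: {alltheweb:.3f}% of indexed URLs contain PREURL")
```

Both shares are small: well under a tenth of one percent for Google, and a little over two percent for AllTheWeb.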

Cheers
chametro-ga rated this answer:1 out of 5 stars
Though portions of the response are well written, and address an
element of the question, the question was not answered. Granted, the
question is difficult.

Having read other responses provided in this Google Q&A service, I
tried to head off general commentary by saying the client and I are
SEO aware. I stated they had received SEO advice and that they were
implementing an architectural change. To assume you are communicating
with SEO novices is a bit presumptuous.

Here's a portion of an answer I received thru another venue. It
doesn't answer the question with certainty, but it does offer a very
probable explanation based on solid and irrefutable analysis. It was
provided with a lot of abbreviations and shorthand wording, so I
rewrote it for here. Any errors are mine.
===========

The EuN FQDN is:  www.execunet.com/  The EuN architecture seems to be
in part based on the assumption that visitors are arriving at and
beginning their visits at the default home page. Note that if they do
arrive at the default home page the PREURL variable seems to be
included in each html link on the page (as are the CFID and CFTOKEN
variables) except the login link.

But visitors and bots also arrive at pages other than the home page; a
link to EuN from some other site may look simply like another heavily
trafficked page URL (not the home page), like
http://www.execunet.com/e_home.cfm, or like
http://www.execunet.com/r_home.cfm, for example. The query portion of
the URL is not likely to be present unless the link is from a tracked
source (e.g. ?welcome=xxxxxxxxx), but in any event, the PREURL, CFID
and CFTOKEN variables will not be in the requested URLs. The html
links on those pages, called in this manner, sometimes include the
PREURL variable and sometimes do not.

Now that we know that PREURL is sometimes not in the called url let's
take an educated guess at the effect on Google's spidering and
indexing of the EuN site. Google's stated bias against dynamic urls
means that they (say they) index more pages with one parameter than
those urls with two parameters; more pages with two than three, etc.
Where PREURL is found it is frequently the third or fourth parameter;
when it isn't included, the url often has only two parameters. If you
take Google at their word, this is part of the reason you are seeing
one (or a few) PREURL urls in the index, even though thousands of
instances are in the index.

My advice, the sooner your client fixes this, the better!
=========
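
The parameter-count reasoning in the quoted answer is easy to check against a list of indexed URLs. Here is a minimal Python sketch that tallies how many query parameters each URL carries; the first two URLs are the examples from the question, the third is one of the plain addresses mentioned above, and the helper name `param_count` is invented for this example. It only counts parameters and says nothing about how any search engine actually weighs them.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def param_count(url: str) -> int:
    """Number of name-value pairs in the query portion of url."""
    return len(parse_qsl(urlsplit(url).query, keep_blank_values=True))


indexed = [
    "http://www.execunet.com/e_network.cfm?PREURL=e_home&CFID=957450&CFTOKEN=74716046",
    "http://www.execunet.com/m_home.cfm?CFID=1321599"
    "&CFTOKEN=7ca2194bd1876c8a-6AEC24C3-A37C-9F38-44D4E7BEC594D88B",
    "http://www.execunet.com/e_home.cfm",
]

histogram = Counter(param_count(url) for url in indexed)
for count in sorted(histogram):
    print(count, "parameter(s):", histogram[count], "URL(s)")
```

Run over the full set of indexed addresses, a tally like this would show whether the URLs carrying PREURL really are the ones with three or four parameters.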
