Google Answers: Search result displayed URL's oddity

Q: Search result displayed URL's oddity (Answered, 1 out of 5 stars, 0 Comments)

Question

Subject: Search result displayed URL's oddity
Category: Computers > Internet
Asked by: chametro-ga
List Price: $50.00

Posted: 15 Jul 2004 09:55 PDT
Expires: 14 Aug 2004 09:55 PDT
Question ID: 374519

I'm working with a firm where one of the project objectives is to remove 
session ID's from their web page URL's. Their public page URL's now have 
certain name-value pairs appended to them as a query, for example:

   PREURL=e_home&CFID=957450&CFTOKEN=74716046

They are altering the architecture and removing CFID and CFTOKEN from the
URL's of public web pages; however, the PREURL name-value pair will remain.
They've asked whether this will provide a major boost in their page rankings,
or whether the PREURL name-value pair will blunt all improvement (PREURL
::= name of the previous web page visited). They've been including the PREURL
variable for about one year and it is tied into their processes. They
(and I) are "SEO aware", and do understand the basics of Google page
ranking.

I've tentatively told them that removing the session data should help
considerably, but I'm unsure what the effect of leaving the PREURL
value will be.

Researching their Google-indexed URL's I came across the extraordinary fact 
that ONLY one of their URL's containing the PREURL variable is in the
index (about 3010 "pages" that contain CFID & CFTOKEN are indexed,
however).

--- research summary start -------------------------------------------------
Google Search Results:

google search field entry >> inurl:PREURL site:www.execunet.com
returns 1 result
the results page address >>
www.execunet.com/e_network.cfm?PREURL=e_home&CFID=957450&CFTOKEN=74716046

google search field entry >> inurl:CFID site:www.execunet.com
returns >> ~3010 results
example results page address >>
www.execunet.com/m_home.cfm?CFID=1321599&CFTOKEN=7ca2194bd1876c8a-6AEC24C3-A37C-9F38-44D4E7BEC594D88B
--- research summary end  --------------------------------------------------
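
To make the two URL shapes in the summary easier to compare, here is a minimal Python sketch that lists the query parameters of each example result URL. The `http://` scheme is an assumption added so the URLs parse cleanly, and `param_names` is a hypothetical helper; nothing here reflects how Google itself parses URLs.

```python
from urllib.parse import urlsplit, parse_qsl

# The two example result URLs from the research summary.
# The "http://" scheme is an assumption added for parsing.
RESULT_URLS = [
    "http://www.execunet.com/e_network.cfm?PREURL=e_home&CFID=957450&CFTOKEN=74716046",
    "http://www.execunet.com/m_home.cfm?CFID=1321599"
    "&CFTOKEN=7ca2194bd1876c8a-6AEC24C3-A37C-9F38-44D4E7BEC594D88B",
]


def param_names(url):
    """Return the query parameter names of a URL, in order of appearance."""
    return [name for name, _ in parse_qsl(urlsplit(url).query)]


for url in RESULT_URLS:
    names = param_names(url)
    # Print the page path, its parameter names, and whether PREURL is present.
    print(urlsplit(url).path, names, "PREURL" in names)
```

Only the first URL carries PREURL; the second carries just the two session variables.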

I've looked at the cached pages in the Google index. The a-href links in the 
pages' html do include the PREURL variable - however the displayed Google-
indexed pages' URL's do not (with the exception of the one page referenced 
above).

The firm prefers it not be present in the displayed search results,
but I'm guessing that the "disappearing PREURL" effect is unintended &
serendipitous.

My question:  Why is the Google index dropping/excluding the PREURL name-value 
pair from the indexed URL's that are being displayed in search
results, and will the manner in which the above referenced URL's are
now indexed/displayed in Google results continue for the foreseeable
future?

Request for Question Clarification by webadept-ga on 15 Jul 2004 17:29 PDT

Hi, I just want to clarify something with you here. You are "cleaning"
Dynamic links and you believe that one type of dynamic link is better
than another? Is that what we are talking here?

Check this out.. go out there and search for 
inurl:chm site:www.execunet.com 

That's just going to drive you nuts. Better than giving someone a
Chinese finger puzzle.

webadept-ga

Clarification of Question by chametro-ga on 15 Jul 2004 18:24 PDT

Re: the first request for clarification...

    ...I just want to clarify something with you here. You are "cleaning"
    Dynamic links and you believe that one type of dynamic link is better
    than another? Is that what we are talking here?

In the broadest sense, yes. I cannot go into a lot of detail, but "nirvana" 
would be to remove the query portion of the URL's entirely (per the
recommendations of several paid SEO consultants). That cannot be done
at this time.

However, everything but the PREURL name-value pair can be removed from the 
public pages. This will dramatically cut down the number of "pages" indexed 
and will increase the "popularity" measure of the second, third, fourth... 
levels of the public pages.

The firm was all set to go when the weird results that I provided in the question's 
preamble appeared in a test search. So I've posted the question in hopes that 
someone knows the inner workings of the Google spiders/indexing methods and 
can explain why the PREURL name-value pair is missing from all but one of 
the 3000 plus indexed "page" URL's. 
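
As a rough illustration of why removing the session variables should cut down the number of distinct "pages", the sketch below strips CFID and CFTOKEN from a URL and shows two visits collapsing to one address. The second visit's session values are made up, the `http://` scheme is assumed, and `strip_session_params` is a hypothetical helper, not part of the firm's code.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# The two session variables named in the question.
SESSION_PARAMS = {"CFID", "CFTOKEN"}


def strip_session_params(url):
    """Return the URL without session parameters; other parameters keep their order."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.upper() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Two visits to the same page that differ only in session values.
# The second pair of values is invented for illustration.
visits = [
    "http://www.execunet.com/e_network.cfm?PREURL=e_home&CFID=957450&CFTOKEN=74716046",
    "http://www.execunet.com/e_network.cfm?PREURL=e_home&CFID=111111&CFTOKEN=22222222",
]

unique_urls = {strip_session_params(u) for u in visits}
print(unique_urls)
```

Note that the PREURL pair survives the stripping, so two visits that reach the page from different previous pages would still produce two different URLs.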

    Check this out.. go out there and search for 
    inurl:chm site:www.execunet.com 

I tried the above at Google (& similar search at AllTheWeb) and got zero 
results at both engines. Typo?

Answer

Subject: Re: Search result displayed URL's oddity
Answered By: webadept-ga on 18 Jul 2004 13:10 PDT
Rated: 1 out of 5 stars

Hi,

It took me several read-throughs to get this, but I finally did. So I'm going to answer this for you as clearly as I can, taking much of the tech fluff out of it, and I'll explain why in a bit. Basically, you are right, but the way you said you were right had me thinking you were dead wrong. So clear your mind of what you think you know for a moment and read this through. If you have other questions regarding this, please feel free to use the Clarification button and I'll see that you get your answers. But let's start here first.

First off, it is popular now in the SEO arena to suggest that dynamic links of all kinds are not indexed by Google, when in fact Google's own SEO page says they do this just fine, and if you look at the addresses of the pages Google uses, they are just as dynamic as everyone else's.

Secondly, if someone were to suggest to me that PREURL is not indexing as well as CFID (or vice versa), my first thought would be that the PREURL pages are not as static as the CFID pages. Meaning that when the Google bot returns, it can't find the same pages it did the last time it was here, and really, that's all it cares about. If it found a page that it thought noteworthy enough to index, it wants to find that page on its next trip out to your site. If it doesn't find these pages, then they drop out of the index as not being stable enough to keep in the data files.

Third, the utility you are using to test this is just that: a utility, a game, a thing to get an idea with. It shouldn't be confused with a diagnostic tool, because it's not, and it was never intended to be a diagnostic tool for your SEO program. There are times (quite a few, really) when site:, inurl: and link: do reflect what the main engine says; there are quite a few other times when they do not. Much of this has to do with the massive updating required to keep those tools and the main engine in sync. Much of it also has to do with the Google Dance and the monthly updates to the main engines, which span weeks as the main engines across the globe sync with each other. A great deal more has to do with the well-known fact that the only ones really using the site: search are people searching their own sites. So it's not as important to keep accurate as the main engine is.

I went over their website. They have a very good ratio of indexed pages, and no doubt this has come from many long hours of detailed attention. They also have a great deal of content there, which is the biggest variable to keep working.

There are several factors involved with the Google engine (and not just them, all of the engines have their quirks these days). But really, consider what you are suggesting here. It would take 'extra' code in the Google bot to have the preference you are suggesting.

What is happening there with a PREURL code is not the code itself, but the information it is relating. They are using this as a marker for the last page I was on, while going through the site, instead of using sessions or a cookie or something of that nature. If they remove this code but continue to put this information (last page URL) in the string, they will have the same results, because these pages won't be there the next time the bot comes through. If they remove the other codes and keep this one, the net result will be 0 as well. No, that's not really true. They will still have changed every URL on every page on their site, so they will probably lose a great many indexed pages for a while. If they are moving to a schema that results in more stable URL's then it is worth it; if they are not, then they should probably reconsider their logic. If they change these to static types using mod_rewrite, they will still have the same problem.

The main thing, the number one thing, which is important to any search engine, not just Google, is the longevity of the page. The question is "will it exist tomorrow?" and in pages we see here, the answer is almost always, "no". So they will not be indexed, not for long anyway.

Consistency and content are the two main factors in Google indexing. There's not much mystery involved. You build a good site with good content and keep it up and consistent, then you rank high. If you change URL's constantly, or have 100's of URL's pointing to the same content (which happens by default with their type of setup, because the bot comes in by different routes and later finds that it has several different URL's pointing to the exact same content), then you don't. It doesn't take highly paid SEO's to tell you that.

Think of it this way: you have two friends who give you great advice. One has a single number, and every time you call it, he's there to answer the question. The other, well, he's not always at the same number, and sometimes he's not anywhere. Over the course of a couple of months, who are you most likely to call on a regular basis?

Now, as for your question, the answer is no, and I've explained it rather well I think, but not for the reasons you have started out with. Any change to the URL's will affect them in the short run. Expect them to drop quite a bit with a site-wide change like that. Keeping the PREURL tag in there is, well, ludicrous really (why change at all?). Having dynamically created links is not a problem. What is a problem is creating links that will not be created again, or that will be inconsistently created in the future. There are better, more reliable, and much more accurate methods of tracking user progress through a site. So, yes, you are right and being very conscientious in pointing this out to them.

I've explained this in very simple terms in this answer because, although you were on the right track, it took me several read-throughs to realize what it was you were getting at and where you were coming from. So this simple way of explaining it is also meant to give you a method of relating your ideas in a context that your clients will understand as well.

You and they need to realize that in the area you are addressing, 'deep inner workings of the Google engine' are not necessary to see and understand basic facts of life. Google, and every other engine, has a limited amount of space and has to keep the engine as clean as possible to get results from that base as fast as possible. That is reality. There is no getting around that.

Second, they (search engines) want as many different results for a given query, which relate to that query, as possible. Finding that a search (any search) shows 100's of links to the exact same content is frustrating to the user and embarrassing for the SE. So when they discover sites which create this phenomenon, they remove them, or filter them down heavily. Again, no secret here, just basic business.

Third, a simple javascript cookie placed into the body of the page would solve this.

In fact they are missing the simple basics. Like a site map.

http://www.execunet.com/sitemap.html

Google will use that page to re-index your site. It doesn't care that those links in that file are dynamic. All it cares about is that the page is there when it comes back next week. It also cares about the content and that there is something meaningful there, but that's another topic, and one we aren't ready to address at this point.

They don't have a robots.txt file to help the bot know where to go and what to do when it is there. It might be in your HEADER tag, but that's not where it's going to look for it for 'site constancy'.

http://www.execunet.com/robots.txt

All of this is on the Google main page. It's not hidden information, and it doesn't require a highly paid SEO to gather it up for them. (Or maybe it does. It seems that more and more businesses would rather pay than play these days.)

://www.google.com/webmasters/
://www.google.com/webmasters/4.html
://www.google.com/webmasters/3.html

Quote --"Fiction: Sites are not included in Google's index if they use ASP (or some other non-html file-type.) Fact: At Google, we are able to index most types of pages and files with very few exceptions. File types we are able to index include: pdf, asp, jsp, hdml, shtml, xml, cfm, doc, xls, ppt, rtf, wks, lwp, wri, swf." -- from
://www.google.com/webmasters/facts.html
://www.google.com/webmasters/faq.html

Google is very good at being straightforward with you, and has put up a great deal of content on what they look for and how they act when they find it.

A final note on this and I'll end this answer. The note is in the results of the latest SEO Google Ranking Contest. Here's the link.

Single Post Wins Google Contest
http://www.wired.com/news/infostructure/0,1377,64130,00.html?tw=wn_2culthead

I wish you luck,

thanks

webadept-ga

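
The point about a bot arriving by different routes and finding several URL's for the same content can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the link graph is invented (only the page names e_home, r_home, m_home and e_network appear in this thread), and `link_url` and `collect_link_urls` are hypothetical helpers, not the site's actual link-building code.

```python
from collections import defaultdict

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "e_home": ["e_network", "m_home"],
    "r_home": ["e_network", "m_home"],
    "m_home": ["e_network"],
}


def link_url(target, previous):
    """Build a link in which PREURL names the page the visitor came from."""
    return f"/{target}.cfm?PREURL={previous}"


def collect_link_urls(entry_points):
    """Group every distinct link URL found on the entry pages by target page."""
    seen = defaultdict(set)
    for page in entry_points:
        for target in LINKS.get(page, []):
            seen[target].add(link_url(target, page))
    return seen


urls_by_page = collect_link_urls(["e_home", "r_home", "m_home"])
for page, urls in sorted(urls_by_page.items()):
    # One target page, several different URLs depending on the route taken.
    print(page, len(urls), sorted(urls))
```

In this model the single page e_network is reachable under three different URL's, one per page that links to it, which is the duplication described above.
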
Clarification of Answer by webadept-ga on 21 Jul 2004 04:38 PDT

Hi,

Normally it is wise to use the Clarification button before rating an answer, but that's okay. I'm a bit confused by your listed response, however, since he is saying exactly the same thing I did. Bots don't arrive at pages the same way, thus the dynamic link this website builds from the last page visited will be different on each visit.

--"What is happening there with a PREURL code is not the code itself, but the information it is relating. They are using this as a marker for the last page I was on, while going through the site, instead of using sessions or a cookie or something of that nature. .... The main thing, the number one thing, which is important to any search engine, not just Google, is the longevity of the page. The question is "will it exist tomorrow?" and in pages we see here, the answer is almost always, "no". So they will not be indexed, not for long anyway." --

and his

--"The EuN architecture seems to be in part based on the assumption that visitors are arriving at and beginning their visits at the default home page. Note that if they do arrive at the default home page the PREURL variable seems to be included in each html link on the page (as are the CFID and CFTOKEN variables) except the login link. But visitors and bots also arrive at pages other than the home page; a link to EuN from some other site may look simply like another heavily trafficked page URL (not the home page), like http://www.execunet.com/e_home.cfm, or like http://www.execunet.com/r_home.cfm, for example. The query portion of the URL is not likely to be present unless the link is from a tracked source (e.g. ?welcome=xxxxxxxxx), but in any event, the PREURL, CFID and CFTOKEN variables will not be in the requested URLs. The html links on those pages, called in this manner, sometimes include the PREURL variable and sometimes do not." --

As for Google's bias, their bias is stated very clearly on the Facts and Fiction page

--"Fiction: Sites are not included in Google's index if they use ASP (or some other non-html file-type.) Fact: At Google, we are able to index most types of pages and files with very few exceptions. File types we are able to index include: pdf, asp, jsp, hdml, shtml, xml, cfm, doc, xls, ppt, rtf, wks, lwp, wri, swf." --
://www.google.com/webmasters/facts.html

You did notice the SWF there at the end, yes?

The problem is not the PREURL in and of itself; it is the displayed information the PREURL is gathering for the GET string. Both this other service and I have said this in different ways. You can name PREURL "cash" or "string" or anything you want to; it's not going to matter. The bots see the GET string as the "name of the page", the whole string. This other service and I have both agreed on this as well.

His quote

--"The query portion of the URL is not likely to be present unless the link is from a tracked source (e.g. ?welcome=xxxxxxxxx), but in any event, the PREURL, CFID and CFTOKEN variables will not be in the requested URLs. The html links on those pages, called in this manner, sometimes include the PREURL variable and sometimes do not." --

I don't know where he gets the one parameter, two parameter dance of logic he has there. It doesn't hold up to observation or, for that matter, to what Google says on their pages and/or publications. I didn't see a reference to his source, so it's just opinion as far as I can tell. Google didn't say it. He might be thinking of this quote

--"If you decide to use dynamic pages (i.e., the URL contains a '?' character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them small." -- from this page
://www.google.com/webmasters/guidelines.html

But Google isn't referring to themselves there; they are letting you know that "other" bots don't crawl them well. So.. (??)

Be all that as it may, he and I agree completely on the PREURL problem, and both of us stated that the page reference in the GET string needs to be taken care of. So I don't understand your statements in the comment area.

I'm fine with the rating, because I'm assuming that this is your first time using the service and you didn't know that if you used the Clarification button, I would search out and find more information for you, and would have addressed this other 'advice column' as well. Our goal is to research your answer. Sometimes we get it on the first go, other times we don't. But once we start, we do our best to ensure you have the answer you are after. Next time you use the service, please keep this in mind. The researchers are very dedicated to their level of service.

With that said, rating or no, payment or no, if you would like a greater understanding of this issue or would like to post something else that someone else said for me to research for you, please do. You are obviously not quite certain about this issue, so I'm happy to help you out with it.

webadept-ga

Clarification of Answer by webadept-ga on 21 Jul 2004 05:35 PDT

Also, I'm curious as to what part of your question this other service and I have not answered. I personally was going on this:

"My question: Why is the Google index dropping/excluding the PREURL name-value pair from the indexed URL's that are being displayed in search results, and will the manner in which the above referenced URL's are now indexed/displayed in Google results continue for the foreseeable future?"

as your stated question. Since it is a two-part question, let's go with simple, straight answers here, which both this other service and I have indeed answered very clearly.

a) Why is the Google index dropping/excluding the PREURL name-value pair from the indexed URL's that are being displayed in search results

answer: The phenomenon that is perceived arises because the urls for these pages, as created by the site's internal programming, are volatile and not reproducible on a regular basis, creating a page name that is not being saved by the search engines. This is not a Google phenomenon but a search engine phenomenon, as these results show clearly.

Yahoo: 5 shown on a page of 100
http://search.yahoo.com/search?_adv_prop=web&x=op&ei=UTF-8&prev_vm=p&va=execunet&va_vt=any&vp=&vp_vt=any&vo=&vo_vt=any&ve=&ve_vt=any&vd=all&vst=on&vs=execunet.com&vf=all&vm=p&vc=&fl=0&n=100

MSN: no values shown
http://search.msn.com/advresults.aspx?q=execunet&FORM=SMCA&adv_f=any&adv_sort=depth+asc&adv_rgn=&adv_lng=&adv_dom=execunet.com&adv_depth=&adv_dt=html&adv_dt=ppt&adv_dt=msword&adv_dt=xl&adv_cf=

AllTheWeb: presence too low to get a good reading
http://www.alltheweb.com/search?advanced=1&cat=web&jsact=&_stype=norm&type=all&q=execunet.&_b_query=&l=en&ics=utf-8&cs=utf8&wf%5Bn%5D=3&wf%5B0%5D%5Br%5D=%2B&wf%5B0%5D%5Bq%5D=&wf%5B0%5D%5Bw%5D=&wf%5B1%5D%5Br%5D=%2B&wf%5B1%5D%5Bq%5D=&wf%5B1%5D%5Bw%5D=&wf%5B2%5D%5Br%5D=-&wf%5B2%5D%5Bq%5D=&wf%5B2%5D%5Bw%5D=&dincl=execunet.com&dexcl=&geo=&doctype=&dfr%5Bd%5D=1&dfr%5Bm%5D=1&dfr%5By%5D=1980&dto%5Bd%5D=21&dto%5Bm%5D=7&dto%5By%5D=2004&hits=100

Lycos: 6 values found out of a total presence of 90
http://search.lycos.com/default.asp?query=execunet&first=81&pmore=more&gi=0&npl=dfi%3Dexecunet%252Ecom%26adf%3Doff%26adv%3D1

Google, with the same search as above: 1 found out of 100
://www.google.com/search?q=execunet+site:execunet.com&num=100&hl=en&lr=&ie=UTF-8&as_qdr=all&filter=0

This brief look doesn't show a great deal of discrepancy by any single search engine with the token in question. The only real discrepancy found was that Google seems to index roughly 1000% more of the site than the other search engines looked at, showing 1000 pages as listings for the query above. The next highest is Yahoo with 100. No preference in Google's indexed pages could be seen or noted in the cached URL's as far as size (many of them extending out to 200 and 300 characters) or number of tokens in the string (a large percentage having 4 - 6 tokens creating the URL string).

As far as I can see from the base research conducted, the results shown by the search engines are in line with, and predictable from, each search engine's stated/published abilities and documentation, which can be found on each search engine's main website.

b) and will the manner in which the above referenced URL's are now indexed/displayed in Google results continue for the foreseeable future?

answer: yes.

Request for Answer Clarification by chametro-ga on 22 Jul 2004 09:26 PDT

Thanks ... it is time to move on. No need to reply.

FYI these are the Google references he provided. They do seem to directly support his analysis.

Design and Content Guidelines:
- If you decide to use dynamic pages (i.e., the URL contains a '?' character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them small.

1. Reasons your site may not be included.
- Your pages are dynamically generated. We are able to index dynamically generated pages. However, because our web crawler can easily overwhelm and crash sites serving dynamic content, we limit the amount of dynamic pages we index.

If the number of dynamic pages indexed is limited (Google says it is) and there is a bias toward fewer and shorter parameters (Google !strongly implies! there is such a bias), AND the googlebots can find links at the EuN site that have fewer and shorter parameters, it makes sense that the Google index contains more of those links.

It is a little surprising that the ratio of indexed URLs containing the PREURL parameter is roughly 1:3000, but the AllTheWeb ratio is 235:10200 so the "general" explanation seems credible.

Cheers
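
For scale, the two ratios quoted above can be turned into percentages with a few lines of Python. This sketch assumes each ratio is read as "URLs containing PREURL : total indexed URLs", which is an interpretation of the figures rather than something stated in the thread; `share` is a hypothetical helper.

```python
def share(with_param, total):
    """Fraction of indexed URLs that contain the parameter."""
    return with_param / total


# Figures as quoted in the thread; reading them as part-to-total is an assumption.
google = share(1, 3000)
alltheweb = share(235, 10200)

print(f"Google:    {google:.4%}")
print(f"AllTheWeb: {alltheweb:.4%}")
print(f"AllTheWeb share is {alltheweb / google:.1f} times the Google share")
```

On that reading the shares are about 0.03% for Google and about 2.3% for AllTheWeb, so PREURL URLs are rare in both indexes but far rarer in Google's.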

chametro-ga rated this answer: 1 out of 5 stars

Though portions of the response are well written, and address an
element of the question, the question was not answered. Granted, the
question is difficult.

Having read other responses provided in this Google Q&A service, I
tried to head off general commentary by saying the client and I are
SEO aware. I stated they had received SEO advice and that they were
implementing an architectural change. To assume you are communicating
with SEO novices is a bit presumptuous.

Here's a portion of an answer I received through another venue. It
doesn't answer the question with certainty, but it does offer a very
probable explanation based on solid and irrefutable analysis. It was
provided with a lot of abbreviations and shorthand wording, so I
rewrote it for here. Any errors are mine.
===========

The EuN FQDN is:  www.execunet.com/  The EuN architecture seems to be
in part based on the assumption that visitors are arriving at and
beginning their visits at the default home page. Note that if they do
arrive at the default home page the PREURL variable seems to be
included in each html link on the page (as are the CFID and CFTOKEN
variables) except the login link.

But visitors and bots also arrive at pages other than the home page; a
link to EuN from some other site may look simply like another heavily
trafficked page URL (not the home page), like
http://www.execunet.com/e_home.cfm, or like
http://www.execunet.com/r_home.cfm, for example. The query portion of
the URL is not likely to be present unless the link is from a tracked
source (e.g. ?welcome=xxxxxxxxx), but in any event, the PREURL, CFID
and CFTOKEN variables will not be in the requested URLs. The html
links on those pages, called in this manner, sometimes include the
PREURL variable and sometimes do not.

Now that we know that PREURL is sometimes not in the called url, let's
take an educated guess at the effect on Google's spidering and
indexing of the EuN site. Google's stated bias against dynamic urls
means that they (say they) index more urls with one parameter than
urls with two parameters, more with two than with three, etc.
Where PREURL is found it is frequently the third or fourth parameter;
when it isn't included, the url often has only two parameters. If you
take Google at their word, this is part of the reason you are seeing
only one (or a few) PREURL urls in the index, even though thousands of
the site's urls are in the index.
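
The parameter-count argument above can be checked against a list of indexed URLs with a small tally like the one below. It is a sketch only: the sample URLs are illustrative (the pairing of r_home.cfm with ?welcome= and the shortened CFTOKEN value are assumptions), `tally_by_param_count` is a hypothetical helper, and it says nothing about how Google actually weighs parameters.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

# Illustrative URLs only. Paths and parameter names come from this thread;
# the r_home.cfm/welcome pairing and the short CFTOKEN value are assumptions.
SAMPLE_URLS = [
    "http://www.execunet.com/e_home.cfm",
    "http://www.execunet.com/r_home.cfm?welcome=xxxxxxxxx",
    "http://www.execunet.com/m_home.cfm?CFID=1321599&CFTOKEN=abc",
    "http://www.execunet.com/e_network.cfm?PREURL=e_home&CFID=957450&CFTOKEN=74716046",
]


def tally_by_param_count(urls):
    """Map (number of parameters, PREURL present?) to the number of URLs of that shape."""
    tally = Counter()
    for url in urls:
        params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
        has_preurl = any(name == "PREURL" for name, _ in params)
        tally[(len(params), has_preurl)] += 1
    return tally


tally = tally_by_param_count(SAMPLE_URLS)
for (count, has_preurl), n in sorted(tally.items()):
    print(f"{count} parameter(s), PREURL present: {has_preurl} -> {n} URL(s)")
```

Run over a real export of indexed URLs, a tally like this would show whether the PREURL URLs really do cluster at three or more parameters.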

My advice, the sooner your client fixes this, the better!
=========

Comments

There are no comments at this time.
