Q: detecting cloaked web pages ( Answered 4 out of 5 stars,   2 Comments )
Question  
Subject: detecting cloaked web pages
Category: Computers > Internet
Asked by: tobes-ga
List Price: $20.00
Posted: 05 Dec 2002 10:16 PST
Expires: 04 Jan 2003 10:16 PST
Question ID: 119797
I suspect that some pages I'm competing against for search engine
rankings are cloaked. Other than comparing the source code with the
search engine listing (which may match even if cloaked), is there a
tool or method I can use to more accurately detect a cloaked page? I'm
not looking to cause trouble for anyone. I just want to know what I'm
up against.  "Yes" answers only please.

Request for Question Clarification by lot-ga on 05 Dec 2002 14:40 PST
Hello tobes-ga

I didn't find a way to accurately detect a cloaked page (the major
search engines have trouble with this themselves), but for any pages
you suspect may be cloaked, I can describe some methods to reveal them
in a deliberately vague way, as the methods of doing so may be illegal
and against policies. So you would have an idea, but not a step-by-step
guide of how to actually do it. A bit limiting I know, but if you would
find this useful let me know,
kind regards
lot-ga
Answer  
Subject: Re: detecting cloaked web pages
Answered By: webadept-ga on 05 Dec 2002 23:50 PST
Rated:4 out of 5 stars
 
Hi, 

1. What is cloaking? 

The term "cloaking" is used to describe a website that returns altered
webpages to search engines crawling the site. In other words, the
webserver is programmed to return different content to Google than it
returns to regular users, usually in an attempt to distort search
engine rankings. This can mislead users about what they'll find when
they click on a search result. To preserve the accuracy and quality of
our search results, Google may permanently ban from our index any
sites or site authors that engage in cloaking to distort their search
rankings.

That is what Google calls cloaking. The way to detect it is to fetch
the page twice: once while making the server think the Googlebot is
looking at it, and a second time while telling it that a web browser
like Netscape is looking at it. This can be done with a short Perl
program.

Since there doesn't appear to be a publicly available program to do
this, I decided to make one; it only took an hour or so, and it is
rather cool to have. You can go to:

 http://www.webadept.net/cloaker/index.html 

and use the program there to check websites. 
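The Perl program itself isn't reproduced here, but the double-fetch idea is simple enough to sketch. Below is a minimal illustration in Python (the user-agent strings and the 0.9 similarity threshold are my own illustrative choices, not webadept-ga's):

```python
import difflib
import urllib.request

BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"        # ordinary browser
BOT_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"      # crawler identity

def fetch(url, user_agent):
    """Fetch a page while presenting the given User-Agent string."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def looks_cloaked(browser_html, bot_html, threshold=0.9):
    """Crude check: flag the page if the two responses differ substantially.
    Identical pages give a ratio of 1.0; heavily cloaked pages score low."""
    ratio = difflib.SequenceMatcher(None, browser_html, bot_html).ratio()
    return ratio < threshold

# Usage (requires network access):
# a = fetch("http://example.com/", BROWSER_UA)
# b = fetch("http://example.com/", BOT_UA)
# print("possibly cloaked" if looks_cloaked(a, b) else "looks identical")
```

Note this only catches user-agent cloaking; a server that keys on the requester's IP address would serve both requests the same page, as the comments below discuss.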

Thanks, 

webadept-ga

Request for Answer Clarification by tobes-ga on 06 Dec 2002 01:43 PST
Thanks again. For clarification, how accurate do you think the tool
is? I checked the top five Google unsponsored listings for the highly
competitive keywords "data recovery" and (had to do it) "search
engine optimization." The tool found only one cloaked page among them
(for comparison, the Overture PPC bid for #1 listings on these
keywords is 9-10 bucks a pop!). If these results are correct, it would
seem that page cloaking is of highly dubious value. So, should I take
the cloaking check results with a grain of salt, or renew my faith in
humanity?
Best,
tobes

Clarification of Answer by webadept-ga on 06 Dec 2002 02:36 PST
The tool is pretty accurate. I played with it for a while. First off,
it shows the webserver that the first request is coming from an IE 6.0
browser and the second request from Googlebot/2.1, and in the logs I
checked on my servers and a few clients', the requests look identical
to the real Googlebot as near as I can tell. I doubt that anyone would
be able to tell the difference programmatically. But every day someone
is doing something someone else said was impossible. :-)

Anyway, as far as cloaking goes, it is a lot of work, very high risk,
and difficult to maintain. The simple check I created there (simple
meaning fast to create, not simple in technology) is really easy to
add to and make even more devious. Search engines don't run such
checks often because they are processor intensive, and the engines
need to keep their servers running as fast as they can. But I am sure
they do run checks periodically on various sites; spot checks, if you
will. There are times when I'm asked "Why did my site suddenly vanish
after being on top for the last year?" and I find out they were using
a cloaked page. That answers the question. It also brings up the huge
risk.

Once you are out of the index, it takes a very long time to get Google
to put you back, and when they do your pagerank is down, far down.
They don't like it, and they say so rather bluntly. So if you are a
serious company on the Internet with a mind for a future, why risk it?
PageRank and Page Relevance are available to anyone willing to put in
the time and effort. There are no guarantees, but there is a guarantee
that if you do cloak, eventually the bots are going to figure it out
and your company is suddenly going to disappear.

Faith in humanity? Maybe, maybe not. Faith in personal survival...
yes, definitely. By the way, you can report the one you found to
Google using this page:

http://www.google.com/contact/spamreport.html

Thanks 

webadept-ga

Clarification of Answer by webadept-ga on 06 Dec 2002 02:47 PST
By the way, if you have a few sets of searches you would like checked
for cloaking among the first 100 results in the Google list, just post
a question with those key phrases and I'll run the program against
them. It would take some alteration to the code to grab the first 100,
but certainly not much. Just post the keyword phrases you would like
and I'll set it up and post you the report.

Thanks, 

webadept-ga

Request for Answer Clarification by tobes-ga on 06 Dec 2002 09:18 PST
Thanks  for the offer but I think I can check what I need with a few
manual entries. I have serious doubts that the companies I need to
check have deep enough pockets to buy cloaking that can get around
your checker, so I think your method will do for my purposes.
Thanks Again.

Clarification of Answer by webadept-ga on 06 Dec 2002 11:09 PST
Hi again, Thanks for the response. 

As Hailstorm and Lot have pointed out, there are different types of
cloaking, which this program doesn't look at. However, there are
acceptable reasons to cloak a page using these methods. For instance,
the IP address is known to come from France, so your page is sent out
in the French language, rather than English. Or, the IP address was
recorded for a registered user, so his preferences are given rather
than the basic page. A page is created for IE, Netscape and "others"
and the cloaking program sends the pages in the format best seen by
those browsers. These are all reasonable means and methods of cloaking
which are used by websites. Heck, I use them.
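Those benign cases all amount to picking a page variant from the request headers. As a concrete illustration (the file names and the specific header checks here are hypothetical, not any particular site's setup):

```python
def pick_variant(headers):
    """Benign 'cloaking': choose a page variant from the request headers."""
    lang = headers.get("Accept-Language", "")
    ua = headers.get("User-Agent", "")
    if lang.lower().startswith("fr"):
        return "index.fr.html"    # French-speaking visitor gets the French page
    if "MSIE" in ua:
        return "index.ie.html"    # markup tuned for Internet Explorer
    if "Mozilla" in ua:
        return "index.html"       # standard page for Netscape and friends
    return "index.basic.html"     # plain fallback for anything else
```

The point is that the decision depends on who is browsing, not on whether the visitor is a search engine bot, which is why Google has no quarrel with it.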

What Google is miffed about is cloaking aimed at the Googlebot itself,
and that is much more difficult to pull off. First off, IP cloaking
doesn't work, because, well, they are Google. As Lot pointed out as
well, the IP can be spoofed: I could set up my server with a spoofed
Google address and run the program through that, using code to send
through that "spoofed" port, and voila! I'm the Googlebot. I'm not
going to do that with this program, as it's not really required.

Cloaking to the bots is much harder, especially if the bots are
checking. It is really easy for them to jump to a new IP address,
become an IE browser, get the page, and then check again as the normal
bot. Just about any programmer at my level can do it too. So most of
the "hype" on the page Hailstorm has given about the "greatness" of
their service is just that: hype.

So, there is nothing wrong with cloaking a page for user viewing, and
the service Hailstorm mentioned looks really good for that; like I
said, I do it myself. It's very useful for users and helps keep your
site fresh and alive. But cloaking to the bots is not easy to get away
with and has huge repercussions. Also, as you say, most companies
aren't going to spend the money on IP cloaking for bots. They may do
it for users and IP blocks, but not for the bots. Anyone who runs a
webserver for any length of time knows that the bots' IP addresses
change at random times. So if the cloaking isn't done right, a company
could get unlisted simply because its server didn't respond properly
to the bot when it showed up.

Thanks, 

webadept-ga
tobes-ga rated this answer:4 out of 5 stars and gave an additional tip of: $20.00

Comments  
Subject: Re: detecting cloaked web pages
From: hailstorm-ga on 06 Dec 2002 04:39 PST
 
According to this site:

http://www.searchengineworld.com/misc/cloaking_agents.htm

There are five different types of cloaking: 
  1) User Agent Cloaking (UA Cloaking)
  2) IP Agent Cloaking (IP Cloaking)
  3) IP and User Agent Cloaking (IPUA Cloaking). 
  4) Referral based cloaking. 
  5) Session based cloaking
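To make the distinction between these types concrete, here is a toy sketch of combined IP-and-User-Agent cloaking (type 3). The IP addresses are invented for illustration; a real setup would rely on a large, frequently updated database of crawler addresses, which is exactly what services like Fantomaster's sell:

```python
# Invented example IPs -- a real cloaking setup would use a maintained database.
CRAWLER_IPS = {"66.249.66.1", "66.249.66.2"}

def page_for(remote_ip, user_agent):
    """IPUA cloaking in miniature: serve the crawler page only when BOTH
    the source IP and the User-Agent look like a search engine bot."""
    is_bot_ip = remote_ip in CRAWLER_IPS
    is_bot_ua = "Googlebot" in user_agent
    if is_bot_ip and is_bot_ua:
        return "crawler-optimized.html"
    return "regular.html"
```

Because the IP test must pass as well, a checker that merely changes its User-Agent string (like the one in the answer above) is served the regular page both times and sees nothing amiss.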

Some of these, especially IP cloaking, are extremely advanced. The
Fantomaster site http://fantomaster.com sells a database that updates
all the search engine robot IP addresses several times a day.

So, I am sorry to say that I don't think webadept's tool can work on
advanced cloaking of this nature.
Subject: Re: detecting cloaked web pages
From: lot-ga on 06 Dec 2002 10:13 PST
 
Hello,
I used to use IP cloaking (before it was frowned on by the search
engines), and the script was free. This one was free too,
http://scriptsmatrix.com/Detailed/558.html, but it has since been
removed. There is also an inexpensive IP cloaking service at
http://www.improved-ranking.com. However, it is important to point out
that I am not recommending these services but simply highlighting
their existence and the ease with which unscrupulous webmasters can
implement IP cloaking. Conversely, at the other end of the spectrum,
in order to see these cloaked pages you would need to perform IP
spoofing, impersonating one of the search engines by assuming its IP
address and user agent, which is legally and ethically questionable in
itself.
regards
lot-ga
