Hi,
1. What is cloaking?
The term "cloaking" is used to describe a website that returns altered
webpages to search engines crawling the site. In other words, the
webserver is programmed to return different content to Google than it
returns to regular users, usually in an attempt to distort search
engine rankings. This can mislead users about what they'll find when
they click on a search result. To preserve the accuracy and quality of
our search results, Google may permanently ban from our index any
sites or site authors that engage in cloaking to distort their search
rankings.
That is what Google calls cloaking. The way to detect it is to request the page twice: once making the server think the Googlebot is looking at it, and a second time telling it that a web browser such as Netscape is looking at it. This can be done with a Perl program.
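For illustration, here is a minimal sketch of that double-fetch check in Perl using LWP::UserAgent. This is an assumption of how such a checker might look, not the code behind the tool below; the user-agent strings are just examples, and the comparison is a plain string match.

#!/usr/bin/perl
# Sketch of a cloaking check: fetch the same URL twice, once posing as a
# regular browser and once posing as Googlebot, then compare the bodies.
use strict;
use warnings;
use LWP::UserAgent;

my $url = shift @ARGV or die "usage: $0 <url>\n";

sub fetch_as {
    my ($agent_string) = @_;
    my $ua  = LWP::UserAgent->new(agent => $agent_string, timeout => 30);
    my $res = $ua->get($url);
    die "request as '$agent_string' failed: ", $res->status_line, "\n"
        unless $res->is_success;
    return $res->content;
}

# Example user-agent strings; a real check would keep these current.
my $as_browser = fetch_as('Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)');
my $as_bot     = fetch_as('Googlebot/2.1 (+http://www.googlebot.com/bot.html)');

print $as_browser eq $as_bot
    ? "Pages match: no user-agent cloaking detected.\n"
    : "Pages differ: the server may be cloaking to Googlebot.\n";

One caveat: pages with rotating ads, timestamps, or session IDs will differ between any two fetches, so a mismatch is a hint to look closer rather than proof of cloaking.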
Since there doesn't appear to be a publicly available program that does this, I decided to make one; it only took an hour or so, and it is rather cool to have. You can go to:
http://www.webadept.net/cloaker/index.html
and use the program there to check websites.
Thanks,
webadept-ga
Request for Answer Clarification by tobes-ga on 06 Dec 2002 01:43 PST
Thanks again. For clarification, how accurate do you think the tool is? I checked the top five Google unsponsored listings for the highly competitive keywords "data recovery" and (had to do it) "search engine optimization." The tool found only one cloaked page among them (for comparison, the Overture PPC bid for the #1 listing on these keywords is 9-10 bucks a pop!). If these results are correct, it would seem that page cloaking is of highly dubious value. So, should I take the cloaking check results with a grain of salt, or renew my faith in humanity?
Best,
tobes
Clarification of Answer by webadept-ga on 06 Dec 2002 02:36 PST
The tool is pretty accurate. I played with it for a while. First off, it tells the webserver that the first request is coming from an IE 6.0 browser and the second request is coming from Googlebot/2.1, and in the logs I checked on my servers and a few clients' servers, those requests look identical to the real Googlebot's as near as I can tell. I doubt that anyone would be able to tell the difference programmatically. But every day someone is doing something someone else said was impossible. :-)
Anyway, as far as cloaking goes, it is a lot of work, very high risk, and difficult to maintain. The simple check I created there (simple meaning quick to create, not simple in technology) is really easy to add to and make even more devious. Search engines don't run such checks a lot because they are processor intensive, and the engines need to keep their servers running as fast as they can. But I am sure that they do run checks periodically on various sites. Spot checks, if you will. There are times when I'm asked, "Why did my site suddenly vanish after being on top for the last year?" and I find out that they were using a cloaked page. That answers the question. It also brings up the huge risk.
Once you are out of the index, it takes a very long time to get Google to put you back, and when they do, your PageRank is down, far down. They don't like cloaking, and they say so rather bluntly. So if you are a serious company on the Internet with a mind for the future, why risk it? PageRank and page relevance are available to anyone willing to put in the time and effort. There are no guarantees, but there is a guarantee that if you do cloak, eventually the bots are going to figure it out and your company is suddenly going to disappear from the index.
Faith in humanity? Maybe, maybe not. Faith in personal survival... yes, definitely. By the way, you can report the one you found to Google using this page:
http://www.google.com/contact/spamreport.html
Thanks
webadept-ga
Clarification of Answer by webadept-ga on 06 Dec 2002 02:47 PST
By the way, if you have a few sets of searches you would like checked for cloaking across the first 100 results in the Google list, just post a question with those key phrases and I'll run the program against them. It would take some alteration to the code, but certainly not much, to grab the first 100. Just post the keyword phrases you would like and I'll set it up and post you the report.
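For what it's worth, the alteration would be roughly along these lines. This is an assumed sketch, not the actual code; it reuses the same double fetch over a list of result URLs, read here from a hypothetical urls.txt file (one URL per line) rather than pulled from Google directly.

#!/usr/bin/perl
# Batch sketch: run the user-agent comparison over a list of URLs.
# The list (urls.txt, one URL per line) is assumed to have been
# collected from the search results separately.
use strict;
use warnings;
use LWP::UserAgent;
use Digest::MD5 qw(md5_hex);

my @agents = (
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)',   # browser
    'Googlebot/2.1 (+http://www.googlebot.com/bot.html)',   # crawler
);

open my $list, '<', 'urls.txt' or die "urls.txt: $!";
while (my $url = <$list>) {
    chomp $url;
    next unless $url;
    my @digests;
    for my $agent (@agents) {
        my $ua  = LWP::UserAgent->new(agent => $agent, timeout => 30);
        my $res = $ua->get($url);
        push @digests, $res->is_success ? md5_hex($res->content) : 'error';
    }
    printf "%-8s %s\n", ($digests[0] eq $digests[1] ? 'same' : 'DIFFERS'), $url;
}
close $list;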
Thanks,
webadept-ga
Request for Answer Clarification by tobes-ga on 06 Dec 2002 09:18 PST
Thanks for the offer, but I think I can check what I need with a few manual entries. I have serious doubts that the companies I need to check have deep enough pockets to buy cloaking that can get around your checker, so I think your method will do for my purposes. Thanks again.
Clarification of Answer by webadept-ga on 06 Dec 2002 11:09 PST
Hi again, and thanks for the response.
As Hailstorm and Lot have pointed out, there are different types of cloaking which this program doesn't look at. However, there are acceptable reasons to cloak a page using those methods. For instance, the IP address is known to come from France, so your page is sent out in French rather than English. Or the IP address is recognized as belonging to a registered user, so his preferences are served rather than the basic page. Or a page is created for IE, Netscape, and "others," and the cloaking program sends each browser the version it displays best. These are all reasonable means and methods of cloaking that websites use. Heck, I use them myself.
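To make the distinction concrete, here is a small sketch of that kind of acceptable, user-oriented switching done in a Perl CGI script. The file names and rules are made up for illustration, and it uses the Accept-Language header rather than an IP lookup to guess the visitor's language; the point is that a search bot gets the same default page as any other English-speaking visitor.

#!/usr/bin/perl
# Illustrative content negotiation in a CGI script (hypothetical files):
# French-preferring visitors get the French page, very old browsers get a
# simplified page, and everyone else, search bots included, gets the same
# default page.
use strict;
use warnings;

my $lang  = $ENV{HTTP_ACCEPT_LANGUAGE} || '';
my $agent = $ENV{HTTP_USER_AGENT}      || '';

my $page = 'index-en.html';                              # default page
$page = 'index-fr.html'    if $lang  =~ /^fr/i;          # French visitors
$page = 'index-basic.html' if $agent =~ /MSIE [1-4]\./;  # very old IE

print "Content-type: text/html\n\n";
open my $fh, '<', $page or die "cannot open $page: $!";
print while <$fh>;
close $fh;

The key difference from the cloaking Google bans is that the bot never sees a page a human visitor couldn't get.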
What Google is miffed about is cloaking to the Googlebot itself, and that is much harder to pull off. First off, IP cloaking doesn't work against them because, well, they are Google. As Lot pointed out as well, the IP can be spoofed: I could set up my server with a spoofed Google address, run the program through it so the requests come from that "spoofed" address, and voila, I'm the Googlebot. I'm not going to do that with this program, as it's not really required.
Cloaking to the bots is much harder, especially if the bots are checking for it. It is really easy for them to jump to another IP address, pose as an IE browser, get the page, and then check it again as the normal bot. Just about any programmer at my level can do it too. So most of the "hype" on the page Hailstorm has given about the "greatness" of their service is just that: hype.
So, there is nothing wrong with cloaking a page for user viewing, and the service Hailstorm has pointed to looks really good for that; like I said, I do it myself. It's very useful for the users and helps keep your site fresh and alive. But cloaking to the bots is not easy to get away with and has huge repercussions. Also, as you say, most companies aren't going to spend the money on IP cloaking for bots. They may do it for users and IP blocks, but not for the bots. Anyone who runs a webserver for any length of time knows that the bots' IP addresses change at random times. So a company could get unlisted simply because, if the cloaking wasn't done right, the site didn't serve the bot properly when it showed up from a new address.
Thanks,
webadept-ga