Q: Apache's handling of spaces in GET request (Answered, 5 out of 5 stars, 0 comments)
Question  
Subject: Apache's handling of spaces in GET request
Category: Computers > Programming
Asked by: whiteout-ga
List Price: $75.00
Posted: 30 Jun 2004 17:06 PDT
Expires: 30 Jul 2004 17:06 PDT
Question ID: 368359
Hi,

I have developed a PHP script. This script is intended to be accessed
through the AOL Instant Messenger (AIM) client, rather than a typical
web browser like Internet Explorer. A link to the .php file is placed
in the AIM user's profile. The syntax of the URL query is
http://www.domain.com/view.php?id=16&nick=%n
The variable "id" is an identification number that is not relevant to
this question. The variable "nick" holds %n, which in AOL Instant
Messenger is replaced with the visitor's AIM Screenname. Therefore,
when a visitor with Screenname "big john" views this AIM user's
profile, the visitor will see a link with the following URL (notice
how "%n" is replaced with "big john"):

http://www.domain.com/view.php?id=16&nick=big john

The link has TARGET="_self", so when the link is clicked on by the
visitor, the page will load in AIM's profile window (AIM has its own
internal browser), rather than launching an external browser like IE.
The problem is that AIM's internal browser (user-agent: "AIM/30
(Mozilla 1.24b; Windows; I; 32-bit)"), unlike other browsers, does not
replace spaces in URLs with %20 or +. Therefore, AIM sends the raw
space directly to my Apache server (I am running Apache HTTP 1.3.22),
resulting in the following request:

"GET /view.php?id=16&nick=big john HTTP/1.0"

Apache uses a space to differentiate between the request and the
protocol. Since there is a misplaced space in "big john", the request
is broken up and Apache incorrectly identifies "john" as the protocol
(instead of HTTP/1.0). The result is an HTTP 400 error.

My goal, obviously, is to make the page accessible to the visitor
instead of giving him an HTTP 400 error. There are, however, several
problems that complicate the situation:

1.) I have no control over the %n portion of the URL. Since %n is
automatically replaced by AIM with the visitor's Screenname, if the
visitor has a space in his Screenname, the URL will automatically
contain a raw space (as was the case with the "big john" example). I
have no way of converting this raw space into either %20 or + before
it is processed by Apache.

2.) AIM will not replace the raw space for me. AIM, unlike every other
browser, does not convert a URL's unsafe characters into their hex
values. If Internet Explorer had handled this request, it would have
converted the space into %20, resulting in the following request
(which would've worked):

"GET /view.php?id=16&nick=big%20john HTTP/1.0"

AIM, however, does not do this, which is why I get stuck with this bad request:

"GET /view.php?id=16&nick=big john HTTP/1.0"

Because there is nothing I can do to prevent the raw space from being
sent to Apache, and cannot translate the space to %20, I thus need
some way to configure Apache so that it will accept this bad request.
From searching the Internet, it appears that the answer to this lies
in using mod_rewrite. However, I have no experience doing this, so I
do not know what rule/s I would need to add to make this work. I found
the following solution on Google Groups
(http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&c2coff=1&safe=off&selm=U6OU7.160915%24Ga5.25940562%40typhoon.tampabay.rr.com),
but it did not work when I tried it on my server (I'm not sure whether
the code's flawed, whether I implemented it wrong, or whether the
solution's just simply too old for my version of Apache). Another
person (http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&c2coff=1&safe=off&selm=8krV9.4847%24kH3.1571%40sccrnsc03)
was able to write a program to fix this problem, but I do not know the
source code, nor do I know how to write it.

I am requesting step-by-step instructions for a working solution on
how to make Apache handle my bad request so that the request will go
through (I obviously need all variables intact, or at least intact
enough so that they can be manipulated by the view.php script). Since
I have no experience with Apache or rewriting, I would need
easy-to-follow directions complete with all necessary source code.

Thanks in advance.

P.S. Note that the solution lies in configuring Apache, and NOT in
rewriting the PHP script. Any changes to the PHP script itself will be
useless because Apache prevents the request from ever reaching the
actual .php file.

Request for Question Clarification by wildeeo-ga on 01 Jul 2004 14:47 PDT
Hi,

I'm assuming you have root access to the server, or at least the
ability to modify the httpd.conf file? The solution I have requires
this.

Thanks

Clarification of Question by whiteout-ga on 01 Jul 2004 15:40 PDT
Yes, absolutely. I have root access to the server.
Answer  
Subject: Re: Apache's handling of spaces in GET request
Answered By: wildeeo-ga on 01 Jul 2004 16:15 PDT
Rated:5 out of 5 stars
 
Hi, whiteout, and thanks for your question.

While it is possible to use the insanely powerful mod_rewrite to solve
part of this problem, it's unnecessary, and can be extremely complex
to get working properly.

The first part of the problem is the '400 Bad Request' error that
recent versions of Apache will return.

Apache generally conforms strictly to the HTTP protocol, which does not
allow unencoded spaces inside the request URI. A well-formed request
line contains exactly two spaces (one after the method and one before
the protocol), so, as you mentioned, any additional space will cause
Apache to display the '400 Bad Request' page.

Older versions of Apache (pre-1.3.26) would allow these malformed
requests, parsing the URI up to the first encountered space, and if
the remainder was not in the 'HTTP/x.x' format, it would ignore it,
assuming it was HTTP/1.0. This error was fixed in 1.3.26.
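
As a rough illustration of the difference between the strict and the older lenient behaviour, here is a simplified model of request-line parsing in PHP. It is only a sketch and not Apache's actual source; the function name parse_request_line and the shape of the array it returns are made up for this example.

```php
<?php
// Simplified model of request-line parsing; NOT Apache's real implementation.
// parse_request_line() and its return format are hypothetical.
function parse_request_line($line, $strict)
{
    // Method, then the URI up to the first space, then everything that is left.
    $parts = explode(' ', $line, 3);
    if (count($parts) < 3) {
        return false;
    }
    list($method, $uri, $protocol) = $parts;

    // A valid protocol field looks like HTTP/x.x and nothing else.
    if (preg_match('#^HTTP/\d+\.\d+$#', $protocol) === 1) {
        return array('method' => $method, 'uri' => $uri,
                     'protocol' => $protocol, 'assumed' => $protocol);
    }
    if ($strict) {
        return false; // stands in for the '400 Bad Request' response
    }
    // Lenient mode: keep the raw remainder, but treat the client as HTTP/1.0.
    return array('method' => $method, 'uri' => $uri,
                 'protocol' => $protocol, 'assumed' => 'HTTP/1.0');
}
```

With the request line from the question, the strict mode rejects the request, while the lenient mode cuts the URI at 'nick=big' and leaves 'john HTTP/1.0' in the protocol field.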

There is, however, a rare option - the 'ProtocolReqCheck' option -
that will restore this functionality/bug. So, your first step should
be to add the option 'ProtocolReqCheck off' in Section One of your
apache config file. If you are unsure where to add it, insert it on a
new line after the 'ServerType standalone' line.
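
In the config file, the change would look roughly like the fragment below. This assumes a stock Apache 1.3 httpd.conf that already contains the 'ServerType standalone' line; the surrounding lines in your own file may differ.

```apache
# Section One (global environment) of httpd.conf
ServerType standalone

# Accept request lines whose protocol field is not a clean HTTP/x.x
# (restores the pre-1.3.26 behaviour; needs Apache 1.3.27 or later)
ProtocolReqCheck off
```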

(Be warned that clients that use this strange syntax will be assumed
to be HTTP/1.0 clients, and will possibly lose HTTP/1.1 functionality.
This should not really be a problem, though.)

When apache is next restarted, requests in the form

GET /?z=bla bla HTTP/1.1

will be processed by apache. This is not the complete solution to the
problem, though, since apache will only process the URL up to the
first space (so $_GET['z'] will be 'bla', not 'bla bla' in the above
example).

You can get around this by using getenv('SERVER_PROTOCOL'), which
returns everything after the URL (for example, with the above request,
it would return 'bla HTTP/1.1').

The below code would work for your example, setting 'nick' to the full nickname:



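// get everything after the URL in the request line
$serv_prot = (string) getenv('SERVER_PROTOCOL');

// strip the trailing HTTP/x.x token; what is left is the rest of the
// nickname (empty when the nickname contained no space)
$sans_protocol = trim(preg_replace('#\s*HTTP/\d+\.\d+$#', '', $serv_prot));

// set nick to nick from query string, plus the remainder if there is one
$nick = isset($_GET['nick']) ? $_GET['nick'] : '';
if ($sans_protocol !== '') {
    $nick .= ' ' . $sans_protocol;
}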



This code would have to appear at the top of any script that needed to
access the full nickname. The 'nick' variable will then contain the
full nickname. The code will also need modifying if there are additional
variables specified after 'nick' in the query string, or if 'nick' is
renamed.
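
If other variables may follow 'nick' in the query string, one possible approach is to glue the truncated query string and the remainder back together and parse the result again. The sketch below assumes that Apache really does pass the raw remainder in SERVER_PROTOCOL, as described above; the helper name rebuild_query is made up for this example.

```php
<?php
// Hypothetical helper: rebuild the query string that was cut at the first
// raw space, then parse it into an array of variables.
function rebuild_query($queryString, $serverProtocol)
{
    // Drop the trailing HTTP/x.x token; the rest is the lost part of the query.
    $rest = preg_replace('#\s*HTTP/\d+\.\d+\s*$#', '', $serverProtocol);

    $full = $queryString;
    if ($rest !== '') {
        // Put back the space at which the request line was split.
        $full .= ' ' . $rest;
    }

    // parse_str() fills $vars with the decoded name/value pairs.
    parse_str($full, $vars);
    return $vars;
}
```

In a script it could be called as $vars = rebuild_query(getenv('QUERY_STRING'), getenv('SERVER_PROTOCOL')); after which $vars['nick'] would hold the full nickname and any later variables would be available under their own names.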

This has been tested and works on Apache 1.3.29. It should work on any
version of Apache 1 >= 1.3.27.


These sites may provide further information:
http://forums.devshed.com/archive/t-58255
http://forums.devshed.com/t46291/s.html
http://forums.devshed.com/t26614/s.html
http://apache.active-venture.com/mod/core8.htm (the last item on the page)


The following searches may be of use to you:
('subprofile url spaces') : ://www.google.com/search?q=subprofile+url+spaces
(subprofile.com has the ability to process these types of requests)
://www.google.com/search?q=aim+profiles+space+%22400+bad+request%22



I hope this is of use to you. If I was unclear in any part, please do
not hesitate to request a clarification.

--wildeeo-ga

Request for Answer Clarification by whiteout-ga on 01 Jul 2004 18:38 PDT
Hi wildeeo,

Thank you for your response. While your answer was well-researched and
presented nicely, there is a slight problem: I am on Apache 1.3.22.
For some reason, even though my version is pre-1.3.26 and should allow
the malformed requests, it does not. Do you have any other explanation
as to why I still have this problem on Apache 1.3.22?

I know this is not a problem with the php script - I do remember that
on older versions of Apache (I forget the version number exactly), my
script worked perfectly in AIM. However, when I upgraded Apache (to
1.3.22), I found that the script no longer worked, and I encountered
the problems I outlined originally.

Could there perhaps be another cause behind this?

Clarification of Answer by wildeeo-ga on 01 Jul 2004 19:42 PDT
Hi,

That is very strange.

You are right; this cannot be an error with the PHP script, since this
error is generated before the script is even called (it's generated
even before the mod_rewrite rules are called, which are defined in the
config files).

However, I quickly tested this against several Apache 1.3.22 servers,
and could not reproduce this behaviour.

(I do not have access to an Apache 1.3.22 server. I tested it by
searching Google for "Apache/1.3.22 Server at" "Index of", telnetting
to several servers listed, and sending the 'GET /?bla=bl a HTTP/1.0'
request. In all cases, I was given the index page, and the only time I
got a 400 error was when I claimed I was HTTP 1.0 without specifying a
host).
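
The same check can be scripted instead of typed into telnet. The following is a minimal sketch in PHP; the function names are made up for this example, and the host name is whatever server you want to test.

```php
<?php
// Build the malformed request: the space inside the query string is sent raw.
// An HTTP/1.0 request does not need a Host header.
function build_raw_request($pathWithSpace)
{
    return "GET " . $pathWithSpace . " HTTP/1.0\r\n\r\n";
}

// Send the request and return the first line of the response (the status
// line), or false if the connection or the read fails.
function fetch_status_line($host, $pathWithSpace, $port = 80)
{
    $fp = @fsockopen($host, $port, $errno, $errstr, 5);
    if (!$fp) {
        return false;
    }
    fwrite($fp, build_raw_request($pathWithSpace));
    $status = fgets($fp, 1024);
    fclose($fp);
    return $status === false ? false : rtrim($status);
}
```

Calling fetch_status_line('your.server.name', '/?bla=bl a') and printing the result shows whether that server answers the request with a 400 status or serves the page.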

The version history for 1.3.22 also does not mention anything related
to this (http://www.apacheweek.com/issues/01-10-12). The apache docs
for 1.3.22 (available at
http://archive.apache.org/dist/httpd/old/apache_1.3.22.tar.gz) do not
mention any option that could be relevant.

The only explanations I can think of are:

- The client is identifying itself as HTTP/1.1 compatible, and not
sending a host header. (I doubt this, because it wouldn't work with
screen names without a space either).

- The server isn't really running Apache 1.3.22. Again, this is
unlikely, but if, for example, you had changed the server banner using
the 'ServerSignature' directive, it's possible. But unlikely.

- You are running some strange module that is returning the 400 error
independently of the 'core' webserver. Again, unlikely.

- The version of Apache you are running has been patched. I believe
this to be the most probable explanation.

Sometimes, vendors will include patched versions of Apache with their
distribution. Since the functionality you require could be considered
a bug (theoretically, the 'space-in-request' should never happen, even
though it does), it's possible a patch was released and applied to
your version of Apache.

If this is the case, the patch would have been applied at compile-time
and there is probably no way to disable it.

The only solution I can think of is to upgrade your version of Apache
to a release between 1.3.27 and 1.3.29, which does support the
'ProtocolReqCheck' directive. Is this feasible?

-- wildeeo

Clarification of Answer by wildeeo-ga on 01 Jul 2004 19:44 PDT
(Apologies; a line above should have read "...and the only time I got
a 400 error was when I claimed I was HTTP 1.1 compatible without
specifying a host.")
whiteout-ga rated this answer:5 out of 5 stars
Answer was clear and detailed. Very responsive. Could not have asked
for more. Thanks!

