Q: How to use Apache proxy to snoop my machine's HTTPS/SSL communication? ( Answered,   9 Comments )
Question  
Subject: How to use Apache proxy to snoop my machine's HTTPS/SSL communication?
Category: Computers > Programming
Asked by: gerbil-ga
List Price: $25.00
Posted: 18 Jun 2002 06:44 PDT
Expires: 25 Jun 2002 06:44 PDT
Question ID: 28442
The short version of my question is this:

For reasons explained more fully in the Background section below I
want to be able to snoop HTTPS/SSL traffic between my Linux machine
and assorted remote secure servers.  To do this, I want to set up the
following pipeline:

  browser <-> HTTP proxy <--> Apache proxy <-> secure server

The part of this pipeline I need to understand is the Apache proxy,
whose role is to en/decrypt the secure communication from/to the HTTP
proxy.

How do I set up and operate an Apache proxy on my (non-root) Linux
account?  Note that I have not installed Apache yet, so the answer I
need requires instructions on how to configure the Apache installation
on my home directory to allow for what I want to do.


Background:

I'm a skilled Perl programmer, though I have very little experience in
the area of network programming, and in particular the area of
HTTPS/SSL transactions, which I need to know more about for my current
personal project.

My project is this: I have several accounts (credit cards, bank, etc.)
that offer online access.  I want to write a robot that periodically
visits these (secure) sites (using my passwords, of course), collects
information of interest to me, and e-mails me a report.

In the past, when I've wanted to write a Perl robot to automate the
fetching of information from the Web, I've made heavy use of a little
HTTP proxy written in Perl that I downloaded from the Web some time
ago.  I've modified the proxy so that it logs all the communication
between my browser and the outside.  So I configure my browser to use
the proxy, and proceed to access websites of interest "manually".
Then I use the details logged by the proxy to write a Perl/LWP script
that automates the browser end of this communication.

This strategy doesn't work for my current project, because the proxy
script mentioned above misses the HTTPS communication between browser
and server.

I understand that it is possible to set up some other third-party
proxies (e.g. Apache) as an additional proxy to handle the HTTPS
en/decrypting:

  browser <-> HTTP proxy <--> Apache proxy <-> secure server

I suppose that the Apache proxy speaks HTTPS to the server and speaks
HTTP to the HTTP proxy, performing the en/decryption between the two
on the fly.


NOTE: I would accept as valid answers to my question a detailed
description of an alternative way to accomplish my ultimate goal
(i.e. writing a web robot to automate the gathering of my secure
information) different from the one outlined above.
Answer  
Subject: Re: How to use Apache proxy to snoop my machine's HTTPS/SSL communication?
Answered By: runix-ga on 18 Jun 2002 18:24 PDT
 
Hello! 

I'll show you another way to do the research for your program. Instead
of sniffing what your browser does, I'll explain how to read HTML
forms so that you know what to send at each step of the authentication
process and how to get the info you're looking for.

First of all, this method is no less powerful than the one with the
proxy. It takes more time, but it's better, because you have to
understand what your browser does at each step.

------------------------

Web developers use HTML forms to send the information that the user
enters to the server.
Forms have input elements, for example a text box, a check box, a
button, etc.

Each element has a name and a value (which may be entered by the user
or be fixed), and when the user clicks on the submit button, the
browser sends the information to the server.

An example of a simple HTML form:

<form action='login.php' method='post'>
 Username <input type='text' name='username' /><br />
 Password <input type='password' name='password' /><br />
 <input type='submit' value='Send' />
</form>

When this form is rendered by a browser it will show 2 text boxes and
a button.
Then, when the user clicks the 'Send' button, the browser will send
the information to the program 'login.php' on the server side.
There are 2 ways to send the data to the server, specified by the
'method' attribute on the form tag: via POST or via GET.

GET: When using GET, the browser passes the parameters in the URL,
after the name of the file. For example, if the previous form used
'GET' to send the information, then after clicking on the button the
browser would show this on the Location Bar:

http://www.mysite.com/login.php?username=entered_username&password=entered_password

If you want to make a robot log in to that server, you only have
to tell it to ask for that URL!
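
A minimal sketch of building such a URL from Perl, assuming the URI
module (installed alongside LWP) is available. The host and the field
values are the placeholders from the example above, and get_form_url
is a helper name made up for this illustration:

```perl
use strict;
use warnings;
use URI;

# Build the URL a browser would request when a GET form is submitted.
# get_form_url is a hypothetical helper, not part of LWP or URI.
sub get_form_url {
    my ($action, @fields) = @_;
    my $uri = URI->new($action);
    $uri->query_form(@fields);    # URL-encodes names and values, keeps their order
    return $uri->as_string;
}

my $url = get_form_url(
    'http://www.mysite.com/login.php',
    username => 'entered_username',
    password => 'entered_password',
);
print "$url\n";
```

Letting query_form do the encoding matters as soon as a value contains
a space, an '&' or an '=', which would otherwise corrupt the query
string.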


POST: This is more 'secure' because the sent data is not shown on the
Location bar. You will probably need to log in to your account using
one of these forms.

If you want to make your robot log in to this site (using the
above form), check this example:

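use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(POST);

my $ua = LWP::UserAgent->new;
# POST the form fields to the script named in the form's 'action'
my $res = $ua->request(POST 'http://www.mysite.com/login.php',
    [ username => 'my_username', password => 'password' ]);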


Please note that every element in a form that has a name will be
passed as a parameter to the 'action' script.

[ http://www.w3.org/TR/REC-html40/interact/forms.html ]


----

To get the information you're looking for, the steps are:

1) Log in to the system
2) Go to the page that has the information
3) Parse it using regex
4) Mail it, print it, etc

( To show you how to understand the forms, I'll develop a little
program to log into Google Answers and get the status of the account (
https://answers.google.com/answers/main?cmd=myinvoices ))

1) Login

To start, go to the main page of the site you're looking for and click
until you get on the 'login page'.
In my case, this will be
https://answers.google.com/answers/main?cmd=login


Then right-click the page and choose 'View Source'. Find where the
form starts (<form ...), check whether it uses GET or POST, and note
where the information is submitted when the submit button is clicked.
In my case, the form starts like this:

	<form method="post" action="main?cmd=login">

Note that 'action' doesn't have the full URI of the file, so you have
to prepend the current directory.
After clicking on the 'Login' button, the information will be sent to
https://answers.google.com/answers/main?cmd=login
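
The same resolution can be done in code rather than by hand. A small
sketch, assuming the URI module is installed; the two strings are the
page address and the 'action' value quoted above:

```perl
use strict;
use warnings;
use URI;

my $page   = 'https://answers.google.com/answers/main?cmd=login';  # page that contains the form
my $action = 'main?cmd=login';                                     # the form's 'action' attribute

# new_abs() resolves a relative reference against a base URL,
# the same way a browser does before it submits the form.
my $target = URI->new_abs($action, $page);
print "$target\n";
```

This also copes with actions such as '../login' or '/cgi-bin/login',
where prepending the current directory by hand is easy to get wrong.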


Now, check what the input elements of the form are: look for <input ...>,
<textarea ...> and <select ...> tags. Get their names and figure out the
value to send to the server for each one.

In my case, the input elements are:

<input type="text" name="email" size="20">
<input type="password" name="password" size="20">
<input type="submit" name="submit" value="Login">

So, I have to send 3 variables to the server:
'email' with my email address
'password' with my password
'submit' with the value 'Login'

Please note that 'submit' is a button, so its default value ('Login')
can't be changed by the user. But if this variable is not sent, you
won't be logged in.
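
Reading the fields off by eye gets tedious on large pages. As a sketch
of an alternative, assuming the HTML::Form module (historically shipped
with libwww-perl) is installed, the fields can be listed and the request
built programmatically. The markup below is the login form quoted above;
the e-mail address and password are placeholders:

```perl
use strict;
use warnings;
use HTML::Form;

# The login form's markup as quoted above.
my $html = <<'HTML';
<form method="post" action="main?cmd=login">
<input type="text" name="email" size="20">
<input type="password" name="password" size="20">
<input type="submit" name="submit" value="Login">
</form>
HTML

my $base = 'https://answers.google.com/answers/main?cmd=login';
my ($form) = HTML::Form->parse($html, $base);

# List every named field the browser would send.
for my $input ($form->inputs) {
    next unless defined $input->name;
    printf "%-8s %-8s %s\n", $input->type, $input->name,
        defined $input->value ? $input->value : '';
}

# Fill in the fields and build the request that a click on 'Login' produces.
$form->value(email    => 'my_email@address.com');
$form->value(password => 'my_google_answers_password');
my $request = $form->click('submit');
print $request->method, ' ', $request->uri, "\n";
print $request->content, "\n";
```

The resulting HTTP::Request can be passed straight to $ua->request,
and hidden fields, which are easy to overlook in the page source, are
included automatically.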

Try this little program:

------------------------------------
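use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(GET POST);
use HTTP::Cookies;

my $email = 'my_email@address.com';
my $pass  = 'my_google_answers_password';

my $ua = LWP::UserAgent->new;
$ua->cookie_jar(HTTP::Cookies->new);    # remember the session cookie after login

# $req holds the HTTP::Response returned by the server
my $req = $ua->request(POST 'https://answers.google.com/answers/main?cmd=login',
    [ email => $email, password => $pass, submit => 'Login' ]);
die "request failed: ", $req->status_line, "\n" if $req->is_error;
if ($req->content =~ /Invalid login/) {
    print "invalid login!\n";
} else {
    print "welcome to google answers :)\n";
}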

-----------------------------------

The $ua->request(POST ...) call tells LWP to request
'https://answers.google.com/answers/main?cmd=login' and pass the
parameters 'email'=$email, 'password'=$pass and 'submit'='Login'.

Set $email and $pass with your info and try it!


2) Getting the info

Now that you're logged in, you have to go to the page that holds the
info you're looking for. Click on the link that takes you there and
write down the address shown on your browser's Location bar once you
arrive.

For example, if I want to get the status of my account, I'll have to
go to https://answers.google.com/answers/main?cmd=myinvoices

So, after logging in to the system, I'll go to that address:

$req=$ua->request(GET
'https://answers.google.com/answers/main?cmd=myinvoices');

and inside $req->content I'll have the contents of the page. Then, I
have to parse it:

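# Capture a dollar amount such as 12 or 12.50 ('\.' is a literal decimal point).
# Each pattern must stay on one line: a line break inside it would have to match too.
my $money = qr/\$([0-9]+(?:\.[0-9]+)?)/;
my ($ear)  = $req->content =~
    /<td> Current Earnings \(what you will be paid\) for Answering Questions: <\/td> <td width="1%"> $money/;
my ($char) = $req->content =~
    /<td> Current Balance \(what you will be charged\) for Asked Questions: <\/td> <td width="1%"> $money/;
$ear  = 'not found' unless defined $ear;     # the page layout may have changed
$char = 'not found' unless defined $char;
print "Will be paid: $ear \nWill be charged: $char\n";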
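
A pattern like this is easier to get right when it is first tried
offline against a saved copy of the page. A minimal sketch, assuming a
made-up fragment shaped like the invoices page (the real markup and the
amount shown here are illustrative only):

```perl
use strict;
use warnings;

# Hypothetical fragment shaped like the invoices page; the real markup
# and the amount may differ.
my $content = '<td> Current Earnings (what you will be paid) for Answering Questions: </td> '
            . '<td width="1%"> $18.75</td>';

# \s+ tolerates line breaks in the page; \. matches a literal decimal point.
my $amount = qr/\$([0-9]+(?:\.[0-9]+)?)/;
my ($earnings) =
    $content =~ /Current\s+Earnings\s+\(what\s+you\s+will\s+be\s+paid\).*?$amount/s;

print defined $earnings
    ? "Will be paid: $earnings\n"
    : "Earnings line not found; the page layout may have changed\n";
```

Taking the capture from the match in list context, instead of reading
$1 afterwards, avoids silently reusing the value of an earlier
successful match when this one fails.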

--------------------------

The finished script will be:

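use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(GET POST);
use HTTP::Cookies;

my $email = 'my_email@address.com';
my $pass  = 'my_google_answers_password';

my $ua = LWP::UserAgent->new;
$ua->cookie_jar(HTTP::Cookies->new);    # remember the session cookie after login

# $req holds the HTTP::Response returned by the server
my $req = $ua->request(POST 'https://answers.google.com/answers/main?cmd=login',
    [ email => $email, password => $pass, submit => 'Login' ]);
die "request failed: ", $req->status_line, "\n" if $req->is_error;
if ($req->content =~ /Invalid login/) {
    print "invalid login!\n";
} else {
    print "welcome to google answers :)\n";
    $req = $ua->request(GET 'https://answers.google.com/answers/main?cmd=myinvoices');

    # Capture a dollar amount such as 12 or 12.50 ('\.' is a literal decimal point).
    # Each pattern must stay on one line: a line break inside it would have to match too.
    my $money = qr/\$([0-9]+(?:\.[0-9]+)?)/;
    my ($ear)  = $req->content =~
        /<td> Current Earnings \(what you will be paid\) for Answering Questions: <\/td> <td width="1%"> $money/;
    my ($char) = $req->content =~
        /<td> Current Balance \(what you will be charged\) for Asked Questions: <\/td> <td width="1%"> $money/;
    $ear  = 'not found' unless defined $ear;     # the page layout may have changed
    $char = 'not found' unless defined $char;
    print "Will be paid: $ear \nWill be charged: $char\n";
}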



-------------------------

It probably won't be this straightforward with a bank (you know, their
HTML will be very messy: they don't understand the beauty of simple
things, as Google does ;) but it won't be very hard if you have
patience :)

Good luck with your program, and feel free to ask all the
clarifications you need!



Additional links:

LWP
[ http://www.linpro.no/lwp/ ]

HTML Forms
[ http://www.w3.org/TR/REC-html40/interact/forms.html ]



Search Strategy:

Personal experience

Request for Answer Clarification by gerbil-ga on 23 Jun 2002 13:35 PDT
This answer is not sufficiently generate to be adequate.  It
presupposes an extremely simple interaction with a server, one that
can be gleaned by looking at the displayed page's source.  But this is
not the case with the servers I want to interact with.  They make
extensive use of cgi, asp, jsp, and various forms of redirection,
which makes it useless to inspect the source code for the page that is
ultimately displayed.  To know what happens between the browser and
the server(s) I must be able to snoop all of the interaction between
them. I don't consider this question satisfactorily answered.

Clarification of Answer by runix-ga on 23 Jun 2002 15:43 PDT
(I posted this clarification as a comment, please ignore it)

Gerbil, 
When a site works dynamically (i.e., CGI, PHP, JSP, ASP, etc.) it sends
pure HTML to the browser. The 'dynamic' part is on the server side
(i.e., DB access, etc.). There's no way to work on the server side
information! 

The pages that are dynamically generated are HTML pages.
Think about this: your browser only knows about the HTML the site
sent: it knows what to do when you press the 'submit' button from the
form definition. 

I can give you examples about how to handle redirections, if you ask
me to. 

Other technologies that the site may use are cookies which are
automatically handled by HTTP::Cookies. 

If you want to tell me which sophisticated interaction you have to do
with the site, I will be happy to help you!
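
As a sketch of the redirection and cookie handling mentioned above,
assuming a reasonably recent libwww-perl: requests_redirectable lists
the request methods LWP redirects automatically (GET and HEAD by
default), and the exact treatment of a redirect that answers a POST has
varied between LWP versions. redirect_chain is a helper name made up
for this illustration:

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

my $ua = LWP::UserAgent->new;
$ua->cookie_jar(HTTP::Cookies->new);    # keep session cookies between requests

# Allow the redirect that often answers a login POST to be followed.
push @{ $ua->requests_redirectable }, 'POST';

# Hypothetical helper: walk back through the responses that led to the
# final page, oldest first, so each hop can be inspected.
sub redirect_chain {
    my ($response) = @_;
    my @chain;
    my $r = $response;
    while (defined $r) {
        unshift @chain, $r->code . ' ' . $r->request->uri;
        $r = $r->previous;
    }
    return @chain;
}
```

After a request such as my $res = $ua->request(...), calling
print "$_\n" for redirect_chain($res); shows every intermediate URL and
status code, which is most of what a logging proxy would have revealed
about the redirects.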

Request for Answer Clarification by gerbil-ga on 23 Jun 2002 19:50 PDT
I understand that there is no way for me to find out what the server
does internally.  The next best thing, as far as I'm concerned, is to
be able to fully listen in on the communication between server and
browser; this gives *me* all the information that *I* need to
replicate the interaction in a Perl/LWP script.  That was the
objective of my original query, and I don't think it has been met. 
When I try the approach you proposed and programmatically request page
X, the contents (of the HTTP::Response object) are often completely
different from the source that I get if I request page X via the
browser.  In other words, the browser and the server have a
communication that is very different from what I can achieve with LWP
and the limited information that I have at my disposal by using the
approach you propose.


I have no doubt that someone like you could achieve my ultimate goals
without needing all the information that I need, but I am not you. 
And I am also sure that I could achieve my ultimate goal if my query
had been answered in the way I originally posed it.  Even if I could
retain you as a consultant for every single page that I may want to
add to the list of sites that my bot would have to visit (I'm sure
each one would have idiosyncrasies that would need to be dealt with
specifically), I would have to reveal to you private information
(usernames, passwords, etc.), and that's just not possible.  The
approach I originally asked about does not have any of these
drawbacks: it is completely general, it allows me to listen in on the
communication between the browser and the server, so that I can
*trivially* replicate it in a Perl script.  Your approach, on the
other hand, requires a completely ad hoc analysis of each specific
site, which, from my vantage point is far from trivial.  In other
words, I want my money back.

Clarification of Answer by runix-ga on 23 Jun 2002 20:39 PDT
I'm sorry that you didn't like the answer, but I think that you don't
understand how browsers/WWW work (you can develop a browser using LWP,
so LWP is not the problem). When you said: 'I would accept as valid
answers to my question a detailed description of an alternative way to
accomplish my ultimate goal' I thought that you were open to learn new
ways to write your bot.

Please write to answers-editors@google.com and ask for your money back
or a repost. The Id for this question is 28442.

Good luck.
Comments  
Subject: Re: How to use Apache proxy to snoop my machine's HTTPS/SSL communication?
From: legendlength-ga on 18 Jun 2002 08:59 PDT
 
I would write my own bot using OpenSSL (http://www.openssl.org/).

I have used their libraries to add SSL to an HTTP server that I wrote,
and it was very quick to get going.  The calls in the library are very
similar to the standard TCP send() & recv(), so it's really just a
matter of replacing all of your send() and recv()'s with the ones in
the library.
Subject: Re: How to use Apache proxy to snoop my machine's HTTPS/SSL communication?
From: bkeeler-ga on 18 Jun 2002 12:42 PDT
 
Your assumption about how HTTP proxies handle HTTPS traffic is wrong. 
You assumed that the proxy speaks HTTPS to the server, and the client
speaks HTTP to the proxy.  Not so.  When the browser wants to make a
secure connection, it issues a 'CONNECT' command to the proxy.  The
proxy connects to the host and port specified, then simply forwards
encrypted traffic back and forth between browser and server.  It
cannot decrypt the traffic; if it could, SSL would be worthless as a
security feature.
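
For illustration, a sketch of the request a browser sends to the proxy
before an HTTPS exchange. connect_request is a helper name made up for
this example, and the host is simply the one discussed in this thread:

```perl
use strict;
use warnings;

# Build the tunnel request a browser sends to an HTTP proxy.
# connect_request is a hypothetical helper for this illustration.
sub connect_request {
    my ($host, $port) = @_;
    $port ||= 443;    # default HTTPS port
    return "CONNECT $host:$port HTTP/1.0\r\n"
         . "Host: $host:$port\r\n"
         . "\r\n";
}

print connect_request('answers.google.com');
# Once the proxy answers with a 2xx status line, every further byte on
# the connection belongs to the SSL session between browser and server;
# the proxy relays those bytes without being able to read them.
```

This is why a logging HTTP proxy records only the CONNECT line for
secure sites: the URLs, headers and form data all travel inside the
encrypted tunnel.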

In theory, a custom-written proxy could perform a "man in the middle"
attack by pretending to be the destination server in question.  The
browser would detect the fraud because the proxy would not be able to
present a valid SSL certificate which matches the server hostname. 
The browser would pop up a dialog warning you of the potential
security risk, but giving you the option to proceed.

I don't know of any Apache module to do this kind of thing.  It
probably would not be too hard to adapt your Perl proxy to do it
though.  Perl can do SSL quite easily, though it helps to understand
the principles of SSL, public key infrastructure, certificates and so
on.

Further reading:

Open Source PKI book: 
http://ospkibook.sourceforge.net/docs/OSPKI-2.4.7/OSPKI-html/ospki-book.htm

Perl IO::Socket::SSL module:
http://search.cpan.org/search?dist=IO-Socket-SSL
Subject: Re: How to use Apache proxy to snoop my machine's HTTPS/SSL communication?
From: quesera-ga on 18 Jun 2002 20:52 PDT
 
With all due respect, the answer given seems to miss the point.  

Find an SSL module on CPAN and use it to write your own SSL bot, like
you did your original HTTP bot.  It will take about ten extra lines to
set up the SSL session, and it's very well documented.

However, this *CAN* be done using a proxy as well.  In fact, I used to
do this when I was testing browsers that didn't do SSL.  The comment
is correct that when using a proxy, the browser sends a CONNECT
instead of a GET, but some proxies will initiate a second SSL session
(man-in-the-middle) instead of forwarding the original one.  This is
only a security problem if you don't trust the network between you and
your proxy, which used to be the working assumption.

Even so, using a proxy is definitely the hard way.  Check CPAN and
you'll have things up and running in no time.
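
A rough sketch of what those few lines of SSL setup can look like,
assuming the IO::Socket::SSL module from CPAN is installed.
fetch_over_ssl is a name made up for this example, and the request it
sends is a bare-bones HTTP/1.0 GET without cookies or form data:

```perl
use strict;
use warnings;
use IO::Socket::SSL;

# Fetch one page over SSL and return the raw response (headers and body).
# fetch_over_ssl is a hypothetical helper for this illustration.
sub fetch_over_ssl {
    my ($host, $path) = @_;
    my $sock = IO::Socket::SSL->new(PeerHost => $host, PeerPort => 443)
        or die "SSL connection to $host failed: $SSL_ERROR\n";
    print $sock "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";
    my $response = '';
    while (my $line = <$sock>) {    # read until the server closes the connection
        $response .= $line;
    }
    close $sock;
    return $response;
}

# Runs only when a host name is given, e.g.: perl fetch.pl answers.google.com
print fetch_over_ssl($ARGV[0], '/') if @ARGV;
```

Once the socket is open it is read and written like any other Perl
filehandle, so code that already speaks plain HTTP over a socket needs
little more than the changed constructor.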
Subject: Re: How to use Apache proxy to snoop my machine's HTTPS/SSL communication?
From: gerbil-ga on 19 Jun 2002 02:26 PDT
 
In response to quesera-ga's comment, I don't understand why his
solution is better than the one given by the Google Researcher.  The
latter doesn't require learning anything more than LWP (which I'm
already pretty familiar with).  I don't see what I gain by ignoring
the Researcher's advice and plunging into CPAN in search of some
unspecified solution.

On Usenet, I've gotten many answers to my question in the same style
as all the other comments here.  They are all along the lines of
"sure, it's easy, just do X", where X is a short phrase like "look
around in CPAN" or "roll your own with OpenSSL".  I wish these
commentators realized that they are not being helpful at all; in fact
they are being less than helpful by adding to my confusion, and
wasting my time (not to mention theirs).  The reason I'm forking over
money for this answer is because such glib Usenet-grade advice has
proven useless to me.

The reply by the Google Researcher is not great, but at least it is
detailed and specific...
Subject: Re: How to use Apache proxy to snoop my machine's HTTPS/SSL communication?
From: quesera-ga on 20 Jun 2002 03:43 PDT
 
Sorry my response wasn't considered helpful.  You stated that you were
a skilled Perl programmer, so I assumed that you'd be perfectly
comfortable using CPAN.  If you are not, I highly recommend becoming
so.  I don't know any perl programmers of any level who don't consider
navigation of CPAN a critical skill.

Nonetheless, it's a question of style.  For my tastes, LWP is often
too abstracting from the job at hand.  I know exactly how HTTP and
forms work, so doing processing with LWP is actually more difficult
for me because instead of working with the protocol directly, you have
to work with the LWP author's idea of an interface into the protocol.

I gathered from your question that you knew how to work with forms,
just not how to add SSL abilities to your form processor.  So I
approached the question from that angle -- rather than rethinking your
entire design, just use the SSL module on CPAN and ten or so lines of
function calls to set up the SSL session before doing exactly what you
were already doing.  The researcher's answer seemed to me to take you
down a longer path and learning curve to get to the equivalent place.

Nonetheless, and I should have noted it earlier, it is a great
overview of how to use LWP to do some simple form processing, for
people who don't want or need to understand how forms work.

Good luck.
Subject: Re: How to use Apache proxy to snoop my machine's HTTPS/SSL communication?
From: gerbil-ga on 23 Jun 2002 13:39 PDT
 
There is a typo in my "Request for Clarification".  It should read

"This answer is not sufficiently general to be adequate."
Subject: Re: How to use Apache proxy to snoop my machine's HTTPS/SSL communication?
From: runix-ga on 23 Jun 2002 15:41 PDT
 
Gerbil,

When a site works dynamically (i.e., CGI, PHP, JSP, ASP, etc.) it sends
pure HTML to the browser. The 'dynamic' part is on the server side
(i.e., DB access, etc.). There's no way to work on the server side
information!

The pages that are dynamically generated are HTML pages.
Think about this: your browser only knows about the HTML the site
sent: it knows what to do when you press the 'submit' button from the
form definition.

I can give you examples about how to handle redirections, if you ask
me to.

Other technologies that the site may use are cookies which are
automatically handled by HTTP::Cookies.

If you want to tell me which sophisticated interaction you have to do
with the site, I will be happy to help you!
Subject: Re: How to use Apache proxy to snoop my machine's HTTPS/SSL communication?
From: daemon-ga on 25 Jun 2002 01:23 PDT
 
I'd probably approach this problem from a different direction.  I'd
simply grab a copy of the Mozilla browser source, and hack it so that
it writes its inputs and outputs to one or more external files, after
SSL decoding.  Make your changes, recompile and snoop away.  Every
line going into the SSL encoder would be written out first.  Every
line coming out of the decoder would be written out as well.

As for a browser getting back something different from what you get
with other tools, it's often just a matter of setting your Agent
string to match whatever browser you're trying to pretend to be.
Sample agent strings can be found in any web log, or webalizer
statistics page.
Even WGET has SSL support and the ability to fake an agent.  I could
manually hack through a set of SSL pages using wget pretending to be
IE5.5 and, if the JavaScript isn't too complicated, automate the
process with Perl.  A page that relies on extensive form submissions
is a trickier proposition.
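
In LWP the agent string is set on the user agent object. A minimal
sketch, assuming a reasonably recent libwww-perl (add_handler is not
present in very old releases); the IE 5.5 string below is only an
example, so copy the exact one your own browser sends:

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# Example IE 5.5 identification string (an assumption; use your browser's own).
$ua->agent('Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)');

# Print every outgoing request, headers included, to STDERR so it can be
# compared with what the browser sends.
$ua->add_handler(request_send => sub {
    my ($request) = @_;
    print STDERR $request->as_string, "\n";
    return;    # returning nothing lets LWP send the request as usual
});

print $ua->agent, "\n";
```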

     ian
Subject: Possibly a useful tool
From: phineas42-ga on 10 Nov 2004 13:04 PST
 
So lots of time has passed; here is a solution that wasn't entirely
available two years ago.

I don't believe this is exactly what was requested, but you may be
interested to simply look at the HTTP headers (and it works with
https).

I use an extension for Firefox called "Live HTTP Headers"

http://livehttpheaders.mozdev.org/
