Google Answers: How to use Apache proxy to snoop my machine's HTTPS/SSL communication?

Q: How to use Apache proxy to snoop my machine's HTTPS/SSL communication? ( Answered, 9 Comments )

Question

Subject: How to use Apache proxy to snoop my machine's HTTPS/SSL communication?
Category: Computers > Programming
Asked by: gerbil-ga
List Price: $25.00

Posted: 18 Jun 2002 06:44 PDT
Expires: 25 Jun 2002 06:44 PDT
Question ID: 28442

The short version of my question is this:

For reasons explained more fully in the Background section below I
want to be able to snoop HTTPS/SSL traffic between my Linux machine
and assorted remote secure servers.  To do this, I want to set up the
following pipeline:

  browser <-> HTTP proxy <--> Apache proxy <-> secure server

The part of this pipeline I need to understand is the Apache proxy,
whose role is to en/decrypt the secure communication from/to the HTTP
proxy.

How do I set up and operate an Apache proxy under my (non-root) Linux
account?  Note that I have not installed Apache yet, so the answer
needs to include instructions on how to configure an Apache
installation in my home directory to allow for what I want to do.


Background:

I'm a skilled Perl programmer, though I have very little experience in
the area of network programming, and in particular the area of
HTTPS/SSL transactions, which I need to know more about for my current
personal project.

My project is this: I have several accounts (credit cards, bank, etc.)
that offer online access.  I want to write a robot that periodically
visits these (secure) sites (using my passwords, of course), collects
information of interest to me, and e-mails me a report.

In the past, when I've wanted to write a Perl robot to automate the
fetching of information from the Web, I've made heavy use of a little
HTTP proxy written in Perl that I downloaded from the Web some time
ago.  I've modified the proxy so that it logs all the communication
between my browser and the outside.  So I configure my browser to use
the proxy, and proceed to access websites of interest "manually".
Then I use the details logged by the proxy to write a Perl/LWP script
that automates the browser end of this communication.

This strategy doesn't work for my current project, because the proxy
script mentioned above misses the HTTPS communication between browser
and server.

I understand that it is possible to set up some other third-party
proxies (e.g. Apache) as an additional proxy to handle the HTTPS
en/decrypting:

  browser <-> HTTP proxy <--> Apache proxy <-> secure server

I suppose that the Apache proxy speaks HTTPS to the server and speaks
HTTP to the HTTP proxy, performing the en/decryption between the two
on the fly.


NOTE: I would accept as valid answers to my question a detailed
description of an alternative way to accomplish my ultimate goal
(i.e. writing a web robot to automate the gathering of my secure
information) different from the one outlined above.

Answer

Subject: Re: How to use Apache proxy to snoop my machine's HTTPS/SSL communication?
Answered By: runix-ga on 18 Jun 2002 18:24 PDT

Hello!

I'll show you another way to do the research for your program.
Instead of sniffing what your browser does, I'll explain how to read
HTML forms so that you know what to send at each step of the
authentication process and how to get the information you're looking
for.

First of all, this method is no less powerful than the one with the
proxy.  It takes more time, but it's better, because you have to
understand what your browser does at each step.

------------------------

Web developers use HTML forms to send the information that the user
enters to the server.  Forms have input elements, for example a text
box, a check box, a button, etc.  Each element has a name and a value
(which may be entered by the user or be fixed), and when the user
clicks the submit button, the browser sends the information to the
server.

An example of a simple HTML form:

<form action='login.php' method='post'>
  Username <input type='text' name='username' /><br />
  Password <input type='password' name='password' /><br />
  <input type='submit' value='Send' />
</form>

When this form is rendered by a browser it shows 2 text boxes and a
button.  When the user clicks the button, the browser sends the
information to the program 'login.php' on the server side.

There are 2 ways to send the data to the server, selected by the
'method' attribute of the form tag: GET or POST.

GET:
When using GET, the browser passes the parameters in the URL, after
the file name.  For example, if the previous form used 'GET', then
after clicking the button the browser would show in the Location bar:

http://www.mysite.com/login.php?username=entered_username&password=entered_password

If you want a robot to log in to that server, you only have to tell
it to request that URL!

POST:
The data is sent in the body of the request instead of in the URL.
This is more 'secure' in the sense that the sent data is not shown in
the Location bar.

You will probably need to log in to your account using one of these
forms.
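
To make the GET case concrete, here is a minimal sketch that builds the
same URL from a list of field names and values.  It assumes the URI
module (installed alongside LWP); the host name and field values are
the placeholders used in the form example, not a real site.

```perl
use strict;
use warnings;
use URI;

# Same fields as the example form; the values are placeholders.
my $uri = URI->new('http://www.mysite.com/login.php');
$uri->query_form(
    username => 'entered_username',
    password => 'entered_password',
);

# The query string is appended after '?' with fields joined by '&'.
print "$uri\n";
```

This prints the Location-bar URL shown earlier.  query_form also
escapes characters that are not allowed in a URL, which matters once
real passwords are involved.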

If you want your robot to log in to this site (using the form above),
check this example:

use LWP;
use HTTP::Request::Common;

my $ua = LWP::UserAgent->new;
$ua->request(POST 'http://www.mysite.com/login.php',
    ["username" => "my_username", "password" => "password"]);

Please note that every element in a form that has a name will be
passed as a parameter to the 'action' script.
[ http://www.w3.org/TR/REC-html40/interact/forms.html ]

----

To get the information you're looking for, the steps are:

1) Log in to the system
2) Go to the page that has the information
3) Parse it using regular expressions
4) Mail it, print it, etc.

(To show you how to read the forms, I'll develop a little program
that logs in to Google Answers and gets the status of the account:
https://answers.google.com/answers/main?cmd=myinvoices )

1) Login

To start, go to the main page of the site you're interested in and
click through until you reach the login page.  In my case, this is
https://answers.google.com/answers/main?cmd=login

Then right-click on the page and choose 'View Source'.  Find where
the form starts (<form ...) and check whether it uses GET or POST and
where the information is submitted when you click the submit button.

In my case, the form starts like this:

<form method="post" action="main?cmd=login">

Note that 'action' doesn't contain the full URI, so it has to be
resolved relative to the URL of the current page.  After clicking the
'Login' button, the information will be sent to
https://answers.google.com/answers/main?cmd=login

Now check which input elements the form contains: look for <input ...,
<textarea ... and <select ... tags.  Get their names and figure out
the value to send to the server for each.
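
The manual inspection described above (method, action, named inputs)
can also be done programmatically.  The sketch below assumes the
HTML::Form module from CPAN and uses the Google Answers login form
quoted in this answer, wrapped in a minimal page; the e-mail address
and password are placeholders.  Passing the page URL as the second
argument lets the module resolve the relative 'action' for you.

```perl
use strict;
use warnings;
use HTML::Form;

# Login form as quoted in the answer, wrapped in a minimal page.
my $html = <<'HTML';
<html><body>
<form method="post" action="main?cmd=login">
<input type="text" name="email" size="20">
<input type="password" name="password" size="20">
<input type="submit" name="submit" value="Login">
</form>
</body></html>
HTML

# URL the page was fetched from; relative actions resolve against it.
my $base = 'https://answers.google.com/answers/main?cmd=login';

# parse() returns every form on the page; take the first one.
my ($form) = HTML::Form->parse($html, $base);

print "method: ", $form->method, "\n";
print "action: ", $form->action, "\n";

# List every named input with its type and preset value, if any.
for my $input ($form->inputs) {
    my $name = $input->name;
    next unless defined $name;
    my $value = $input->value;
    printf "%-9s %-9s %s\n", $input->type, $name,
        defined $value ? $value : '';
}

# Fill in the fields (placeholder values) and build the request that
# pressing the submit button would send.
$form->value(email    => 'my_email@address.com');
$form->value(password => 'my_google_answers_password');
my $request = $form->click;
print $request->as_string;
```

The resulting HTTP::Request can be handed to $ua->request($request),
so hidden fields and the submit button's name/value pair are carried
along without copying them by hand.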

In my case, the input elements are:

<input type="text" name="email" size="20">
<input type="password" name="password" size="20">
<input type="submit" name="submit" value="Login">

So I have to send 3 variables to the server:

'email' with my email address
'password' with my password
'submit' with the value 'Login'

Please note that 'submit' is a button, so its default value ('Login')
can't be changed.  But if this variable is not sent, you won't be
logged in.

Try this little program:

------------------------------------
use strict;
use warnings;
use LWP;    # https:// URLs also need SSL support (e.g. LWP::Protocol::https)
use HTTP::Request::Common;
use HTTP::Cookies;

my $email = 'my_email@address.com';
my $pass  = 'my_google_answers_password';

my $ua = LWP::UserAgent->new;
$ua->cookie_jar(HTTP::Cookies->new);    # keep the session cookie

# Despite its name, $req holds the HTTP::Response sent back by the server.
my $req = $ua->request(POST 'https://answers.google.com/answers/main?cmd=login',
    ['email' => $email, 'password' => $pass, 'submit' => 'Login']);

if ($req->content =~ /Invalid login/) {
    print "invalid login!\n";
} else {
    print "welcome to google answers :)\n";
}
-----------------------------------

The $ua->request(POST ...) call tells LWP to request
'https://answers.google.com/answers/main?cmd=login' and pass the
parameters 'email'=$email, 'password'=$pass and 'submit'='Login'.

Set $email and $pass to your own values and try it!

2) Getting the info

Now that you're logged in to the system, you have to go to the page
that holds the information you're looking for.  Click on the link
that takes you there and write down the address shown in your
browser's Location bar once you arrive.

For example, if I want to get the status of my account, I have to go
to https://answers.google.com/answers/main?cmd=myinvoices

So, after logging in to the system, I request that address:

$req = $ua->request(GET 'https://answers.google.com/answers/main?cmd=myinvoices');

and $req->content will hold the contents of the page.

Then I have to parse it:

my ($ear) = $req->content =~ /<td> Current Earnings \(what you will be paid\) for Answering Questions: <\/td> <td width="1%"> \$([0-9]+(?:\.[0-9]+)?)/;
my ($char) = $req->content =~ /<td> Current Balance \(what you will be charged\) for Asked Questions: <\/td> <td width="1%"> \$([0-9]+(?:\.[0-9]+)?)/;
print "Will be paid: $ear \nWill be charged: $char\n";

--------------------------

The finished script will be:

use strict;
use warnings;
use LWP;    # https:// URLs also need SSL support (e.g. LWP::Protocol::https)
use HTTP::Request::Common;
use HTTP::Cookies;

my $email = 'my_email@address.com';
my $pass  = 'my_google_answers_password';

my $ua = LWP::UserAgent->new;
$ua->cookie_jar(HTTP::Cookies->new);    # keep the session cookie

# Despite its name, $req holds the HTTP::Response sent back by the server.
my $req = $ua->request(POST 'https://answers.google.com/answers/main?cmd=login',
    ['email' => $email, 'password' => $pass, 'submit' => 'Login']);

if ($req->content =~ /Invalid login/) {
    print "invalid login!\n";
} else {
    print "welcome to google answers :)\n";

    $req = $ua->request(GET 'https://answers.google.com/answers/main?cmd=myinvoices');

    # Capture the dollar amount that follows each label.
    my ($ear) = $req->content =~ /<td> Current Earnings \(what you will be paid\) for Answering Questions: <\/td> <td width="1%"> \$([0-9]+(?:\.[0-9]+)?)/;
    my ($char) = $req->content =~ /<td> Current Balance \(what you will be charged\) for Asked Questions: <\/td> <td width="1%"> \$([0-9]+(?:\.[0-9]+)?)/;

    print "Will be paid: $ear \nWill be charged: $char\n";
}

-------------------------

It probably won't be this straightforward with a bank (you know,
their HTML will be very messy: they don't understand the beauty of
simple things, as Google does ;) but it won't be very hard if you
have patience :)

Good luck with your program, and feel free to ask for any
clarifications you need!

Additional links:
LWP [ http://www.linpro.no/lwp/ ]
HTML Forms [ http://www.w3.org/TR/REC-html40/interact/forms.html ]

Search Strategy: Personal experience

Request for Answer Clarification by gerbil-ga on 23 Jun 2002 13:35 PDT

This answer is not sufficiently generate to be adequate.  It
presupposes an extremely simple interaction with a server, one that
can be gleaned by looking at the displayed page's source.  But this is
not the case with the servers I want to interact with.  They make
extensive use of cgi, asp, jsp, and various forms of redirection,
which makes it useless to inspect the source code for the page that is
ultimately displayed.  To know what happens between the browser and
the server(s) I must be able to snoop all of the interaction between
them.

I don't consider this question satisfactorily answered.

Clarification of Answer by runix-ga on 23 Jun 2002 15:43 PDT

(I posted this clarification as a comment, please ignore it)

Gerbil,

When a site works dynamically (i.e., CGI, PHP, JSP, ASP, etc.) it
sends pure HTML to the browser.  The 'dynamic' part is on the server
side (e.g., DB access).  There's no way to work on the server-side
information!

The pages that are dynamically generated are HTML pages.
Think about this: your browser only knows about the HTML the site
sent.  It knows what to do when you press the 'submit' button from the
form definition.

I can give you examples of how to handle redirections, if you ask me
to.

Other technologies that the site may use are cookies, which are
handled automatically by HTTP::Cookies.

If you want to tell me which sophisticated interaction you have to do
with the site, I will be happy to help you!
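
As an illustration of the redirection handling mentioned above, here
is a minimal sketch using LWP::UserAgent.  It is not the researcher's
code; the helper name show_redirect_chain is made up for this example.
It lets LWP follow redirects after a POST and prints each hop, so the
chain of pages behind a login becomes visible.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

my $ua = LWP::UserAgent->new;
$ua->cookie_jar(HTTP::Cookies->new);    # cookies are carried across hops

# requests_redirectable defaults to GET and HEAD; allow POST as well.
push @{ $ua->requests_redirectable }, 'POST';

# Hypothetical helper: print every hop that led to the final response.
sub show_redirect_chain {
    my ($response) = @_;
    for my $hop ($response->redirects) {
        printf "%s %s -> %s\n",
            $hop->code, $hop->request->uri, $hop->header('Location') // '?';
    }
    printf "final: %s %s\n", $response->code, $response->request->uri;
}
```

A call such as show_redirect_chain($ua->post($login_url, \@fields)),
with $login_url and @fields taken from the form you inspected, then
lists the intermediate URLs and status codes.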

Request for Answer Clarification by gerbil-ga on 23 Jun 2002 19:50 PDT

I understand that there is no way for me to find out what the server
does internally.  The next best thing, as far as I'm concerned, is to
be able to fully listen in on the communication between server and
browser; this gives me all the information that I need to replicate
the interaction in a Perl/LWP script.  That was the objective of my
original query, and I don't think it has been met.

When I try the approach you proposed and programmatically request
page X, the contents (of the HTTP::Response object) are often
completely different from the source that I get if I request page X
via the browser.  In other words, the browser and the server have a
communication that is very different from what I can achieve with LWP
and the limited information that I have at my disposal by using the
approach you propose.

I have no doubt that someone like you could achieve my ultimate goals
without needing all the information that I need, but I am not you.
And I am also sure that I could achieve my ultimate goal if my query
had been answered in the way I originally posed it.  Even if I could
retain you as a consultant for every single page that I may want to
add to the list of sites that my bot would have to visit (I'm sure
each one would have idiosyncrasies that would need to be dealt with
specifically), I would have to reveal to you private information
(usernames, passwords, etc.), and that's just not possible.

The approach I originally asked about does not have any of these
drawbacks: it is completely general, and it allows me to listen in on
the communication between the browser and the server, so that I can
trivially replicate it in a Perl script.  Your approach, on the other
hand, requires a completely ad hoc analysis of each specific site,
which, from my vantage point, is far from trivial.

In other words, I want my money back.

Clarification of Answer by runix-ga on 23 Jun 2002 20:39 PDT

I'm sorry that you didn't like the answer, but I think that you don't
understand how browsers and the WWW work (you can develop a browser
using LWP, so LWP is not the problem).

When you said: 'I would accept as valid answers to my question a
detailed description of an alternative way to accomplish my ultimate
goal' I thought that you were open to learning new ways to write your
bot.

Please write to answers-editors@google.com and ask for your money back
or a repost.  The ID for this question is 28442.

Good luck.

Comments

Subject: Re: How to use Apache proxy to snoop my machine's HTTPS/SSL communication?
From: legendlength-ga on 18 Jun 2002 08:59 PDT

I would write my own bot using OpenSSL (http://www.openssl.org/).

I have used their libraries to add SSL to an HTTP server that I wrote,
and it was very quick to get going.  The calls in the library are very
similar to the standard TCP send() & recv(), so it's really just a
matter of replacing all of your send() and recv() calls with the ones
in the library.

Subject: Re: How to use Apache proxy to snoop my machine's HTTPS/SSL communication?
From: bkeeler-ga on 18 Jun 2002 12:42 PDT

Your assumption about how HTTP proxies handle HTTPS traffic is wrong.
You assumed that the proxy speaks HTTPS to the server, and the client
speaks HTTP to the proxy.  Not so.  When the browser wants to make a
secure connection, it issues a 'CONNECT' command to the proxy.  The
proxy connects to the host and port specified, then simply forwards
encrypted traffic back and forth between browser and server.  It
cannot decrypt the traffic; if it could, SSL would be worthless as a
security feature.
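
For illustration, the request a browser sends to the proxy before the
SSL handshake looks roughly like what this sketch prints.  The helper
name connect_request is made up for the example, and the host is the
one used elsewhere in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: build the plain-text request a browser sends to
# an HTTP proxy to ask for a tunnel to $host:$port.
sub connect_request {
    my ($host, $port) = @_;
    return "CONNECT $host:$port HTTP/1.1\r\n"
         . "Host: $host:$port\r\n"
         . "\r\n";
}

print connect_request('answers.google.com', 443);
```

Once the proxy answers with a 2xx status, everything that follows on
that connection is encrypted SSL data, which is why a logging HTTP
proxy sees only the CONNECT line and opaque bytes.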

In theory, a custom-written proxy could perform a "man in the middle"
attack by pretending to be the destination server in question.  The
browser would detect the fraud because the proxy would not be able to
present a valid SSL certificate which matches the server hostname. 
The browser would pop up a dialog warning you of the potential
security risk, but giving you the option to proceed.

I don't know of any Apache module to do this kind of thing.  It
probably would not be too hard to adapt your Perl proxy to do it
though.  Perl can do SSL quite easily, though it helps to understand
the principles of SSL, public key infrastructure, certificates and so
on.

Further reading:

Open Source PKI book: 
http://ospkibook.sourceforge.net/docs/OSPKI-2.4.7/OSPKI-html/ospki-book.htm

Perl IO::Socket::SSL module:
http://search.cpan.org/search?dist=IO-Socket-SSL

Subject: Re: How to use Apache proxy to snoop my machine's HTTPS/SSL communication?
From: quesera-ga on 18 Jun 2002 20:52 PDT

With all due respect, the answer given seems to miss the point.  

Find an SSL module on CPAN and use it to write your own SSL bot, like
you did your original HTTP bot.  It will take about ten extra lines to
set up the SSL session, and it's very well documented.

However, this *CAN* be done using a proxy as well.  In fact, I used to
do this when I was testing browsers that didn't do SSL.  The comment
is correct that when using a proxy, the browser sends a CONNECT
instead of a GET, but some proxies will  initiate a second SSL session
(man-in-the-middle) instead of forwarding the one.  This is only a
security problem if you don't trust the network between you and your
proxy, which used to be the working assumption.

Even so, using a proxy is definitely the hard way.  Check CPAN and
you'll have things up and running in no time.
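
A minimal sketch of what those "ten extra lines" might look like,
assuming the IO::Socket::SSL module from CPAN (the commenter does not
name a specific module).  The helper https_get is hypothetical; it
opens an SSL connection and then speaks plain HTTP over it, exactly as
one would over an ordinary socket.

```perl
use strict;
use warnings;
use IO::Socket::SSL;

# Hypothetical helper: fetch $path from $host over SSL and return the
# raw response (status line, headers and body).
sub https_get {
    my ($host, $path) = @_;

    my $sock = IO::Socket::SSL->new(
        PeerHost => $host,
        PeerPort => 443,
    ) or die "SSL connect to $host failed: $!, $SSL_ERROR";

    # From here on it is ordinary HTTP, written to the SSL socket.
    print $sock "GET $path HTTP/1.0\r\n",
                "Host: $host\r\n",
                "Connection: close\r\n",
                "\r\n";

    local $/;                 # slurp mode: read until the server closes
    my $raw = <$sock>;
    close $sock;
    return $raw;
}

# Usage: perl https_get.pl answers.google.com /answers/
print https_get($ARGV[0], $ARGV[1] // '/') if @ARGV;
```

Because the script sees the request and response in clear text before
and after the SSL layer, it can log both, which is the information the
original proxy was logging for plain HTTP.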

Subject: Re: How to use Apache proxy to snoop my machine's HTTPS/SSL communication?
From: gerbil-ga on 19 Jun 2002 02:26 PDT

In response to quesera-ga's comment, I don't understand why his
solution is better than the one given by the Google Researcher.  The
latter doesn't require learning anything more than LWP (which I'm
already pretty familiar with).  I don't see what I gain by ignoring
the Researcher's advice and plunging into CPAN in search of some
unspecified solution.

On Usenet, I've gotten many answers to my question in the same style
as all the other comments here.  They are all along the lines of
"sure, it's easy, just do X", where X is a short phrase like "look
around in CPAN" or "roll your own with OpenSSL".  I wish these
commentators realized that they are not being helpful at all; in fact
they are being less than helpful by adding to my confusion, and
wasting my time (not to mention theirs).  The reason I'm forking over
money for this answer is because such glib Usenet-grade advice has
proven useless to me.

The reply by the Google Researcher is not great, but at least it is
detailed and specific...

Subject: Re: How to use Apache proxy to snoop my machine's HTTPS/SSL communication?
From: quesera-ga on 20 Jun 2002 03:43 PDT

Sorry my response wasn't considered helpful.  You stated that you were
a skilled Perl programmer, so I assumed that you'd be perfectly
comfortable using CPAN.  If you are not, I highly recommend becoming
so.  I don't know any perl programmers of any level who don't consider
navigation of CPAN a critical skill.

Nonetheless, it's a question of style.  For my tastes, LWP is often
too abstracting from the job at hand.  I know exactly how HTTP and
forms work, so doing processing with LWP is actually more difficult
for me because instead of working with the protocol directly, you have
to work with the LWP author's idea of an interface into the protocol.

I gathered from your question that you knew how to work with forms,
just not how to add SSL abilities to your form processor.  So I
approached the question from that angle -- rather than rethinking your
entire design, just use the SSL module on CPAN and ten or so lines of
function calls to set up the SSL session before doing exactly what you
were already doing.  The researcher's answer seemed to me to take you
down a longer path and learning curve to get to the equivalent place.

Nonetheless, and I should have noted it earlier, it is a great
overview of how to use LWP to do some simple form processing, for
people who don't want or need to understand how forms work.

Good luck.

Subject: Re: How to use Apache proxy to snoop my machine's HTTPS/SSL communication?
From: gerbil-ga on 23 Jun 2002 13:39 PDT

There is a typo in my "Request for Clarification".  It should read

"This answer is not sufficiently general to be adequate."

Subject: Re: How to use Apache proxy to snoop my machine's HTTPS/SSL communication?
From: runix-ga on 23 Jun 2002 15:41 PDT

Gerbil,

When a site works dynamically (i.e., CGI, PHP, JSP, ASP, etc.) it
sends pure HTML to the browser.  The 'dynamic' part is on the server
side (e.g., DB access).  There's no way to work on the server-side
information!

The pages that are dynamically generated are HTML pages.
Think about this: your browser only knows about the HTML the site
sent: It knows what to do when you press the 'submit' button, from the
form definition.

I can give you examples about how to handle redirections, if you ask
me to.

Other technologies that the site may use are cookies which are
automatically handled by HTTP::Cookies.

If you want to tell me which sophisticated interaction you have to do
with the site, I will be happy to help you!

Subject: Re: How to use Apache proxy to snoop my machine's HTTPS/SSL communication?
From: daemon-ga on 25 Jun 2002 01:23 PDT

I'd probably approach this problem from a different direction.  I'd
simply grab a copy of the Mozilla browser source and hack it so that
it writes its inputs and outputs to one or more external files after
SSL decoding.  Make your changes, recompile and snoop away.  Every
line going into the SSL encoder would be output first.  Every line
coming out of the decoder would be written out as well.

As for a browser getting back something different from what you get
with other tools, it's often just a matter of setting your Agent
string to match whatever browser you're trying to pretend to be.
Sample agent strings can be found in any web log or Webalizer
statistics page.

Even wget has SSL support and the ability to fake an agent.  I could
manually hack through a set of SSL pages using wget pretending to be
IE5.5 and, if the JavaScript isn't too complicated, automate the
process with Perl.  A page that relies on extensive form submissions
is a trickier proposition.

     ian
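
With LWP, the Agent string mentioned above can be set as in this
minimal sketch.  The IE 5.5 style string and the Accept-Language value
are illustrative assumptions; copy the real values from your own
browser or web log.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# Illustrative IE 5.5 style string; replace with your browser's own.
$ua->agent('Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)');

# Optional extra header that browsers usually send (value is an assumption).
$ua->default_header('Accept-Language' => 'en-us');

print "User-Agent now: ", $ua->agent, "\n";
```

Every request made through $ua afterwards carries these headers, so a
server that varies its output by browser should treat the script like
the browser it imitates.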

Subject: Possibly a useful tool
From: phineas42-ga on 10 Nov 2004 13:04 PST

So lots of time has passed; here is a solution that wasn't entirely
available two years ago.

I don't believe this is exactly what was requested, but you may be
interested to simply look at the HTTP headers (and it works with
https).

I use an extension for Firefox called "Live HTTP Headers"

http://livehttpheaders.mozdev.org/
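
The same kind of header listing can be produced on the script side, so
that what LWP sends can be compared with what the browser sends.  This
is a minimal sketch using the add_handler hooks of LWP::UserAgent; the
helper name make_logging_ua is made up for the example.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: return a user agent that logs the headers (and
# request body, if any) of each request and response to a filehandle.
sub make_logging_ua {
    my ($out) = @_;
    $out ||= \*STDERR;        # log to STDERR unless told otherwise

    my $ua = LWP::UserAgent->new;

    $ua->add_handler(request_send => sub {
        my ($request) = @_;
        print {$out} ">>> ", $request->method, " ", $request->uri, "\n",
                     $request->headers_as_string, "\n";
        print {$out} $request->content, "\n\n" if length $request->content;
        return;               # returning nothing lets the request proceed
    });

    $ua->add_handler(response_done => sub {
        my ($response) = @_;
        print {$out} "<<< ", $response->status_line, "\n",
                     $response->headers_as_string, "\n";
        return;
    });

    return $ua;
}

my $ua = make_logging_ua();
# Example call (needs network access and SSL support for https):
# $ua->get('https://answers.google.com/answers/main?cmd=login');
```

Differences between this log and the browser's headers (cookies,
Referer, User-Agent, form fields) are usually what explains a page
that comes back different in the script.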
