Q: how google converts pdf to html. ( Answered,   2 Comments )
Subject: how google converts pdf to html.
Category: Computers > Software
Asked by: gimpboy-ga
List Price: $2.00
Posted: 01 Jul 2002 18:43 PDT
Expires: 31 Jul 2002 18:43 PDT
Question ID: 35674
where can i get the software google uses to convert pdf's to html?
gpl'ed would be nice.
Subject: Re: how google converts pdf to html.
Answered By: siliconsamurai-ga on 02 Jul 2002 04:56 PDT
From your question I presume that you care less about how Google
does it than about being able to do it yourself. Below are several
options, ranging from a free e-mail conversion service run by Adobe
(which accepts batch submissions), to plug-ins for Adobe Acrobat,
to source code for a conversion program that runs under *nix.

Google-specific questions are answered by Google staff, but I can
provide a simple way to convert files or pages from .pdf to .html or
.txt if that’s what you want.

The simplest way is to just use Adobe’s accessibility tools. Start
here for the complete set of free online tools.

Or simply send Web addresses or files directly to Adobe’s conversion
service, as explained next.

E-mail the URL (Web address) of a PDF document in the body of an
e-mail message to and you’ll get back a .TXT (plain
text) translation. Send the e-mail to and you’ll
get back an HTML file.

If you have a .PDF document on local media, send it as a MIME
attachment to an e-mail to for plain text and to for HTML format.
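
For anyone who would rather script the attachment step than use a mail client, here is a minimal Python sketch of building such a message. The recipient address below is a made-up placeholder (the real service addresses are not reproduced in this text), and the function name is only for illustration. The sketch builds the MIME message and does not send it.

```python
from email.message import EmailMessage

# Hypothetical placeholder: substitute the real conversion-service address.
CONVERTER_ADDRESS = "pdf-converter@example.invalid"


def build_conversion_request(pdf_bytes, filename, sender,
                             recipient=CONVERTER_ADDRESS):
    """Return an e-mail message carrying the PDF as a MIME attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "PDF conversion request"
    # A short text body; adding an attachment afterwards turns the
    # message into multipart/mixed automatically.
    msg.set_content("PDF attached for conversion.")
    msg.add_attachment(pdf_bytes, maintype="application", subtype="pdf",
                       filename=filename)
    return msg
```

To actually send it you would hand the message to your own mail server, for example with `smtplib.SMTP(host).send_message(msg)`, where `host` is whatever SMTP server you normally use.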

Response time is usually a few minutes. I have been posting these
links on my accessibility Web site for years, so no research was needed.

For more options, including the ability to convert from .TIFF to .PDF
or vice versa, or to convert .PDF to a number of different formats, see
this site

The major problem with this site is that the plug-ins don’t work with
the free, downloadable Acrobat Reader; you must have the full
commercial Adobe Acrobat program.

PDFTOHTML is a program which may answer the second part of your
question about a GPL program that does the conversion.

This utility converts .PDF files to HTML or XML formats.
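
If you want to drive the utility from a script, something like the following Python sketch might do. It assumes a `pdftohtml` binary is on your PATH and that your build accepts the `-noframes` and `-xml` switches (run `pdftohtml -h` to check). The helper names are my own invention, not part of the tool.

```python
import shutil
import subprocess


def pdftohtml_command(pdf_path, out_base, as_xml=False, single_page=True):
    """Build the argument list for a pdftohtml run (nothing is executed)."""
    cmd = ["pdftohtml"]
    if as_xml:
        cmd.append("-xml")        # XML output instead of HTML
    elif single_page:
        cmd.append("-noframes")   # one HTML document, no frameset
    cmd += [pdf_path, out_base]
    return cmd


def convert(pdf_path, out_base, **options):
    """Run pdftohtml; raises if the binary is missing or the run fails."""
    if shutil.which("pdftohtml") is None:
        raise FileNotFoundError("pdftohtml not found on PATH")
    subprocess.run(pdftohtml_command(pdf_path, out_base, **options),
                   check=True)
```

Calling `convert("report.pdf", "report")` would then write the HTML output next to the given base name, and `convert("report.pdf", "report", as_xml=True)` would ask for XML instead.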

There’s also Xpdf, an open-source PDF viewer released under the GPL.

This might be the best option if you want to get into the nitty-gritty
of creating your own software or modifying someone else’s programs.
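
As a starting point for rolling your own, here is a rough Python sketch that leans on `pdftotext`, the command-line text extractor shipped with the Xpdf tools, and wraps its output in a bare-bones HTML page. This is my own illustration, not how Google or anyone else does it. It assumes `pdftotext` is installed, that `-` as the output name means standard output, and that the output is UTF-8. The function names are hypothetical.

```python
import html
import subprocess


def pdftotext_command(pdf_path, keep_layout=False):
    """Build the argument list for pdftotext, writing text to stdout."""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")     # try to preserve the physical layout
    cmd += [pdf_path, "-"]        # "-" sends the text to standard output
    return cmd


def text_to_html(text, title):
    """Wrap plain text in a minimal HTML page, escaping markup characters."""
    return ("<html><head><title>%s</title></head>\n"
            "<body><pre>%s</pre></body></html>\n"
            % (html.escape(title), html.escape(text)))


def pdf_to_simple_html(pdf_path):
    """Extract text with pdftotext and return it as a simple HTML string."""
    result = subprocess.run(pdftotext_command(pdf_path, keep_layout=True),
                            check=True, capture_output=True)
    text = result.stdout.decode("utf-8", errors="replace")
    return text_to_html(text, pdf_path)
```

The result is crude (one preformatted block, no images or fonts), but it shows the general shape: extract the text with an existing tool, then generate whatever markup you need around it.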

If you really want to know how Google does this, you’ll have to
contact them directly, but I doubt you’d be able to use the same
technology, so I hope this provides what you really wanted.

Keywords Used:
pdf conversion gpl



Subject: Re: how google converts pdf to html.
From: szallol-ga on 02 Jul 2002 00:11 PDT
you can look to all these links:,39001930,20055410s,00.htm
Subject: Re: how google converts pdf to html.
From: technonotice-ga on 05 Jul 2002 13:24 PDT
could be up your street :-)

