Q: Scanning and Digitizing Printed Pages (Answered, 7 Comments)
Question  
Subject: Scanning and Digitizing Printed Pages
Category: Computers
Asked by: buddy-ga
List Price: $10.00
Posted: 26 Apr 2002 09:30 PDT
Expires: 03 May 2002 09:30 PDT
Question ID: 6189
I would like to know the best hardware and software solution to solve
the following task:

I have copies of about 90 different advertising pamphlets that were
distributed over a period of about 15 years. Each pamphlet is 5 3/8
inches by 8 3/8 inches and contains about 30 pages. Each pamphlet
contains historic articles, including black-and-white pictures. These
articles are spread through each booklet and interspersed with
advertisements from various businesses. Copyright is not a problem.

I want to scan these articles and pictures and then digitize the
articles so I can edit them into a book that describes the
history of the city and the interesting people and activities. Since
the articles continue from one page to other pages, I assume
that I might need a hand scanner and then some type of program to
digitize the images so they can be edited.
Answer  
Subject: Re: Scanning and Digitizing Printed Pages
Answered By: researcher-ga on 26 Apr 2002 13:19 PDT
 
While a hand scanner, as mosquitohawk-ga mentions in the Comments,
can do what you are asking for, it is indeed a less than optimal
solution. A flatbed scanner will yield better-quality images, carry
less risk of damaging the originals and scan more quickly. However,
if the binding of the pamphlets prevents the pages from lying flat,
then a hand scanner might be the better choice. If the bindings can be
removed, it is recommended that you do so in order to increase the
speed and quality of the scanning.

Several things are required for what you want to do. First, you will
need hardware to scan the pamphlet pages into the computer. You will
also need scanning software, which communicates between the scanner
and the computer, and image-editing software to format and alter the
image and make any last-minute crops or adjustments. Most hardware
packages include scanning software that is designed for that
particular hardware. Image-editing software is also sometimes
included, though it may be feature-limited or time-limited. A list of
alternative scanning software and image-editing software is below.
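As a rough illustration of the "last-minute crop" step, here is a minimal pure-Python sketch that trims the white border around a scanned page. It assumes the page has already been loaded as a grayscale grid (a list of rows of values from 0 to 255, with 255 meaning white paper); the function names `autocrop_bounds` and `crop` and the default threshold of 200 are hypothetical choices for this example, not features of any product mentioned in this thread.

```python
from typing import List, Optional, Tuple

Gray = List[List[int]]  # rows of 8-bit grayscale values, 255 = white paper


def autocrop_bounds(page: Gray, threshold: int = 200) -> Optional[Tuple[int, int, int, int]]:
    """Return inclusive (top, bottom, left, right) bounds of every pixel darker
    than `threshold`, or None if the page is blank."""
    dark_rows = [r for r, row in enumerate(page) if any(v < threshold for v in row)]
    if not dark_rows:
        return None
    dark_cols = [c for c in range(len(page[0]))
                 if any(row[c] < threshold for row in page)]
    return dark_rows[0], dark_rows[-1], dark_cols[0], dark_cols[-1]


def crop(page: Gray, threshold: int = 200, margin: int = 0) -> Gray:
    """Trim near-white borders, keeping `margin` extra pixels on each side."""
    bounds = autocrop_bounds(page, threshold)
    if bounds is None:
        return page  # nothing printed on the page; leave it alone
    top, bottom, left, right = bounds
    top, left = max(top - margin, 0), max(left - margin, 0)
    bottom = min(bottom + margin, len(page) - 1)
    right = min(right + margin, len(page[0]) - 1)
    return [row[left:right + 1] for row in page[top:bottom + 1]]


if __name__ == "__main__":
    # A tiny 5x6 "page": white paper with two dark pixels of print.
    page = [[255] * 6 for _ in range(5)]
    page[1][2] = page[2][3] = 30
    print(autocrop_bounds(page))  # (1, 2, 2, 3)
    print(crop(page))             # [[30, 255], [255, 30]]
```

A small `margin` of a few pixels keeps the crop from shaving the edges off letters; real image editors add noise handling so that a stray speck of dust does not stop the crop.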

The Flatbed-Scanner-Review.org website provides exhaustive reviews of
just about everything related to flatbed scanners. The site has
compiled a list of mid-range to high-end flatbed scanners by
manufacturer. Depending on how much you are willing to spend, you
should review these and decide which one best suits your needs.

List and product comparisons of mid-range to high-end flatbed
scanners
http://www.flatbed-scanner-review.org/flatbed_scanner_reviews_links/flatbed_scanner_reviews.html

The site has also compiled "best of" recommendations for different activities.

"If you ask 100 graphics experts you will get at least a dozen
different groupings of recommendations. Some consistencies will be
universal (no low-dpi consumer-level film scanners, for example; no
low-end flatbed scanners). But even professional digital design
graphics artists will split into different groupings on whether a drum
scanner is needed or whether a high-end flatbed can cover the
situations of the coming millennium."
"The following table, therefore, does its best to be flexible, but the
products tend to be mid-range to introductory high-end. If you are
looking for a scanner for home and family, you can't go wrong with a
Linotype-Hell Saphir Ultra 2 for a flatbed and a Nikon or Polaroid
SprintScan 4000 for 35mm slides"
http://www.flatbed-scanner-review.org/flatbed_scanner_reviews_links/recommended_scanner_list.html

"Linotype-Hell OEMs the Umax as the Saphir Ultra2 with the superior
LinoColor Elite software; high-end models include Cumulus image base
from Canto. For PC Windows a new scanner software is available,
NewColor 4000. Contact Heidelberg CPS"
http://www.flatbed-scanner-review.org/flatbed_scanner_reviews_links/flatbed_scanner_reviews.html

While the Linotype-Hell/LinoColor Saphir Ultra 2 is an expensive
choice, it is also the professional choice. For a detailed information
sheet, please visit http://www.photo.net/ezshop/product?product_id=3807

Photo.net has also compiled a list of other high-quality professional
scanners at http://www.photo.net/ezshop/category?category_id=785 and
includes links to their pricing and specification sheets.

PC Photo Review has put together a listing of flatbed scanners that
includes ratings, reviews and prices. If you do not need the
higher-end professional scanners mentioned above, many of those
found here, such as the Canon CanoScan N656U, will meet your needs
satisfactorily.
http://www.pcphotoreview.com/Flatbed,Scanner/PLS_1755crx.aspx


Alternative Scanner Software

VueScan 

"VueScan is a scanning program that works with most high-quality
flatbed and film scanners to produce scans that have excellent color
fidelity and color balance. It is very easy to use, and also has
advanced features for restoring faded colors, batch scanning and other
features used by professional photographers."
http://www.hamrick.com/vsm.html


Image Editing Software

Adobe Photoshop

"Adobe® Photoshop® 7.0 software, the professional image-editing
standard, helps you work more efficiently, explore new creative
options, and produce the highest quality images for print, the Web,
and anywhere else. Create exceptional imagery with easier access to
file data; streamlined Web design;
faster, professional-quality photo retouching; and more."
http://www.adobe.com/products/photoshop/main.html

Corel Draw

"Add a new dimension to your creativity with CorelDRAW® 10 Graphics
Suite! Backed by a decade of award-winning creative power, it delivers
vector illustration, layout, bitmap creation, image-editing, painting
and animation software all in one package."
http://www3.corel.com/cgi-bin/gx.cgi/AppLogic+FTContentServer?pagename=Corel/Product/Details&id=CC1IOY1YKCC

Micrografx Picture Publisher

"Micrografx Picture Publisher 10 Professional is the premier
easy-to-use solution for professional image processing and management
-- from start to finish. Manage, correct, enhance, create, and publish
any type of image, from photos to web graphics to digital art. Picture
Publisher provides the complete package professionals need to turn a
raw image into production-ready art."
http://www.micrografx.com/mgxproducts/picturepublisher.asp

Ulead PhotoImpact

"PhotoImpact 7 is everything you need for digital photo editing,
creative design, and Web graphics. No
other image editor delivers professional results so easily and
affordably."
http://www.ulead.com/pi/runme.htm


Additional information:

Step-by-Step Scanning: A Tutorial
http://www.digitaldog.net/files/Scanningtutorial.pdf

ScanSoft Scanner Guide
http://www.scansoft.com/scannerguide/

"A Few Scanning Tips" by Wayne Fulton
http://www.scantips.com/


Search terms used:

best flatbed scanner
://www.google.com/search?q=best+flatbed+scanner

scanning how-to
://www.google.com/search?q=scanning+how%2Dto

Request for Answer Clarification by buddy-ga on 27 Apr 2002 06:49 PDT
I don't believe the answer covered the best software approach to
digitize the text information that has been scanned.  Please give me
your suggestions.

Clarification of Answer by researcher-ga on 29 Apr 2002 14:49 PDT
By far the best reviews are for ScanSoft's OmniPage Pro 11. An
independent review of OmniPage Pro, "Converting printed pages to
document files," says:

"This optical character recognition software is pretty much best of
breed. It can extract text from PDF files, recognise text in over 100
languages and output in HTML format. It also has an excellent layout
engine, recognising headers and footers, tables, columns, coloured
text and images."
http://www.itreviews.co.uk/software/s126.htm

ScanSoft's OmniPage Pro 11 website adds:

"OmniPage Pro 11 is the fastest, easiest way to turn paper documents
into digital files you can edit. Its accuracy, formatting and features
are unrivaled and the ability to convert PDF files into Microsoft
Office documents opens a new world of functionality to you."

"OmniPage Pro 11 extends its recognition capabilities from 13 to 114
languages and from one to three alphabets including those using Greek
Latin, and Cyrillic. This new version also handles grayscale and color
images. It can use the extra pixel depth information in grayscale
images to internally generate black-and-white images optimized for
best OCR performance. This allows it to create acceptable results on
pages so degraded or faded they would have previously been described
as unsuitable as sources for OCR processing."
http://www.scansoft.com/omnipage/
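The quoted passage describes turning grayscale scans into black-and-white images before OCR. ScanSoft does not say which method OmniPage uses; a common textbook approach is Otsu's method, which picks the single gray level that best separates "ink" from "paper". The following is a minimal pure-Python sketch of that technique, assuming the page is a grid of 0 to 255 grayscale values. It illustrates the general idea and is not OmniPage's algorithm.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Otsu's method: return the gray level t (0-255) that maximizes the
    between-class variance when pixels <= t are ink and pixels > t are paper."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    weight_ink = 0  # pixels at or below t
    sum_ink = 0     # sum of their gray levels
    for t in range(256):
        weight_ink += hist[t]
        if weight_ink == 0:
            continue
        weight_paper = total - weight_ink
        if weight_paper == 0:
            break
        sum_ink += t * hist[t]
        mean_ink = sum_ink / weight_ink
        mean_paper = (sum_all - sum_ink) / weight_paper
        # Between-class variance, up to a constant factor of 1/total**2.
        score = weight_ink * weight_paper * (mean_ink - mean_paper) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize(page: List[List[int]]) -> List[List[int]]:
    """Map every pixel to pure black (0) or white (255) using one global threshold."""
    t = otsu_threshold([p for row in page for p in row])
    return [[0 if p <= t else 255 for p in row] for row in page]


if __name__ == "__main__":
    print(otsu_threshold([20, 25, 30, 200, 210, 220]))  # 30
    print(binarize([[20, 200], [30, 210]]))            # [[0, 255], [0, 255]]
```

A single global threshold can struggle on unevenly faded pages; adaptive (local) thresholding, which computes a threshold per region of the page, is the usual next step.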

Another solution would be PrimeOCR:

"Today's best OCR engines are only achieving, on average, 98%
accuracy, when recognizing typical quality images.  On a typical page
of 2000 characters, that means that 40 errors remain in the OCR
output.
By using PrimeOCR, error rates can be reduced by 65-80%.  This means
that the 40 errors generated by today's OCR engines can be drastically
reduced to 8 by using PrimeOCR."
http://www.primerecognition.com/augprime/prime_ocr.htm
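The arithmetic in that quotation is easy to check. At 98% accuracy, a 2,000-character page leaves 40 errors; a 65% to 80% cut in the error count leaves between 14 and 8, so the "reduced to 8" figure corresponds to the top of that range. A small sketch (the function name `residual_errors` is hypothetical), with a project-wide estimate that assumes, probably generously, 2,000 characters on each of roughly 2,700 pamphlet pages:

```python
def residual_errors(chars: int, accuracy_pct: float, reduction_pct: float = 0) -> float:
    """Errors left in `chars` characters of OCR output at `accuracy_pct` percent
    accuracy, after an optional `reduction_pct` percent cut in the error count."""
    errors = chars * (100 - accuracy_pct) / 100
    return errors * (100 - reduction_pct) / 100


# The figures quoted from PrimeOCR, for a 2,000-character page at 98% accuracy:
print(residual_errors(2000, 98))      # 40.0
print(residual_errors(2000, 98, 65))  # 14.0
print(residual_errors(2000, 98, 80))  # 8.0

# Scaled up, assuming 2,000 characters on each of ~2,700 pages:
print(residual_errors(2000 * 2700, 98))      # 108000.0
print(residual_errors(2000 * 2700, 98, 80))  # 21600.0
```

Even at the best quoted rate, a project of this size would still need a careful proofreading pass.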


Additional information:

OCR Software Review
http://kachina.kennesaw.edu/~mking/is425/productevals/ocr/

Review of Readiris OCR
http://www.pcworld.com/news/article.asp?aid=9650

Finereader Pro for Mac
http://www.macfinereader.com/

OCR Software For The Mac
http://www.lowendmac.com/misc/01/1127.html


Search terms used:

OCR software "best of"
://www.google.com/search?q=OCR+software+%22best+of%22

Clarification of Answer by researcher-ga on 29 Apr 2002 15:10 PDT
You may also want to read this Google Answers question on OCR for
mathematical formulas:
https://answers.google.com/answers/main?cmd=threadview&id=3304
Comments  
Subject: Re: Scanning and Digitizing Printed Pages
From: mosquitohawk-ga on 26 Apr 2002 09:57 PDT
 
Well, a hand scanner probably isn't the ideal solution unless it is
absolutely necessary. I would recommend a flatbed scanner if possible.
Just about any software will do. In fact, if you have a Microsoft
Windows product, you should have a basic program called Photoeditor
which would do, but if you want to go all the way, I recommend Adobe
Photoshop. However, most flatbed scanners come with very user-friendly
photo-editing software. If you don't want to purchase a scanner, many
copy stores have computers, scanners, and personnel who can assist
you; some libraries in larger areas do as well.

Good luck.
Subject: Re: Scanning and Digitizing Printed Pages
From: sugarbaby-ga on 26 Apr 2002 13:56 PDT
 
Hi, Buddy!

It sounds like you want to be able to scan the articles and then
convert them to text so they can be manipulated in a word processor.
If that is what you want to do, the process is called "Optical
Character Recognition" or OCR.

Just about any old scanner can do the job of scanning the individual
pages. You will need some special software to "recognize" the
characters on the page and convert them into text.
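Because the articles jump from one page to another, the OCR output arrives page by page and still has to be put back together. One simple approach, sketched here under the assumption that you tag each OCR'd fragment by hand with its pamphlet number, page and an article title while proofreading, is to group fragments by article and join them in page order. The `Fragment` fields and the `assemble` function are hypothetical names for this illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Fragment(NamedTuple):
    pamphlet: int  # which pamphlet the piece came from
    page: int      # page it was scanned from
    article: str   # title you assign while proofreading
    text: str      # OCR output for this piece of the article


def assemble(fragments: List[Fragment]) -> Dict[Tuple[int, str], str]:
    """Join each article's pieces in page order, so a story that is
    'continued on page 12' reads as one block of text."""
    pieces: Dict[Tuple[int, str], List[Tuple[int, str]]] = defaultdict(list)
    for f in fragments:
        pieces[(f.pamphlet, f.article)].append((f.page, f.text.strip()))
    # Sort by page only; the sort is stable, so two pieces on the same page
    # keep the order in which they were entered.
    return {key: "\n\n".join(text for _, text in sorted(parts, key=lambda p: p[0]))
            for key, parts in pieces.items()}


if __name__ == "__main__":
    # Made-up fragments, listed in the order they happened to be scanned.
    frags = [
        Fragment(7, 12, "Mill", "Part two of the mill story."),
        Fragment(7, 3, "Mill", "Part one of the mill story."),
        Fragment(7, 3, "Fair", "A short article that fits on one page."),
    ]
    for (pamphlet, title), text in assemble(frags).items():
        print(pamphlet, title, repr(text))
```

The joined text for each article can then be pasted into a word processor as one piece.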

You can find lots of software which will do this, and many programs
are quite inexpensive. One example is WinOCR 4.0, which you can find
here:
http://www.edti.com/winocr30.htm

You can even try it out for free! For more options, do a Google search
for this search term: optical character recognition software
://www.google.com/search?hl=en&q=optical+character+recognition+software

Hope this helps!

SugarBaby
Subject: Re: Scanning and Digitizing Printed Pages
From: blusynapse-ga on 26 Apr 2002 15:31 PDT
 
Hi Buddy!

I'm sorry, but Researcher-ga addresses only half the issue. By
scanning the pamphlets, you will have the images ready for editing,
but not the text.

In your case, you have almost 2700 pages, and a hand-held scanner. By
now you must have gauged that your scanner is simply inadequate. Also,
it does not make too much sense to purchase a high-end scanner just
for this purpose.
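The page count follows from the question: 90 pamphlets of about 30 pages each is roughly 2,700 pages. Disk space is worth estimating too before choosing a do-it-yourself route. The sketch below assumes 300 dpi scans (a common setting for OCR, but an assumption here) and uncompressed images; TIFF or JPEG compression will shrink the real files considerably.

```python
import math

PAMPHLETS = 90                      # from the question
PAGES_EACH = 30                     # "about 30 pages" per pamphlet
WIDTH_IN, HEIGHT_IN = 5.375, 8.375  # 5 3/8 by 8 3/8 inches


def raw_bytes_per_page(dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size of one page image, rounding pixel counts up."""
    width_px = math.ceil(WIDTH_IN * dpi)
    height_px = math.ceil(HEIGHT_IN * dpi)
    return math.ceil(width_px * height_px * bits_per_pixel / 8)


pages = PAMPHLETS * PAGES_EACH
gray = raw_bytes_per_page(300, 8) * pages  # 8-bit grayscale, for pictures
bw = raw_bytes_per_page(300, 1) * pages    # 1-bit black and white, for text
print(pages)                                 # 2700
print(f"{gray / 1e9:.1f} GB grayscale")      # 10.9 GB grayscale
print(f"{bw / 1e9:.1f} GB black and white")  # 1.4 GB black and white
```

Scanning text pages in black and white and only the pictures in grayscale would keep the total manageable.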

In this scenario, I think I have some good news for you: your best
bet is probably to have a professional company do the conversion for
you. There are some really amazing firms in Asia who do quality work
for incredibly low prices. I have known them to handle digitizing jobs
for entire libraries from Europe, so you can rest assured about
reliability. Just tell them your requirements.

As a start you might want to take a look at
http://www.iconquality.com/dpdesc.html (very competent firm, look for
a contact page if you wish to get in touch with them)

If you would rather do the grind yourself, here’s a thing or two I
know from hands-on experience:
Like SugarBaby mentioned, OCR software helps you digitize the printed
matter. WinOCR is free software, but highly unreliable. Other
state-of-the-art applications (like ABBYY FineReader, the best of
the breed) have a pretty steep price tag.

Good luck, amigo! Keep us posted with your progress :)

BluSynapse
Subject: ps:
From: blusynapse-ga on 26 Apr 2002 15:43 PDT
 
ps: Sorry, I forgot to mention... if you wish to know more about OCR,
just ask! (there are plenty out there to choose from and it sure can
get confusing!)

Sail smooth,

BluSynapse
Subject: Re: Scanning and Digitizing Printed Pages
From: mit-ga on 26 Apr 2002 15:44 PDT
 
You might also consider the EPSON 1640SU
(http://files.support.epson.com/pdf/per16u/per16usl.pdf), which can
scan 5.5 pages per minute via the autofeeder (the SU denotes
the office model with the autofeeder). You can scan documents to PDF
or JPEG/TIFF/etc.
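For a sense of scale, here is the feeder time alone for about 2,700 pages at the quoted 5.5 pages per minute. This minimal sketch ignores loading, jams and flipping pages for double-sided material, so real-world time will be longer; the one-page-per-minute hand-scanning rate used for comparison is purely an assumed figure.

```python
def feeder_hours(pages: int, pages_per_minute: float) -> float:
    """Pure scanning time in hours; ignores loading, jams and page flipping."""
    return pages / pages_per_minute / 60


total_pages = 90 * 30  # about 2,700 pages, per the question
print(round(feeder_hours(total_pages, 5.5), 1))  # 8.2 (autofeeder rate quoted above)
print(round(feeder_hours(total_pages, 1.0), 1))  # 45.0 (assumed hand-scanning rate)
```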

With the number of pages that you are looking at, you should see if
there is a way to unbind your pamphlets so that you have loose
pages. This will greatly reduce the time spent scanning by hand.

You can then use optical character recognition (OCR) software to
convert the text into an editable format. The full version of
Adobe Acrobat has built-in OCR functionality.
Subject: Re: Scanning and Digitizing Printed Pages
From: trx430ex-ga on 27 Apr 2002 19:01 PDT
 
I have yet to find OCR technology that works proficiently enough for a
webmaster, or for that matter any professional company, to profit
from.

Graphics, that's easy; text, that's a whole different enigma.

I have almost all the software listed above, plus some in the
multiple-thousand-dollar range, and none of it turns out text more
than 95% right, let alone graphics and text in the same document!!

Even the best from Xerox doesn't work that well...

I would like to find something that works myself!!!
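Claims like "95% right" or PrimeOCR's "98% accuracy" can be checked on your own pamphlets before committing to a product. A common measure is the character error rate (CER): the Levenshtein edit distance between a hand-corrected transcription and the OCR output, divided by the length of the transcription. The sketch below assumes you have typed a correct reference for a sample page; the function names are hypothetical and the sample strings are made up.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions
    that turn `a` into `b` (dynamic programming, one row at a time)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def char_accuracy(reference: str, ocr_output: str) -> float:
    """Percent accuracy = 100 * (1 - CER), with CER = edits / len(reference)."""
    if not reference:
        raise ValueError("reference transcription is empty")
    cer = edit_distance(reference, ocr_output) / len(reference)
    return 100 * (1 - cer)


print(char_accuracy("OCR test", "0CR tesl"))  # 75.0 (two substitutions in 8 characters)
```

Running the same few sample pages through each trial version gives a like-for-like comparison. Note that the accuracy figure can drop below zero if the OCR output contains many extra characters.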
Subject: Re: Scanning and Digitizing Printed Pages
From: aguydude-ga on 30 Apr 2002 19:18 PDT
 
If your pamphlets can be unbound and grouped together with others of
similar size, make sure your scanner has a document feeder
for easy scanning. Xerox sells scanners that do this (e.g. the
Digipath). However, they are pretty expensive. I think blusynapse-ga's
suggestion is your best bet because it will be easier and possibly
cheaper.

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.