Q: Scanning 100.000 pages ( Answered,   3 Comments )
Subject: Scanning 100.000 pages
Category: Computers
Asked by: mindaugas-ga
List Price: $100.00
Posted: 29 Sep 2006 16:54 PDT
Expires: 29 Oct 2006 15:54 PST
Question ID: 769584
I have a project to scan about 300 books (different sizes) and 100
magazines (about 100.000 pages total). It needs to be done fast. Books
and magazines are ready as separate pages - no binding. I have a
limited budget (buying a $10,000 scanner is not an option for me -
renting maybe). I need a complete solution for converting books and
mags into digital format - including what scanner or copier with a
scanner would do this fast, what searchable format to choose, how to
scan mags - image enhancement, etc.
Subject: Re: Scanning 100.000 pages
Answered By: crabcakes-ga on 29 Sep 2006 23:27 PDT
Hello Mindaugas,

   I have selected some scanners, in a wide range of prices, based on
scanner reviews and recommendations. I feel sure one or more of these
will handle the work you need done. I have also included some OCR
software reviews.

About document scanners

   "The overwhelming majority of document scanners are sheetfed with
an automatic document feeder (ADF). Part of what distinguishes
document scanners from other sheetfed scanners is that most offer both
a simplex mode to scan one side of each page and a duplex mode to scan
both sides.

Don't confuse a duplex scanner, which scans both sides of the page at
once, with a duplex ADF, which scans one side, turns the page over,
and then scans the other side. You can confirm that the scanner
duplexes from its claimed speeds. The rating for pages per minute
(ppm) tells you how many sheets of paper it scans per minute. The
rating for images per minute (ipm) tells you how many images it scans,
with one on each side of the page. Make sure the duplexing ipm speed
is double the ppm speed." Please read this web site for complete
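
The ppm/ipm rule of thumb quoted above can be checked in a few lines of Python. This is only a minimal sketch: the function name and the tolerance are illustrative assumptions, and the speeds are the rated figures quoted for the scanners in this answer, not measured results.

```python
def scans_both_sides_at_once(ppm, ipm, tolerance=0.05):
    """Return True if the duplex ipm rating is about double the simplex ppm rating.

    tolerance is a hypothetical allowance for rounding in published specs.
    """
    return ipm >= 2 * ppm * (1 - tolerance)


# Rated speeds as quoted in this answer: (simplex ppm, duplex ipm).
rated = {
    "Fujitsu ScanSnap S500": (18, 36),
    "Canon DR-2580C": (25, 50),
    "Canon DR-2050C": (20, 40),
    "HP Scanjet 7800": (25, 50),
}

for name, (ppm, ipm) in rated.items():
    if scans_both_sides_at_once(ppm, ipm):
        verdict = "ipm is double ppm"
    else:
        verdict = "ipm is less than double ppm, check the spec sheet"
    print(f"{name}: {ppm} ppm / {ipm} ipm -> {verdict}")
```

A scanner whose duplex rating falls well short of double its simplex rating would be flagged for a closer look at its specifications.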

"Adding recognition can add a significant amount of time or hardly
any, depending on the software. Thus, one scanner can have a slower
scan speed than another but be so fast at recognition that it's faster
in real-world use. The only way to find that out is to test the
scanners. If you can't run a test yourself, look for the information
in our reviews."

"If you want to scan and edit files, you'll need an OCR program or
module that can send files to your word processing program."

   According to this site, scanning photos on a document scanner is
not recommended. Please read this web site for complete information.

   "If you're using OCR or other recognition technologies and you're
dealing with highly variable or hard-to-read documents, look for
state-of-the-art image processing, such as Kofax VRS or Kodak's
similar Perfect Page and iThresholding combo. This will ensure
best-possible image quality and higher recognition accuracy. All the
better if the scanner can deliver 300 dpi images at rated or
near-rated speeds."

    "In these environments, image quality is often more important than
speed. In the past, if an end user needed to scan a document at 300
dpi or higher to increase OCR [optical character recognition] and ICR
[intelligent character recognition] accuracy, the scanner would slow
down to a crawl. However, now users can get a high-resolution image
without sacrificing speed."

Features to look for in a scanner:

"Workgroup Scanners
Price Range: $500-$2,000 / Speed: 10-25 ppm
Workgroup scanners are typically used by individuals or by small
groups of users where a single workstation is the scan station for the
rest of the group."

Departmental Scanners
Price Range: $2,000-$5,000 / Speed: 25-40 ppm
With their faster speeds, these scanners meet the needs of variable
low-volume, non-production applications. They are cost effective for
larger groups such as departments or small to medium-sized businesses.

Some scanners recommended by PC Magazine, a trusted resource

Fujitsu ScanSnap S500
Document and business-card scanner. Rated at 18 pages per minute, or
36 images per minute for scanning both sides.

"Fujitsu still focuses on scanning to PDF, with JPG as a second
choice, and it relies on Adobe Acrobat 7.0 to handle the scanned
files. But the ScanSnap S500 includes a version of Abbyy
FineReader (FineReader for ScanSnap 2.0) as an alternative to optical
character recognition. You also have the option to scan a document,
recognize the text, and send it to, say, Microsoft Word in one step.
But you still have to initiate the scan from within the ScanSnap

Rated as Very Good by PC Magazine:
"The Canon DR-2580C is the fastest document scanner we've tested for
scanning and saving in searchable PDF format, but those who need
software for document management or indexing to help organize scanned
files must buy that separately."

"The DR-2580C is not only one of the smallest scanners in its class at
a mere 4.2 lbs, but also one of the most easy-to-use. Simply assign
commonly used functions to its Scan-To Job buttons for one-touch
operation. Offering advanced scanning technology, the DR-2580C
achieves superior color quality and reproduction, as well as enhanced
results from low contrast documents."

"In addition to a full-feature ISIS/TWAIN driver, the DR-2580C comes
bundled with CapturePerfect 3.0 and Adobe Acrobat 7.0 Standard -
giving you total control over scanning from start to finish."

Full Review
"A document scanner's core task is turning large stacks of paper into
digital format in a hurry, a task that the DR-2580C excels at. Canon
claims that at resolutions of both 200 pixels per inch (ppi) and 300
ppi, the engine can process 25 pages per minute (ppm) in simplex mode
(scanning one side of the page) and 50 images per minute (ipm) in
duplex mode. Our test times when scanning to PDF image files were just
below those speeds, at 24.5 ppm and 49.1 ipm. Although that is not the
fastest we've seen, it's the fastest at this price or below, which is

Canon 2050 Document Capture Scanner 0433B002
Up to 20 pages per minute
*  OmniPage SE: Bundled with ScanSoft's industry leading OCR software
that accurately converts scanned documents into editable text.
*  CapturePerfect 3.0 New Features: 1) Zone OCR can be used as
indexed filename. 2) PDF Encryption and Security features. 3) Add,
Delete and Insert pages. 4) Adjust and save brightness/contrast
onscreen, after scan
*  Adobe® Acrobat® 7.0 Standard Full Version: Includes full version
software application (not just the reader), a $299 US value. You can
create PDF documents in any application that allows printing. And much

Canon DR-2050C
"The DR-2050C incorporates Canon's renowned, high-precision roller
system that delivers smooth, jam-free feeding. Whether scanning single
sheet documents or multiple sheets of mixed document sizes and weight,
the DR-2050C offers uninterrupted performance with one of the most
reliable feeding systems in its category."
*  High-speed scanning of up to 20 ppm in simplex or 40 ipm in duplex.
*  Simple connectivity with USB 2.0 interface.
*  Text Enhancement mode can overcome obstacles leading to illegible
image files such as color backgrounds, light colored lettering or
pencil writing.
*  Quickly and reliably scan mixed batches with efficiency-boosting
features like Skip Blank Page and Automatic Page detection
*  Up to 100 programmable user settings can be customized for improved
productivity of frequent scan operations

"When it comes to cost-effective, reliable document scanning, look no
further than the DR-2050C."

ScanSnap Fujitsu FI-5110C Color Scanner 
15 pages per minute, 50 sheet feed

Other Recommended Scanners

Canon DR-2580C 
25 pages per minute, 50 sheet feed
Included software: Adobe Acrobat 7.0, Capture Perfect 3.0


HP Scanjet 7800 Document Sheetfeed Scanner
*	Scan both sides of a page with one pass, at 50 ipm (25 ppm), using a
50-page automatic document feeder.
*	Save time on recurring projects with up to 30 customized scan
profiles; select profiles from display.
*	Easily scan different paper types, from business cards and plastic
IDs up to legal-size documents.
*	Scan and manage business cards using a unique card feeder and
NewSoft Presto! BizCard Reader.

Workgroup document management solution
*	Increase OCR accuracy: HP Scanjets and Kofax VirtualReScan work
together to optimize scan quality.
*	Easily convert scans into editable text using your HP Scanjet and
IRIS Readiris Pro OCR software.
*	Save and manage scans using preset one-touch profiles created with
HP Smart Document Scan Software.
*	Easily organize digital documents using your HP Scanjet and included
ScanSoft PaperPort software.

Complete specs found here:
"Spend less time fine-tuning. Get optimized scans the first
time, without manually adjusting color or contrast.
HP Scanjet Scanners and Kofax VirtualReScan work
together to make automatic adjustments for greater OCR
accuracy and sharper detail, such as logos and barcodes.
* Get editable text from hard copies. The included IRIS
Readiris Pro OCR software converts scanned documents
to editable text with accuracy, using Microsoft Word or
Adobe PDF formats.
* Make copies with the touch of a button. A 'copy' button
delivers the convenience of a copier by sending a scan to
your default printer, providing multiple copies of a single
* Convert your hardcopies into a variety of popular file formats: including
Adobe® PDF, Microsoft® Word and Excel, Corel®
WordPerfect, TIFF, and JPEG. Electronic files can be easily
edited, e-mailed, delivered to local or network folders and
organized. A preview window lets you rearrange, delete
or add scanned pages. Use bar codes to separate jobs
and manage workflow."

$651 and free shipping at
*  1200 dpi Optical Resolution and 48-bit Color Depth
*  Automatically Scan on Both Sides of a Page in One Pass
*  Scan a Variety of Paper Sizes up to Legal 8.5 x 14 Inches
*  Convert Scanned Documents Into Editable Text with Included OCR Software

TigerDirect provides some clear illustrations and specs:


Canon 3080 Document Capture Scanner 9673A002
32 pages per minute, 100 sheet feed
*  100-sheet Automatic Document Feeder, ideal for continuous batch
scanning of mixed documents (business card to legal size); the Canon
DR-3080CII automatically adjusts for varying document sizes and
*  Built-in Skew Correction automatically straightens misaligned documents.


   This is a fast, heavy-duty scanner. It's pricier than those
above, but still far less than $10,000!

Canon DR 7080C - Document scanner
"With the same impressive scanning speed of 70ppm for both color and
black and white documents (A4/landscape/200dpi), the dynamic DR-7080C
provides the perfect way to get through more work in far less time.

Duplex scanning in color is equally rapid, at 36ipm (images per
minute). Despite its high speed, the DR-7080C produces unsurpassed
quality scanning, giving you the very best of both worlds. By adapting
a 3-line CCD sensor, it provides continuous tone quality scanning."

"Powerful software brings you the versatility, ease and efficiency you
need for extensive applications. The many benefits of CapturePerfect
2.0 include Scan to Mail, PC and Print as well as 90 Degree Auto
Rotation. While wide-ranging ISIS/TWAIN driver advantages include
MultiStream scanning and Book"

More specs on this scanner.

OCR Software

"Optical Character Recognition (OCR) is a process of scanning printed
pages as images on a flatbed scanner and then using OCR software to
recognize the letters as ASCII text. The OCR software has tools for
both acquiring the image from a scanner and recognizing the text.
Ideal Source Material for OCR

OCR works best with originals or very clear copies and mono-spaced
fonts like Courier. If you have choices, use the following source
*	12 point or greater font size.
*	Black text on a white background.
*	A clean copy; not a fuzzy multi-generation copy from a copy machine.
*	Standard type font (Times, New Roman, etc.) Fancy fonts may not be recognized.
*	Single column layout.

OCR Primer
Loads of tips for OCRing

"There are two basic methods used for OCR: Matrix matching and feature
extraction. Of the two ways to recognize characters, matrix matching
is the simpler and more common.
Matrix Matching compares what the OCR scanner sees as a character with
a library of character matrices or templates. When an image matches
one of these prescribed matrices of dots within a given level of
similarity, the computer labels that image as the corresponding ASCII

Feature Extraction is OCR without strict matching to prescribed
templates. Also known as Intelligent Character Recognition (ICR), or
Topological Feature Analysis, this method varies by how much "computer
intelligence" is applied by the manufacturer. The computer looks for
general features such as open areas, closed shapes, diagonal lines,
line intersections, etc. This method is much more versatile than
matrix matching. Matrix matching works best when the OCR encounters a
limited repertoire of type styles, with little or no variation within
each style. Where the characters are less predictable, feature, or
topographical analysis is superior."
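
To make the matrix matching idea concrete, here is a toy Python sketch. It is not how any of the OCR products in this answer work internally. The 3x5 templates, the function names and the 0.85 threshold are all made up for illustration; the point is only to show a scanned character being compared against a library of stored dot matrices and accepted when the similarity is high enough.

```python
# Hypothetical 3x5 templates: each glyph is 5 rows of 3 pixels ("1" = ink).
TEMPLATES = {
    "I": ("111", "010", "010", "010", "111"),
    "L": ("100", "100", "100", "100", "111"),
    "T": ("111", "010", "010", "010", "010"),
}


def similarity(a, b):
    """Fraction of pixels that agree between two equally sized bitmaps."""
    pairs = [(x, y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(x == y for x, y in pairs) / len(pairs)


def match_character(bitmap, threshold=0.85):
    """Return the label of the most similar template, or None if none is close enough."""
    best_label, best_score = None, 0.0
    for label, template in TEMPLATES.items():
        score = similarity(bitmap, template)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


# A "T" with one stray pixel of noise in the fourth row.
noisy_t = ("111", "010", "010", "011", "010")
print(match_character(noisy_t))
```

With these toy templates the noisy glyph still agrees with the "T" template on 14 of 15 pixels, so it is labelled "T", while a blank bitmap matches nothing closely enough and returns None. Feature extraction would instead look at properties such as strokes and intersections rather than comparing pixel by pixel.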

ReadIris Pro
"Readiris Pro lets you reproduce your documents in more than 20
different applications such as Word, Excel, Acrobat, Internet
Explorer, Netscape, WordPerfect, StarOffice and many others. The
output file retains perfectly the lay-out of the original document."

"How does it work?
1.	Scan your document
Simply scan your paper text or open a PDF file or
an image document. Readiris Pro 11 opens the most commonly used image
and PDF files.
2.	Convert it into editable text
Once you opened your file into Readiris, just click on recognize and
save. Within seconds, your document is converted into digital files
you can edit, share and save! It's fast and accurate.
3.	Export your file into your favourite application
Automatically send the recognized document into your favourite
application such as: Word, Excel, Acrobat (PDF), Internet Explorer
(HTML), WordML, SpreadsheetML or save it as an external file."

"There's a lot to like in PaperPort Pro 9 Office ($199.99 direct),
ScanSoft's latest iteration of its popular document management
utility. Perhaps the biggest draw is the ability to create, edit,
annotate, and even to search for text within PDF files, all from
PaperPort's convenient interface.

Designed to make scanned documents more manageable, PaperPort lets you
organize them into folders in its main window. You can stack
individual pages to create combined documents, then view them either
as thumbnails or in full-screen mode.
This version includes ScanSoft's invaluable Form Typer utility. Scan a
form, then drag and drop it on the Form Typer icon; Form Typer
automatically identifies the blanks on the form so that you can type
in your responses. In addition, you can import any file to PaperPort's
management system. If it's an image file, you can use the built-in
editing features to adjust color, contrast, and brightness and even
remove red eye."

OmniPage 15
This page displays how the product works, and the file formats to
which the scanned document can be converted, including Word, Excel,
Power Point, PDF and 26 others.

"OmniPage 15, the latest version of the world's best selling OCR
software, is the most precise way to convert paper and PDF files into
your favorite PC applications quickly and cost-effectively. Powerful
new OCR technology, advanced layout analysis and intuitive editing
tools allow you to quickly turn paper and PDF files into more than 30
different editable electronic file formats that look just like the
original, complete with text, tables and graphics. What's more,
improved speed and efficient workflow capabilities combine to make
document conversion faster and easier than you could imagine. Save
time and money like never before with the world's most powerful
document conversion application."

A scanning tutorial

Another tutorial

Corel Paint Shop Pro
For inexpensive image enhancement, consider Corel Paint Shop Pro.
Corel's site will allow you to download a free trial version.

    There you go! I didn't know your budget, or exactly how fast you
need the job done. You may consider getting two lower priced scanners
and scanning in tandem. Once the job is done, you could sell one on eBay,
if you no longer need it for future jobs. You'd need two PCs, as
scanning and OCRing use a great deal of memory and processor power.
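
To get a feel for how long the job might take, and what a second scanner buys you, here is a rough Python estimate. It is a back-of-the-envelope sketch under stated assumptions: the 100,000 pages are counted as page sides (images), the scanners run at their rated duplex speeds, and no time is lost to reloading the feeder, jams, or OCR. Real-world throughput will be lower.

```python
def scan_hours(images, ipm, scanners=1):
    """Hours of pure scanning time for `images` page sides.

    ipm is the rated images per minute of each scanner; scanners is how many
    identical units run in parallel. Feeder reloads, jams and OCR are ignored.
    """
    if ipm <= 0 or scanners <= 0:
        raise ValueError("ipm and scanners must be positive")
    return images / (ipm * scanners) / 60


TOTAL_IMAGES = 100_000  # assumption: the page count refers to page sides

# Rated duplex speeds quoted earlier in this answer.
for name, ipm in [("ScanSnap S500", 36), ("Canon DR-2050C", 40), ("HP Scanjet 7800", 50)]:
    one = scan_hours(TOTAL_IMAGES, ipm)
    two = scan_hours(TOTAL_IMAGES, ipm, scanners=2)
    print(f"{name}: {one:.1f} h with one scanner, {two:.1f} h with two")
```

At a rated 50 ipm, for example, 100,000 images come to 2,000 minutes, or roughly 33 hours of feeding for one scanner and about 17 hours for two running side by side.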

   Of all the above scanners, if it were me, I'd choose the
HP Scanjet 7800 Document Sheetfeed Scanner. It is reasonably priced,
and comes with Readiris Pro 11 for OCR ease. The software allows you
to convert your scanned files into a PDF document, or
various others including graphics files and Word documents. You won't
need to purchase any other software.

  If this answer is unclear, or is not the information you were
seeking, please request an Answer Clarification, and allow me to
respond, before you rate.

Sincerely, Crabcakes

Search Terms
Heavy duty document scanners
Canon DR 7080C - Document scanner + review
OCR software
Best document scanners + cnet
Best document scanners + PCMagazine
Document scanners recommendations
Leasing document scanners
Batch scanners
Subject: Re: Scanning 100.000 pages
From: kohinoor_dot_ca-ga on 10 Oct 2006 20:38 PDT
perhaps it would be cheaper, if:

you shipped all the books to India, China
had someone scan it for you
and then ship the books back?
Subject: Re: Scanning 100.000 pages
From: veconofix-ga on 06 Nov 2006 15:54 PST
I know of a company in Gainesville, Florida that does this: they do
business as a record storage alternative.  They scan many file
cabinets full of records every day.

The images can be in any format you like: .jpeg, .tiff, .pdf,  whatever.

They are NOT OCR'ed, but are high photo quality color images of the original.

I'd post the info e-mail, but google answers won't let me post e-mail addresses.
Subject: Re: Scanning 100.000 pages
From: daorganizer-ga on 06 Nov 2006 18:44 PST
Have you tried Imaging Connections yet? Imaging Connections is an
online marketplace for document scanning services and scanning
systems. The service is free to use. All you do is complete a simple
form with your scanning requirements to receive up to 5 free scanning
quotes from multiple vendors. Check it out, here's the website
