Q: Scanning a large photographic archive (Answered, 4 out of 5 stars, 6 Comments)
Question  
Subject: Scanning a large photographic archive
Category: Computers > Graphics
Asked by: axle-ga
List Price: $50.00
Posted: 29 Aug 2002 15:13 PDT
Expires: 28 Sep 2002 15:13 PDT
Question ID: 60022
I have a large collection of 35mm negatives, approx 250 rolls of 36
frames - say 10,000 images in total. I also have a Nikon CoolScan
4000dpi scanner which can bulk scan the frames on each strip of 5 or 6
negatives.
 
The scanner produces images 5959x3946 (23.5 megapixels) and can scan
at 24bpp or 42bpp. A 42bpp TIFF file is 141MB and is clearly not a
practical format for manipulation or bulk storage; it is also not yet
widely supported! The JPEG format is limited to 24bpp but is obviously
lossy.

A small collection of images chosen at random, scanned at 24bpp and
saved at Q95 produced files sizes of 6MB-8MB. This is approximately
halved for Q90. Even at Q95, 10,000 images will take up less than
100GB and comfortably fit on a single IDE drive. In addition to the
6000x4000 images, I would also generate screen-sized previews (maybe
only 1024x768) at a low 'Q' for indexing and rapid browsing.
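
As a rough check on the figures above, here is a minimal Python sketch
of the storage arithmetic. The 8 MB per-image value is the upper end
of the Q95 sizes quoted above. Treating a 42bpp scan as stored at 16
bits per channel (48 bits per pixel) is an assumption, made only
because it reproduces the 141MB figure.

```python
# Back-of-envelope storage estimate using the figures quoted above.
WIDTH, HEIGHT = 5959, 3946          # pixels per full-resolution scan
IMAGES = 10_000                     # approximate size of the collection


def uncompressed_mb(bits_per_pixel: int) -> float:
    """Raw pixel data size in MB (10^6 bytes), ignoring file headers."""
    return WIDTH * HEIGHT * bits_per_pixel / 8 / 1e6


def archive_gb(mb_per_image: float, images: int = IMAGES) -> float:
    """Total archive size in GB (10^9 bytes)."""
    return mb_per_image * images / 1e3


megapixels = WIDTH * HEIGHT / 1e6
print(f"{megapixels:.1f} megapixels")
print(f"24bpp raw: {uncompressed_mb(24):.0f} MB per image")
# Assumption: a 42bpp scan is written as 16 bits per channel (48bpp).
print(f"48bpp raw: {uncompressed_mb(48):.0f} MB per image")
print(f"JPEG at 8 MB each: {archive_gb(8):.0f} GB total")
```

Swapping in a different per-image size (for example the roughly halved
Q90 size) shows how quickly the total moves.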

My primary reason for this project is to backup a huge repository of
[image] data that is currently stored in a cardboard box in the attic
and would be irreplaceable if lost or damaged.

Once scanned, the possibly longer task is to categorize and index all
the pictures - date, time, location, subject matter, description,
caption, etc...

My requirements (in rough order of importance):

 * Preserve the images at as good a quality as possible
 * Produce an archive of a manageable size
 * Be able to reproduce 10"x8" prints (given a suitable printer :)
 * Use standards that provide maximum future-proofing
 * Once the images are in the digital domain minimize the chance of
 having to re-scan them at a later date.
 
 * I am _not_ concerned about the possible time it is going to take to
physically scan the images - maybe 6 months!

My questions:

1) What format I should choose for the images?
2) If this is JPG, is the extra file size of Q95 worth it?
3) How should the additional comments/tags for each image be stored?
Answer  
Subject: Re: Scanning a large photographic archive
Answered By: answerguru-ga on 29 Aug 2002 16:25 PDT
Rated:4 out of 5 stars
 
Hi axle-ga,

You pose a very interesting question which can become overwhelming if
not approached cautiously :)

Let's consider each component one at a time:

HIGH QUALITY ARCHIVING OF IMAGES

While this may sound very good, it is easy to get carried away with
having the "best possible" image quality retained. To give you a
parallel example, the MP3 phenomenon is based on a filetype that can
drastically reduce the size of a piece of audio, and yet the sound is
virtually the same as the equivalent WAV file to the human ear.
Similarly, images can grow exponentially in size, yet the incremental
increase in quality quickly becomes very small. I haven't forgotten that
you are dealing with negatives...but this matter is still prevalent :)

You mentioned that you want to digitize these images for backup
purposes and that you want to be able to print the images out at 10" x
8". Since you are not going to be doing any intensive photo editing,
the objective will be to maximize quality at that size for printing
with a suitable printer.

The best solution for these requirements would be using the Windows
Bitmap File Format (extension BMP). There are several reasons for
this, of which some are:
1. Up to 24-bit color (2^24 = 16.7 Million colors)
2. Effective compression without compromising quality (unlike TIFF)
3. Device independent
4. Easily converted to other formats if necessary (using software such
as Adobe Photoshop -- this is a large factor in your "future-proof"
and "avoid scanning the images again in the future" requirements)

Further technical specifications of the file format can be found here:

http://www.dcs.ed.ac.uk/home/mxr/gfx/2d/BMP.txt

As far as the size is concerned, here is a process that is always
personally helpful when determining the right DPI level for an archive:
1. Scan at half of the DPI level that your scanner is capable of. In your
case 2000 DPI
2. Increase in steps (to 3000, then 3500, etc.) until you can no longer
see a significant quality difference in the corresponding 10x8 image
that results. It is difficult to nail this down since it can vary by
the type/brand of negative being considered.
3. Continue using this level of quality with the rest of the images to
maintain consistency.
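
To put numbers on the steps above, the following Python sketch
estimates how many pixels per inch end up on a 10x8 print for a given
scan setting. It assumes pixel dimensions scale linearly from the
5959x3946 figure quoted for a 4000dpi scan, and that the frame is
cropped to fill the print. The function name is made up for
illustration.

```python
FULL_DPI = 4000
FULL_W, FULL_H = 5959, 3946   # pixel dimensions quoted for a 4000dpi scan


def print_ppi(scan_dpi: int, print_w_in: float = 10.0,
              print_h_in: float = 8.0) -> float:
    """Pixels per inch available when the frame is cropped to fill the print."""
    scale = scan_dpi / FULL_DPI
    w, h = FULL_W * scale, FULL_H * scale
    # Filling the print means the tighter dimension sets the resolution.
    return min(w / print_w_in, h / print_h_in)


for dpi in (2000, 3000, 4000):
    print(f"{dpi} dpi scan -> {print_ppi(dpi):.0f} ppi on a 10x8 print")
```

Whether a given figure is "enough" still has to be judged by eye on
actual prints, as described in step 2.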

LOW QUALITY IMAGES FOR INDEXING AND BROWSING

We will also need a lower quality format to use for indexing and rapid
browsing purposes. For this, the JPEG file format is sufficient since
you will not be printing any of these pictures. I have found from
experience that 1024 x 768 is fine for scanning negatives (since you
don't want to make it larger than the resolution of your monitor). If
you have a lower resolution monitor you may even want to go lower than
that. Since the JPEG standard is somewhat economical in how it encodes
images, it really is not worth using a Q95 level since, in my opinion,
you will be sacrificing quite a bit of space for an almost unnoticeable
quality increase. If you are having doubts about the quality level
here, follow the process outlined above to obtain the right DPI level.
Also keep in mind that you may want to build on your archive in the
future, so it's always a good idea to keep a little cushion for that as
well :)

Technical specifications of the JPEG image format can be found here:
http://www.dcs.ed.ac.uk/home/mxr/gfx/2d/JPEG.txt


STORING/INDEXING IMAGES
The ideal way of actually indexing all these images would be to set up
a database system (the sheer amount of information here warrants this
step). This will allow you to organize the images as records such as
this one:

                               Images Table
Image ID | Small Image (hyperlink) | Large Image (hyperlink) | Comments

// each row represents an image

A program such as MS Access stores information in this manner, but
again, the size and number of entries here may slow down the speed at
which you are able to move through the information. You can perhaps split
this into several databases so that the size of any one file doesn't
get too big; this way you could even sort the images by category if
you like. In any case, you will be able to store any additional
comments or tags you desire by simply adding another column to your
table. This will fulfill your future need for adding additional
attributes to each image, such as "date, time, location, subject
matter, description, caption, etc..." that you mentioned in your
question.
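
A minimal sketch of the table described above, using SQLite through
Python's built-in sqlite3 module rather than MS Access (a swapped-in
choice for illustration only). The column names, file paths and
sample values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # use a file path for a real archive
conn.execute("""
    CREATE TABLE images (
        image_id     INTEGER PRIMARY KEY,
        small_image  TEXT NOT NULL,   -- path to the preview JPEG
        large_image  TEXT NOT NULL,   -- path to the full-resolution scan
        comments     TEXT
    )
""")

# Adding another attribute later is just one more column.
conn.execute("ALTER TABLE images ADD COLUMN location TEXT")

conn.execute(
    "INSERT INTO images (small_image, large_image, comments, location) "
    "VALUES (?, ?, ?, ?)",
    ("previews/roll001_01.jpg", "scans/roll001_01.tif",
     "example caption", "example place"),
)

rows = conn.execute(
    "SELECT image_id, small_image, location FROM images WHERE location = ?",
    ("example place",),
).fetchall()
print(rows)
```

Storing paths rather than the image data itself keeps the database
small, which speaks to the concern about file size raised above.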

If you want a full-scale Database Management System (DBMS), there are
several solutions such as the Microsoft SQL Server, Oracle, IBM's DB2,
and so forth that you should look into. Personally I find the SQL
Server powerful while having a quicker learning curve compared to the
others.

I know this will be a very time-consuming project, but you seem up to
the task! One final suggestion is that you name all of your images
logically so that they can be found/grouped easily in the future :)

Anyhow, it was a pleasure answering your question since I know how
many people struggle with this kind of project...if you have any
problems understanding the information above please post a
clarification.

Cheers and happy archiving!

answerguru-ga

Request for Answer Clarification by axle-ga on 02 Sep 2002 05:17 PDT
Dear answerguru-ga 

Thank you for your response, but I must admit I am surprised at your
answer for a number of reasons:

1) You describe the analogy with mp3 (which is a good one) and one I am
well acquainted with as my last 6 month project was to archive my
1500+ CDs to mp3. So we both agree that it is not necessarily vital to
preserve every last bit of information, but you then recommend the BMP
format which is lossless.

2) You say:
> 2. Effective compression without compromising quality (unlike TIFF)
But as far as I know, only the 1, 2, 4 and 8bpp BMPs can be
compressed. The truecolour 24 bits cannot.

 from http://www.dcs.ed.ac.uk/home/mxr/gfx/2d/BMP.txt 

  Windows versions 3.0 and later support run-length encoded (RLE) formats for
  compressing bitmaps that use 4 bits per pixel and 8 bits per pixel.

3) For my criteria of a 'manageable' archive, I would hope to fit my
10,000 images onto a single IDE drive. With current technology, this
gives me 160GB or 16MB per image. 42bpp scans are 140MB, 24bpp are
70MB.  If I go for 24bpp, I need ~5:1 compression.

If I go for a lossless compression, my choice would appear to be
something like 1.5 for PNG or 2.5-3 for jpeg2000 or bitjazz (neither
of which I really think are widely enough supported)

(in a quick test gzip'ing a 24bpp image reduces to 82%, bz2'ing to 69%)

4) Having read many reviews of negative scanners, the reason I bought
the Nikon was because of its 4000dpi capability; in many tests it was
proved that 4000dpi produced a marked improvement over the older
~2500dpi scanners. I'm not going to scan at a lower resolution than my
equipment allows, although I am fairly sure that the extra bits of
42bpp is spurious accuracy and is not necessary.

5) I am based almost entirely in the Linux world and would prefer to
avoid proprietary standards (such as BMP).
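
The quick gzip and bz2 test mentioned in point 3 could be repeated
along the following lines. This is only a sketch: it uses Python's
zlib module (the same DEFLATE method that gzip uses) and bz2, and the
sample data is a synthetic gradient standing in for real pixel data,
so the ratios it prints say nothing about actual scans.

```python
import bz2
import zlib


def compression_report(data: bytes) -> dict:
    """Compressed size as a fraction of the original, for two lossless codecs."""
    return {
        "zlib": len(zlib.compress(data, 9)) / len(data),
        "bz2": len(bz2.compress(data, 9)) / len(data),
    }


# Synthetic stand-in for pixel data (an assumption, not a real scan).
sample = bytes((x + y) % 256 for y in range(256) for x in range(768))

report = compression_report(sample)
for name, ratio in report.items():
    print(f"{name}: {ratio:.0%} of original size")
```

To test a real scan, read the file's bytes and pass them to
compression_report() in place of the synthetic sample.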

Clarification of Answer by answerguru-ga on 02 Sep 2002 07:59 PDT
Hi axle-ga,

Let me address each of your concerns one at a time - I'll use the
numbering you've used in your clarification:

1. I agree with you that the BMP format is lossless, and this is part
of the reason that I chose it. It is important to understand that
although far more information is retained, the algorithm used to store
the data representing the image is extremely effective. So you get the
best of both worlds - "lossless" quality that is stored very
efficiently (relative to the other heavy duty formats such as GIF).

2. We are both thinking of "compression" here in two different ways,
and I realize how easily this misperception can be made. The
compression I'm talking about, again, is embedded into the
algorithm that is used to create BMP files. This is the same for all
levels of quality in BMP files - it is the reason that they are all
classified as BMP files!

3. I would have to strongly advise against using either of PNG or
JPEG2000 in your scenario, but your test that revealed that a BMP was
reduced to 82% of its original size is an indicator that its original
compression algorithm is quite efficient (believe it or not) and of
course the fact that it is lossless. You will see that if you try the
same thing with other lossless formats, the BMP will originally be
smaller, but the others will zip more effectively. The sizes of the
final zipped files will be relatively close because they have all
undergone the same level of compression. Personally, I'd rather the
bulk of this take place when the image is created, thus reducing risk
of future problems if the images need to be brought out of zipped
format. To be honest, I don't think you'll get that number of pictures
into 160GB unless you lower your resolution somewhat (which of course
is not reasonable and I don't suggest you do so). My suggestion: you
will take a fair bit of time to fill up 160GB, so just go ahead and do
that, and in a few months you can add a second IDE drive to your
system. You would just be giving up too much quality (zipped or not)
by choosing a lossless format in order to conform to 160GB.

4. I didn't realize that was an issue for your particular scanner...is
there a problem when you try to scan in this format at 4000dpi? If
not, I say just go ahead scanning at that level...

5. Your environment was not made clear in the question...regardless I
know that BMP is fully supported by most Linux-based graphics
programs. I'm not quite sure what you are alluding to here..

I hope this has given you some insight into why I chose the BMP
format. I think we can close the first three questions, since there's
not much more I can say about that, but if you would like to continue
with 4 and 5 I would be happy to oblige :)

Thanks,

answerguru-ga

Request for Answer Clarification by axle-ga on 06 Sep 2002 17:46 PDT
Dear AG,

I still don't agree with your reasoning for choosing BMPs (over other
lossless formats): 24 bit BMPs are *not* compressed and are merely
stored as a [small] header followed by the pixel data in RGB triples -
there is no 'algorithm' for storing them. The fact that I can gzip the
files at all implies that they are not compressed!
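
The layout described above can be illustrated with a short Python
sketch that builds an uncompressed 24-bit BMP by hand: a 14-byte file
header, a 40-byte info header with the compression field set to 0,
then the raw pixel rows. One detail beyond the description above is
that each row is padded to a multiple of 4 bytes and the triples are
stored in blue, green, red order. The function names are made up for
illustration.

```python
import struct

HEADER_BYTES = 14 + 40   # file header + info header


def bmp24_size(width: int, height: int) -> int:
    """File size in bytes of an uncompressed 24-bit BMP."""
    row_size = (width * 3 + 3) & ~3      # rows padded to a multiple of 4 bytes
    return HEADER_BYTES + row_size * height


def bmp24_bytes(width: int, height: int, rgb=(255, 0, 0)) -> bytes:
    """Build an uncompressed 24-bit BMP filled with a single colour."""
    row_size = (width * 3 + 3) & ~3
    pixel_bytes = row_size * height
    r, g, b = rgb
    row = bytes((b, g, r)) * width       # stored as blue, green, red
    row += b"\x00" * (row_size - width * 3)
    file_header = struct.pack("<2sIHHI", b"BM",
                              HEADER_BYTES + pixel_bytes, 0, 0, HEADER_BYTES)
    info_header = struct.pack("<IiiHHIIiiII", 40, width, height, 1, 24,
                              0,          # compression field: 0 = none
                              pixel_bytes, 2835, 2835, 0, 0)
    return file_header + info_header + row * height


data = bmp24_bytes(5, 3)
print(len(data), bmp24_size(5, 3))
print(f"{bmp24_size(5959, 3946) / 1e6:.1f} MB for a full-size scan")
```

The size depends only on the image dimensions, never on the picture
content, which is what one would expect of an uncompressed format.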

I also don't understand why you started by confirming that lossy is
not a bad thing, but then recommended a lossless format.

I think it is important to know the answer to the following question:

  Given the likely uses for archived photographs, is it possible to
  distinguish between a Q90+ JPEG and the original uncompressed image?

If to all intents and purpose it is /not/ possible to distinguish then
I can see no reason to avoid JPG. If it /is/ possible then a lossless
image (of whatever type) is the obvious answer. By this I mean not
just me looking at a couple of images on my laptop - I want to know
if some hypothetical picture editor of The Sunday Glossy would bemoan
the receipt of anything but a pristine scan.

If lossless is the answer, and the quality of the scanned images is
paramount (and more important than the [large!] saving on storage
space) then it would seem foolish to limit the images to 'only' 24bit
- the Nikon is quite capable as a 42bpp scanner. Extending the file in
this way would appear to limit the lossless format choice to
TIFF.

What would be ideal would be a compromise - a 32bpp format with
nominally 10 bits per colour, possibly with a heavier weighting to
green then red then blue.

As you rightly say, 160GB is going to take a while to fill, although
at 140MB an image I may only get a thousand images per drive. I do not
require the final images to be online at all times, so extra drives
could be swapped in and out of the system as necessary.

To address Gareth's point about DVDs - Even a ~10GB DVD would only be
able to store about 70 140MB images - barely two rolls of film! I
would require maybe 150 discs to store the archive. As a backup of the
HDs this is not a bad idea, although I wouldn't fancy duplicating a
set :)

Most of the time I will not need access to these high-resolution
versions, and the screen-sized previews can be used, at maybe only
~150K per image, I could store the entire browseable archive in less
than a couple of gigabytes on my everyday laptop...
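
The media estimates above can be reproduced with a few lines of
Python. All inputs are the round figures quoted in this post (140MB
per scan, 160GB per drive, ~10GB per DVD, ~150K per preview), with
decimal units assumed throughout.

```python
import math

IMAGES = 10_000
TIFF_MB = 140          # per 42bpp scan, as quoted
DRIVE_GB = 160
DVD_GB = 10            # the "~10GB" figure used above
PREVIEW_KB = 150

images_per_drive = DRIVE_GB * 1000 // TIFF_MB
images_per_dvd = DVD_GB * 1000 // TIFF_MB
drives_needed = math.ceil(IMAGES / images_per_drive)
dvds_needed = math.ceil(IMAGES / images_per_dvd)
preview_gb = IMAGES * PREVIEW_KB / 1e6

print(f"{images_per_drive} images per drive, {drives_needed} drives in total")
print(f"{images_per_dvd} images per DVD, {dvds_needed} discs in total")
print(f"previews: {preview_gb:.1f} GB")
```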

To summarize:

* If one day I may regret a lossy format, then lossless it is.

* If this is the case, then I might as well go the whole way and scan
at 42bpp to preserve as much of the original detail of the negatives
as possible.

* If 42bpp is used then it looks as if my choice is made for me - TIFF.

Clarification of Answer by answerguru-ga on 06 Sep 2002 18:14 PDT
Greetings again axle-ga,

To avoid another long-winded answer, I'm going to respond to your
summarizing questions:

1. I feel that you will come to regret choosing a lossy format. Put
simply, if I can generalize this class of format, I would say its
ultimate purpose cannot extend beyond screen viewing (be it remotely
or locally) without noticeable quality loss.

2. I was attempting to provide a middle of the road solution by
choosing a 24-bit lossless format, however, I completely understand
where you are coming from when you state that you want to take full
advantage of your scanner and retain as much quality as possible. This
being the case (that I admit I did not consider initially), I do
believe that TIFF is the way to go.

Your decision appears to be made :)

Thanks for using Google Answers.

answerguru-ga
axle-ga rated this answer:4 out of 5 stars
After an initial disagreement concerning the capabilities of BMP
files, I am now satisfied with the guidance provided - JPGs should
only be used for browsing purposes, for any other purpose a lossless
format should be chosen. Thank you Google.

Comments  
Subject: Re: Scanning a large photographic archive
From: gattrill-ga on 06 Sep 2002 12:36 PDT
 
axle-ga,

Have you considered storing the full-res images on DVD? You could
record several copies of each disc and store with a relative for
backup purposes and in case your copy becomes damaged. You could store
a low-res version locally for preview purposes and fetch the relevant
DVD if it is required. I don't know a huge amount about DVD recordable
formats but I believe up to 9.6GB can be stored on DVD-RAM. This might
solve your hard drive problems.

Gareth
Subject: Re: Scanning a large photographic archive
From: brian_shih-ga on 25 Sep 2002 14:06 PDT
 
JPEG2000 is a better choice than JPEG for storing higher-quality
images.
Subject: Re: Scanning a large photographic archive
From: collin-ga on 25 Sep 2002 15:08 PDT
 
I'm surprised at the recommendation of BMP.  BMP is pretty much a
Windows-only standard and you will have difficulty transferring and
viewing them accurately on other platforms.  TIFF or PNG would be a
better choice.  While TIFF has its own set of compatibility issues,
these stem from the extensibility of the format to 16-bit per channel
images and other high-end features, not Microsoft's proprietary
strategies. No graphics or photo professional uses BMP for critical
archiving.  Using a standard, widely-supported format increases the
chances of the archive being usable long into the future.
Subject: Re: Scanning a large photographic archive
From: axle-ga on 25 Sep 2002 16:28 PDT
 
Dear Collin,

Quite - I didn't understand the emphasis placed on BMP either! My main
decision has been made though: lossless rather than lossy. Because I'm
going to be scanning at 42bpp, TIFF is the only real candidate. If I
scan at the highest resolution and the greatest number of bits per
pixel, and save the results losslessly, I've done all I can. If some amazing new
compression system appears a couple of years down the line I will
still have my negatives in their best possible state and can make use
of the new technology.
Subject: Re: Scanning a large photographic archive
From: rustface-ga on 30 Oct 2002 11:06 PST
 
Late comments...

Seems like Kodak PhotoCD Pro might be viable. They may even use
roll-fed scanners! Cheap, fast, great resolution, browsable, very
printable.

In the final analysis, the real answer will probably be boiled down to
economy ($ & time, despite time supposedly not being a factor) and
quality being judged as "good" by the average "man in the street" when
the image is printed out at a normal size. If you're doing your own
scanning, this suggests to me 8 bits per pixel and "high" quality
JPEGs, with an uncompressed file size of about 30 megs.

But who knows, maybe these are scientific images that require 10 bits
and extremely high res and a lossless file format.

What may be more important than any of this is the keywording and
database structure behind it all...

I would love to hear back on this project a year from now! Cheers!

PS Check out Photoshop 6/7 web gallery feature.
Subject: Re: Scanning a large photographic archive
From: kman-ga on 04 Nov 2002 04:36 PST
 
I deal with thousands of images and videos as part of my business:
http://www.nerdmaker.com/
I am happy to see you reached the best conclusion. When working with
video, audio, or images, always (almost) create the highest quality
original, then make a modified copy for working purposes (modification,
distribution, ...). Since your limiting factor is the scanner, that
determines your original image quality. There is probably a method for
preparing photographic surfaces for scanning which will also improve
your results.
